What is an oversampling?
Reusing the examples of a minority class in a class-imbalanced dataset in order to create a more balanced training set.
oversampling explained in plain English
Reusing the examples of a minority class in a class-imbalanced dataset in order to create a more balanced training set. For example, consider a binary classification problem in which the ratio of the majority class to the minority class is 5,000:1. If the dataset contains a million examples, then the dataset contains only about 200 examples of the minority class, which might be too few examples for effective training. To overcome this deficiency, you might oversample (reuse) those 200 examples multiple times, possibly yielding sufficient examples for useful training. You need to be careful about over overfitting when oversampling. Contrast with undersampling.
Example
Practitioners refer to oversampling when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- A/B testing
A statistical way of comparing two (or more) techniques—the A and the B.
- ablation
A technique for evaluating the importance of a feature or component by temporarily removing it from a model.
- accuracy
The number of correct classification predictions divided by the total number of predictions.
- activation function
A function that enables neural networks to learn nonlinear (complex) relationships between features and the label.
- active learning
A training approach in which the algorithm chooses some of the data it learns from.
- adaptation
Synonym for tuning or fine-tuning.
- agglomerative clustering
See hierarchical clustering.
- anomaly detection
The process of identifying outliers.
- area under the PR curve
See PR AUC (Area under the PR Curve).
- area under the ROC curve
See AUC (Area under the ROC curve).