undersampling
Removing examples from the majority class in a class-imbalanced dataset in order to create a more balanced training set.
Plain English Explanation
Undersampling means discarding some of the majority class examples in a class-imbalanced dataset so that the training set is more balanced. For example, consider a dataset in which the ratio of the majority class to the minority class is 20:1. To reduce this class imbalance, you could create a training set consisting of all of the minority class examples but only a tenth of the majority class examples, which would yield a training-set class ratio of 2:1. This more balanced training set might produce a better model. However, because undersampling throws data away, the smaller training set might instead contain too few examples to train an effective model. Contrast with oversampling.
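The 20:1 to 2:1 example can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function name `undersample`, its parameters, and the toy dataset are assumptions made for this example, not part of any particular library.

```python
import random
from collections import Counter


def undersample(examples, labels, majority_label, keep_fraction, seed=0):
    """Keep every minority example and a random fraction of the majority class.

    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    n_keep = round(len(majority_idx) * keep_fraction)
    kept_majority = set(rng.sample(majority_idx, n_keep))
    # Retain all non-majority examples plus the sampled majority examples.
    keep = [
        i for i, y in enumerate(labels)
        if y != majority_label or i in kept_majority
    ]
    return [examples[i] for i in keep], [labels[i] for i in keep]


# Toy dataset with a 20:1 ratio: 2000 majority (label 0), 100 minority (label 1).
labels = [0] * 2000 + [1] * 100
examples = list(range(len(labels)))

sampled_x, sampled_y = undersample(
    examples, labels, majority_label=0, keep_fraction=0.1
)
counts = Counter(sampled_y)
print(counts[0], counts[1])  # 200 majority and 100 minority examples, a 2:1 ratio
```

Note that the resulting training set has 300 examples instead of 2,100, which illustrates the trade-off described above: the classes are more balanced, but most of the majority class data is no longer available for training.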
How is it used?
Practitioners apply undersampling when preparing training data for a class-imbalanced problem, before the model is trained. The term appears in research papers, product documentation, and technical discussions about building, training, and evaluating machine learning systems.