class-imbalanced dataset
A dataset for a classification problem in which the total number of labels of each class differs significantly.
Plain English Explanation
In a class-imbalanced dataset, some classes have far more labels than others. For example, consider a binary classification dataset whose two labels are divided as follows:

- 1,000,000 negative labels
- 10 positive labels

The ratio of negative to positive labels is 100,000 to 1, so this is a class-imbalanced dataset.

In contrast, the following dataset is class-balanced because the ratio of negative labels to positive labels is relatively close to 1:

- 517 negative labels
- 483 positive labels

Multi-class datasets can also be class-imbalanced. For example, the following multi-class classification dataset is class-imbalanced because one label has far more examples than the other two:

- 1,000,000 labels with class "green"
- 200 labels with class "purple"
- 350 labels with class "orange"

Training on class-imbalanced datasets can present special challenges. See Imbalanced datasets in Machine Learning Crash Course for details.

See also entropy, majority class, and minority class.
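The ratios in these examples can be checked with a few lines of Python. This is a minimal sketch, not a standard API: the helper names `class_counts` and `imbalance_ratio` are hypothetical, and dividing the largest class count by the smallest is just one simple way to summarize imbalance. The entry does not define a cutoff for "differs significantly", so the code reports the ratio without classifying it.

```python
from collections import Counter


def class_counts(labels):
    """Count how many times each class label appears (hypothetical helper)."""
    return Counter(labels)


def imbalance_ratio(counts):
    """Return largest class count divided by smallest class count.

    A value near 1 suggests a roughly balanced dataset; a large value
    suggests imbalance. No threshold is applied here (assumption: the
    reader picks one appropriate to the task).
    """
    return max(counts.values()) / min(counts.values())


# Label counts taken from the examples in this entry.
binary_imbalanced = {"negative": 1_000_000, "positive": 10}
binary_balanced = {"negative": 517, "positive": 483}
multi_class = {"green": 1_000_000, "purple": 200, "orange": 350}

for name, counts in [
    ("binary, imbalanced", binary_imbalanced),
    ("binary, balanced", binary_balanced),
    ("multi-class, imbalanced", multi_class),
]:
    print(f"{name}: ratio = {imbalance_ratio(counts):,.2f}")
```

For raw labels rather than precomputed counts, `imbalance_ratio(class_counts(labels))` gives the same summary.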
How is it used?
Practitioners use the term class-imbalanced dataset when building, training, or evaluating classification models, typically to flag that the label distribution is heavily skewed toward one class. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.