class-balanced dataset
A dataset containing categorical labels in which the number of instances of each category is approximately equal.
Plain English Explanation
A dataset containing categorical labels in which the number of instances of each category is approximately equal. For example, consider a botanical dataset whose binary label can be either native plant or nonnative plant: - A dataset with 515 native plants and 485 nonnative plants is a class-balanced dataset. - A dataset with 875 native plants and 125 nonnative plants is a class-imbalanced dataset. A formal dividing line between class-balanced datasets and class-imbalanced datasets doesn't exist. The distinction only becomes important when a model trained on a highly class-imbalanced dataset can't converge. See Datasets: imbalanced datasets in Machine Learning Crash Course for details.
How is it used?
Practitioners refer to class-balanced dataset when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.