What is gini impurity?
A metric similar to entropy.
gini impurity explained in plain English
Splitters use values derived from either gini impurity or entropy to compose conditions for classification decision trees. The value derived from entropy is called information gain. The corresponding value derived from gini impurity has no universally accepted name, but it is just as important as information gain. Gini impurity is also called gini index, or simply gini.
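How a splitter might score a candidate condition can be sketched as follows. This is a minimal illustration, not any particular library's implementation. It assumes the score is the weighted decrease in gini impurity from a parent set to its two children, as in CART-style trees, and it uses the multi-class form of gini impurity (1 minus the sum of squared class fractions). The names `gini_impurity` and `impurity_decrease` and the toy label lists are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a sequence of class labels: 1 - sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen for this sketch: an empty set is treated as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted drop in gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Toy parent node: 2 examples labeled 1 and 6 examples labeled 0.
parent = [1, 1, 0, 0, 0, 0, 0, 0]

# Candidate A separates the classes perfectly.
left_a, right_a = [1, 1], [0, 0, 0, 0, 0, 0]

# Candidate B leaves both children with the same class mix as the parent.
left_b, right_b = [1, 0, 0, 0], [1, 0, 0, 0]

score_a = impurity_decrease(parent, left_a, right_a)
score_b = impurity_decrease(parent, left_b, right_b)
print(score_a)  # 0.375
print(score_b)  # 0.0
```

Under this scoring rule a splitter would prefer candidate A, because it removes all of the parent's impurity, while candidate B removes none.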
Gini impurity is the probability of misclassifying a new piece of data taken from the same distribution, assuming the data is labeled at random according to that distribution's class proportions. The gini impurity of a set with two possible values "0" and "1" (for example, the labels in a binary classification problem) is calculated from the following formula:
$$I = 1 - (p^2 + q^2) = 1 - (p^2 + (1-p)^2)$$
where:
- I is the gini impurity.
- p is the fraction of "1" examples.
- q is the fraction of "0" examples. Note that q = 1-p.
For example, consider the following dataset:
- 100 labels (0.25 of the dataset) contain the value "1"
- 300 labels (0.75 of the dataset) contain the value "0"
Therefore, the gini impurity is:
- p = 0.25
- q = 0.75
- $I = 1 - (0.25^2 + 0.75^2) = 0.375$
Consequently, a random label from the same dataset would have a 37.5% chance of being misclassified, and a 62.5% chance of being properly classified, if it were assigned at random in proportion to the class frequencies. A perfectly balanced set of labels (for example, 200 "0"s and 200 "1"s) would have a gini impurity of 0.5. A highly imbalanced set of labels would have a gini impurity close to 0.0.
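The binary formula above can be checked with a few lines of Python. This is a small sketch rather than a reference implementation. The function name `binary_gini` is hypothetical, and the 0.01 fraction used for the imbalanced case is an arbitrary illustrative value.

```python
def binary_gini(p):
    """Gini impurity of a two-class set, where p is the fraction of "1" labels."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    q = 1.0 - p  # fraction of "0" labels
    return 1.0 - (p ** 2 + q ** 2)


# The dataset from the example: 100 labels are "1" and 300 labels are "0".
ones, zeros = 100, 300
p = ones / (ones + zeros)

print(binary_gini(p))     # 0.375
print(binary_gini(0.5))   # 0.5, the perfectly balanced case
print(binary_gini(0.01))  # roughly 0.02, a highly imbalanced case
```

For two classes the impurity peaks at 0.5 when the labels are balanced and falls toward 0.0 as one class dominates.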
Example
Practitioners refer to gini impurity when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- AUC
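A number between 0.0 and 1.0 that represents a binary classification model's ability to separate positive classes from negative classes.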
- Backpropagation
The process that tells a neural network which internal settings caused an error and how to adjust them, working backwards through layers.
- Bayesian neural network
A probabilistic neural network that accounts for uncertainty in weights and outputs.
- Bayesian optimization
A probabilistic regression model technique for optimizing computationally expensive objective functions by instead optimizing a surrogate that quantifies the uncertainty using a Bayesian learning technique.
- classification threshold
In a binary classification, a number between 0 and 1 that converts the raw output of a logistic regression model into a prediction of either the positive class or the negative class.
- configuration
The process of assigning the initial property values used to train a model, including: hyperparameters such as: - learning rate - iterations - optimizer - loss function In machine learning projects, c
- confusion matrix
An NxN table that summarizes the number of correct and incorrect predictions that a classification model made.
- cross-entropy
A generalization of Log Loss to multi-class classification problems.
- discriminative model
A model that predicts labels from a set of one or more features.
- embedding layer
A special hidden layer that trains on a high-dimensional categorical feature to gradually learn a lower dimension embedding vector.