AIExplainer

What is entropy?


In information theory, entropy describes how unpredictable a probability distribution is. Equivalently, entropy is the average amount of information each example contains. A distribution has the highest possible entropy when all values of a random variable are equally likely.

The entropy of a set with two possible values, "0" and "1" (for example, the labels in a binary classification problem), is given by the following formula:

H = -p log p - q log q = -p log p - (1 - p) log (1 - p)

where:

- H is the entropy.
- p is the fraction of "1" examples.
- q is the fraction of "0" examples. Note that q = 1 - p.
- log is generally log2, in which case the unit of entropy is the bit.

For example, suppose the following:

- 100 examples contain the value "1".
- 300 examples contain the value "0".

Therefore, the entropy value is:

- p = 0.25
- q = 0.75
- H = (-0.25)log2(0.25) - (0.75)log2(0.75) ≈ 0.50 + 0.31 = 0.81 bits per example

A perfectly balanced set (for example, 200 "0"s and 200 "1"s) has an entropy of 1.0 bit per example. As a set becomes more imbalanced, its entropy approaches 0.0.

In decision trees, entropy is used to compute information gain, which the splitter uses to select conditions while growing a classification decision tree.

Compare entropy with:

- Gini impurity
- cross-entropy loss function

Entropy is often called Shannon's entropy. For more information, see Exact splitter for binary classification with numerical features in the Decision Forests course.
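The following minimal Python sketch makes the two-value formula, the worked example, and the link to information gain concrete. It assumes labels are given as a plain Python sequence. The function names `entropy` and `information_gain` are illustrative and do not come from any library. `entropy` generalizes the two-value formula to any number of distinct labels by summing -p log2 p over every observed value. `information_gain` uses the common definition: the parent's entropy minus the size-weighted entropy of the two child sets produced by a split. The split in the usage lines is a hypothetical example.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def entropy(labels: Iterable[Hashable]) -> float:
    """Shannon entropy, in bits per example, of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    h = 0.0
    for count in counts.values():
        p = count / total
        # Counter only stores observed values (count >= 1), so log2(0) is never evaluated.
        h -= p * math.log2(p)
    return h


def information_gain(
    parent: Sequence[Hashable],
    left: Sequence[Hashable],
    right: Sequence[Hashable],
) -> float:
    """Parent entropy minus the size-weighted entropy of the two child sets.

    Illustrative helper: assumes `left` and `right` partition a non-empty `parent`.
    """
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


if __name__ == "__main__":
    # The worked example: 100 examples labeled "1" and 300 labeled "0".
    labels = ["1"] * 100 + ["0"] * 300
    print(round(entropy(labels), 2))  # 0.81

    # A perfectly balanced set: 200 "0"s and 200 "1"s.
    balanced = ["0"] * 200 + ["1"] * 200
    print(entropy(balanced))  # 1.0

    # Hypothetical split that separates the labels perfectly: both children are pure,
    # so the information gain equals the parent's entropy.
    left, right = ["1"] * 100, ["0"] * 300
    print(round(information_gain(labels, left, right), 2))  # 0.81
```

A real splitter would evaluate many candidate conditions and keep the one with the highest information gain. This sketch only scores a single, already-chosen split.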

Practitioners encounter entropy when building, training, and evaluating machine learning systems, for example when a decision tree chooses its split conditions. The term also appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.