What is entropy?
In information theory, a description of how unpredictable a probability distribution is.
Entropy explained in plain English
In information theory, a description of how unpredictable a probability distribution is. Alternatively, entropy is also defined as how much information each example contains. A distribution has the highest possible entropy when all values of a random variable are equally likely.
The entropy of a set with two possible values "0" and "1" (for example, the labels in a binary classification problem) has the following formula:
H = -p log p - q log q = -p log p - (1 - p) log (1 - p)
where:
- H is the entropy.
- p is the fraction of "1" examples.
- q is the fraction of "0" examples. Note that q = (1 - p).
- log is generally log2. In this case, the entropy unit is a bit.
For example, suppose the following:
- 100 examples contain the value "1"
- 300 examples contain the value "0"
Therefore, the entropy value is:
- p = 0.25
- q = 0.75
- H = (-0.25)log2(0.25) - (0.75)log2(0.75) = 0.81 bits per example
A set that is perfectly balanced (for example, 200 "0"s and 200 "1"s) would have an entropy of 1.0 bit per example. As a set becomes more imbalanced, its entropy moves towards 0.0.
In decision trees, entropy helps formulate information gain to help the splitter select the conditions during the growth of a classification decision tree.
Compare entropy with:
- gini impurity
- cross-entropy loss function
Entropy is often called Shannon's entropy.
See Exact splitter for binary classification with numerical features in the Decision Forests course for more information.
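The two-value formula above can be checked with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `binary_entropy` is hypothetical, and the code assumes the usual convention that a term with probability 0 contributes nothing to the sum, so a set containing only one value has entropy 0.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a set where a fraction p of examples is "1".

    Hypothetical helper for illustration; q = 1 - p is the fraction of "0"s.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Assumption: a term with probability 0 is skipped (0 * log2(0) := 0).
    return sum(-x * math.log2(x) for x in (p, 1.0 - p) if x > 0.0)


# 100 examples with value "1" and 300 with value "0", as in the text.
print(round(binary_entropy(100 / 400), 2))  # 0.81

# A perfectly balanced set: 200 "0"s and 200 "1"s.
print(binary_entropy(200 / 400))  # 1.0
```

Using `math.log2` gives the result in bits. Swapping in a different logarithm base would change only the unit, not the ordering of sets by entropy.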
Example
Practitioners refer to entropy when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- AUC
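A number between 0.0 and 1.0 that represents a binary classification model's ability to separate positive classes from negative classes.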
- Backpropagation
The process that tells a neural network which internal settings caused an error and how to adjust them, working backwards through layers.
- Bayesian neural network
A probabilistic neural network that accounts for uncertainty in weights and outputs.
- Bayesian optimization
A probabilistic regression model technique for optimizing computationally expensive objective functions by instead optimizing a surrogate that quantifies the uncertainty using a Bayesian learning technique.
- classification threshold
In a binary classification, a number between 0 and 1 that converts the raw output of a logistic regression model into a prediction of either the positive class or the negative class.
- configuration
The process of assigning the initial property values used to train a model, including hyperparameters such as learning rate, iterations, optimizer, and loss function. In machine learning projects, c
- confusion matrix
An NxN table that summarizes the number of correct and incorrect predictions that a classification model made.
- cross-entropy
A generalization of Log Loss to multi-class classification problems.
- discriminative model
A model that predicts labels from a set of one or more features.
- embedding layer
A special hidden layer that trains on a high-dimensional categorical feature to gradually learn a lower dimension embedding vector.