What is gradient boosting?
A training algorithm where weak models are trained to iteratively improve the quality (reduce the loss) of a strong model.
Gradient boosting explained in plain English
A weak model could be, for example, a linear model or a small decision tree. The strong model becomes the sum of all the previously trained weak models.

In the simplest form of gradient boosting, at each iteration, a weak model is trained to predict the loss gradient of the strong model. Then, the strong model's output is updated by subtracting the predicted gradient, similar to gradient descent:

$$F_{i+1} = F_{i} - \xi f_{i}$$

where:
- $F_{0}$ is the starting strong model.
- $F_{i+1}$ is the next strong model.
- $F_{i}$ is the current strong model.
- $\xi$ is a value between 0.0 and 1.0 called shrinkage, which is analogous to the learning rate in gradient descent.
- $f_{i}$ is the weak model trained to predict the loss gradient of $F_{i}$.

Modern variations of gradient boosting also include the second derivative (Hessian) of the loss in their computation.

Decision trees are commonly used as weak models in gradient boosting. See gradient boosted (decision) trees.
Example
Practitioners refer to gradient boosting when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
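As a rough illustration of the simplest form described above, the sketch below repeatedly fits a weak model to the loss gradient of the strong model and subtracts the shrunken prediction. It is not a reference implementation. Several things are assumptions made for the example: the loss is squared error, the starting strong model predicts 0, the weak model is a one-split decision stump on a single feature, and the names `fit_stump`, `gradient_boost`, and `mean_squared_error` are hypothetical helpers rather than part of any library.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump (a weak model) by least squares."""
    best = None
    # Skip the largest value so both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # All inputs identical: fall back to a constant prediction.
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, num_iterations=50, shrinkage=0.1):
    """Return a strong model built by subtracting shrunken weak models."""
    weak_models = []

    def strong_model(x):
        # Assumed starting point F_0(x) = 0, so after i iterations
        # F_i(x) = -shrinkage * (f_0(x) + ... + f_{i-1}(x)).
        return -shrinkage * sum(f(x) for f in weak_models)

    for _ in range(num_iterations):
        # Gradient of the assumed loss 0.5 * (F(x) - y)^2 with respect to F(x).
        gradients = [strong_model(x) - y for x, y in zip(xs, ys)]
        # The weak model is trained to predict that gradient.
        weak_models.append(fit_stump(xs, gradients))
    return strong_model


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (made up for this example): two noisy plateaus.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

model = gradient_boost(xs, ys)
print("loss of starting model:", mean_squared_error(lambda x: 0.0, xs, ys))
print("loss after boosting:   ", mean_squared_error(model, xs, ys))
```

With squared error, the gradient is simply the current prediction minus the label, so each stump is fit to the strong model's remaining error. A smaller `shrinkage` takes more cautious steps and typically needs more iterations, much like a smaller learning rate in gradient descent.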
People also read
- Backpropagation
The process that tells a neural network which internal settings caused an error and how to adjust them, working backwards through layers.
- Bayesian neural network
A probabilistic neural network that accounts for uncertainty in weights and outputs.
- embedding layer
A special hidden layer that trains on a high-dimensional categorical feature to gradually learn a lower dimension embedding vector.
- full softmax
Synonym for softmax.
- generative model
Practically speaking, a model that can, among other things, create (generate) new examples from the training dataset.
- Gradient Descent
The method by which a model gradually improves by making small adjustments after each mistake, moving toward better performance.
- input layer
The layer of a neural network that holds the feature vector.
- logistic regression
A type of regression model that predicts a probability.
- minimax loss
A loss function for generative adversarial networks, based on the cross-entropy between the distribution of generated data and real data.
- sigmoid function
A mathematical function that "squishes" an input value into a constrained range, typically 0 to 1 or -1 to +1.