What is an exploding gradient problem?
The tendency for gradients in deep neural networks (especially recurrent neural networks) to become surprisingly steep (high).
exploding gradient problem explained in plain English
The tendency for gradients in deep neural networks (especially recurrent neural networks) to become surprisingly steep (high). Steep gradients often cause very large updates to the weights of each node in a deep neural network. Models suffering from the exploding gradient problem become difficult or impossible to train. Gradient clipping can mitigate this problem. Compare to vanishing gradient problem.
Example
Practitioners refer to exploding gradient problem when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- accelerator chip
A category of specialized hardware components designed to perform key computations needed for deep learning algorithms.
- activation function
A function that enables neural networks to learn nonlinear (complex) relationships between features and the label.
- AdaGrad
A sophisticated gradient descent algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate.
- Attention
A mechanism that lets a model focus on the most relevant parts of its input when producing an output, weighting what matters most in context.
- auto-regressive model
A model that infers a prediction based on its own previous predictions.
- autoencoder
A system that learns to extract the most important information from the input.
- auxiliary loss
A loss function—used in conjunction with a neural network model's main loss function—that helps accelerate training during the early iterations when weights are randomly initialized.
- Backpropagation
The process that tells a neural network which internal settings caused an error and how to adjust them, working backwards through layers.
- batch
The set of examples used in one training iteration.
- batch normalization
Normalizing the input or output of the activation functions in a hidden layer.