What is a Momentum?
A sophisticated gradient descent algorithm in which a learning step depends not only on the derivative in the current step, but also on the derivatives of the step(s) that immediately preceded it.
Momentum explained in plain English
A sophisticated gradient descent algorithm in which a learning step depends not only on the derivative in the current step, but also on the derivatives of the step(s) that immediately preceded it. Momentum involves computing an exponentially weighted moving average of the gradients over time, analogous to momentum in physics. Momentum sometimes prevents learning from getting stuck in local minima.
Example
Practitioners refer to momentum when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- auxiliary loss
A loss function—used in conjunction with a neural network model's main loss function—that helps accelerate training during the early iterations when weights are randomly initialized.
- Backpropagation
The process that tells a neural network which internal settings caused an error and how to adjust them, working backwards through layers.
- Bayesian neural network
A probabilistic neural network that accounts for uncertainty in weights and outputs.
- convex function
A function in which the region above the graph of the function is a convex set.
- convolution
In mathematics, casually speaking, a mixture of two functions.
- convolutional filter
One of the two actors in a convolutional operation.
- convolutional layer
A layer of a deep neural network in which a convolutional filter passes along an input matrix.
- convolutional operation
The following two-step mathematical operation: 1.
- embedding layer
A special hidden layer that trains on a high-dimensional categorical feature to gradually learn a lower dimension embedding vector.
- embedding vector
Broadly speaking, an array of floating-point numbers taken from any hidden layer that describe the inputs to that hidden layer.