Deep Learning · Advanced

What is gradient accumulation?

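A training technique that updates the parameters only after gradients from several mini-batches have been accumulated, rather than after every mini-batch. In the extreme case, the update happens only once per epoch.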

gradient accumulation explained in plain English

After processing each mini-batch, gradient accumulation adds that mini-batch's gradients to a running total instead of updating the parameters. Only after a set number of mini-batches (or, in the once-per-epoch variant, after the last mini-batch of the epoch) does the system update the parameters, using the accumulated gradients, usually averaged over the mini-batches. Gradient accumulation is useful when the desired batch size is too large to fit in the memory available for training. When memory is an issue, the natural tendency is to reduce the batch size. However, in normal backpropagation a smaller batch means more parameter updates, each based on a noisier gradient estimate. Gradient accumulation avoids the memory limit while keeping the effective batch size, and therefore the update behavior, of the larger batch. The cost is that each update now requires several forward and backward passes.
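The loop below is a minimal sketch in plain Python, assuming a linear model `y_hat = w*x + b` trained with mean squared error and plain SGD. The names `grad_mse`, `train_with_accumulation`, and `accum_steps` are hypothetical, chosen only for illustration. Each mini-batch's gradients go into a running total, and the parameters change only once every `accum_steps` mini-batches. Setting `accum_steps` to the number of mini-batches per epoch gives the once-per-epoch variant.

```python
# Minimal sketch of gradient accumulation with plain SGD (illustrative only).
# Model: y_hat = w * x + b; loss: mean squared error over a mini-batch.
# grad_mse and train_with_accumulation are hypothetical helpers.

def grad_mse(w, b, batch):
    """Return (dL/dw, dL/db) for the mean squared error on one mini-batch."""
    n = len(batch)
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return gw, gb


def train_with_accumulation(data, batch_size, accum_steps, lr, epochs):
    """Accumulate gradients over `accum_steps` mini-batches per parameter update."""
    w, b = 0.0, 0.0
    updates = 0
    for _ in range(epochs):
        acc_w = acc_b = 0.0  # running totals of gradients
        pending = 0          # mini-batches accumulated since the last update
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            gw, gb = grad_mse(w, b, batch)
            acc_w += gw
            acc_b += gb
            pending += 1
            if pending == accum_steps:
                # Average the accumulated gradients, then take one SGD step.
                w -= lr * acc_w / pending
                b -= lr * acc_b / pending
                updates += 1
                acc_w = acc_b = 0.0
                pending = 0
        if pending:
            # Flush leftover mini-batches at the end of the epoch.
            w -= lr * acc_w / pending
            b -= lr * acc_b / pending
            updates += 1
    return w, b, updates


if __name__ == "__main__":
    # Toy data generated exactly from y = 3x + 1.
    data = [(x / 10, 3 * (x / 10) + 1) for x in range(32)]
    w, b, updates = train_with_accumulation(
        data, batch_size=8, accum_steps=4, lr=0.1, epochs=500
    )
    print(f"w={w:.3f} b={b:.3f} updates={updates}")
```

With 32 examples, `batch_size=8`, and `accum_steps=4`, every epoch yields exactly one update, so memory only ever holds one mini-batch of 8 while each step uses the gradient of all 32 examples. The gradients are averaged rather than summed so that the learning rate keeps the same meaning as for a single large batch.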

Example

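Suppose a GPU can hold only 8 training examples at a time, but you want to train with a batch size of 32. With gradient accumulation, the training loop runs four forward and backward passes of 8 examples each, adds their gradients to a running total, and then takes a single optimizer step using the average. Because the gradient of a mean-reduced loss over 32 examples equals the average of the gradients over four equal groups of 8, that update matches one step on a batch of 32. Layers that compute statistics per mini-batch, such as batch normalization, are an exception. The technique comes up often in research papers and framework documentation about training large models on limited hardware.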
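Suppose a batch of 32 examples has to be processed as four micro-batches of 8. The short check below, again in plain Python and assuming a linear model with mean-squared-error loss (the random data and the helper `grad_mse` are hypothetical), compares the gradient of the full batch with the averaged gradients of the four micro-batches.

```python
import math
import random


def grad_mse(w, b, batch):
    """Return (dL/dw, dL/db) of the mean squared error of y_hat = w*x + b on a batch."""
    n = len(batch)
    gw = sum(2.0 * ((w * x + b) - y) * x for x, y in batch) / n
    gb = sum(2.0 * ((w * x + b) - y) for x, y in batch) / n
    return gw, gb


random.seed(0)
# 32 hypothetical (x, y) pairs; the values are arbitrary.
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(32)]
w, b = 0.5, -0.2

# Gradient of one large batch of 32 examples.
full_gw, full_gb = grad_mse(w, b, data)

# Four micro-batches of 8: accumulate, then divide by the number of micro-batches.
micro_batches = [data[i:i + 8] for i in range(0, 32, 8)]
acc_gw = acc_gb = 0.0
for mb in micro_batches:
    gw, gb = grad_mse(w, b, mb)
    acc_gw += gw
    acc_gb += gb
acc_gw /= len(micro_batches)
acc_gb /= len(micro_batches)

print(math.isclose(full_gw, acc_gw, abs_tol=1e-12),
      math.isclose(full_gb, acc_gb, abs_tol=1e-12))
```

The division by the number of micro-batches matters: summing without it gives a gradient four times larger, which acts like a fourfold learning rate. In PyTorch, for example, calling `backward()` repeatedly sums into each parameter's `.grad`, so a common pattern is to divide each micro-batch loss by the number of accumulation steps before calling `backward()`.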
