AIExplainer

What is gradient accumulation?

A backpropagation technique that updates the parameters only once per epoch rather than once per iteration.

After processing each mini-batch, gradient accumulation adds that mini-batch's gradients to a running total instead of updating the parameters. Then, after processing the last mini-batch in the epoch, the system updates the parameters once, based on the accumulated total. In practice, many implementations apply the update after every N mini-batches rather than waiting for the end of the epoch, but the principle is the same. Gradient accumulation is useful when the desired batch size is too large for the memory available for training. When memory is an issue, the natural tendency is to reduce the batch size. However, reducing the batch size in normal backpropagation increases the number of parameter updates. Gradient accumulation lets the model avoid memory issues while still training efficiently.
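The running-total idea can be illustrated with a small, framework-free sketch. It is only an illustration under stated assumptions: a one-parameter linear model y = w * x, a mean-squared-error loss, and equal-sized mini-batches. The names `grad_mse`, `train_accumulated`, `micro_batch_size`, and `accum_steps` are hypothetical and do not come from any particular library.

```python
# Minimal sketch of gradient accumulation for a 1-D linear model y = w * x.
# All function and parameter names are hypothetical, chosen for illustration.

def grad_mse(w, batch):
    """Gradient of the mean squared error over `batch` with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def train_accumulated(data, w, lr, micro_batch_size, accum_steps, epochs):
    """Update w once per `accum_steps` mini-batches; return (w, update count)."""
    batches = [data[i:i + micro_batch_size]
               for i in range(0, len(data), micro_batch_size)]
    updates = 0
    for _ in range(epochs):
        running_total = 0.0  # running total of mini-batch gradients
        seen = 0             # mini-batches accumulated since the last update
        for batch in batches:
            running_total += grad_mse(w, batch)  # accumulate, do not update
            seen += 1
            if seen == accum_steps:
                w -= lr * running_total / seen   # one update on the average
                updates += 1
                running_total, seen = 0.0, 0
        if seen:  # leftover partial window at the end of the epoch
            w -= lr * running_total / seen
            updates += 1
    return w, updates

# Eight samples generated with a true weight of 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

# Four mini-batches of 2, accumulated: one update per epoch.
w_accum, n_accum = train_accumulated(data, 0.0, 0.01, 2, 4, epochs=50)
# One full batch of 8 (needs memory for all 8 samples): one update per epoch.
w_full, n_full = train_accumulated(data, 0.0, 0.01, 8, 1, epochs=50)
# Four mini-batches of 2, no accumulation: four updates per epoch.
w_small, n_small = train_accumulated(data, 0.0, 0.01, 2, 1, epochs=50)
```

With equal-sized mini-batches, averaging the accumulated gradients matches the full-batch gradient up to floating-point rounding, so `w_accum` and `w_full` follow essentially the same trajectory while only two samples are processed at a time. The run without accumulation performs four times as many parameter updates, which is the effect the paragraph above describes.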

Practitioners refer to gradient accumulation when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.