What is staged training?
A tactic of training a model in a sequence of discrete stages.
staged training explained in plain English
Staged training is a tactic of training a model in a sequence of discrete stages. The goal can be either to speed up the training process or to achieve better model quality.
Progressive stacking is one such approach. For example:
- Stage 1 contains 3 hidden layers, stage 2 contains 6 hidden layers, and stage 3 contains 12 hidden layers.
- Stage 2 begins training with the weights learned in the 3 hidden layers of stage 1. Stage 3 begins training with the weights learned in the 6 hidden layers of stage 2.
See also pipelining.
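The progressive stacking stages described above (3, then 6, then 12 hidden layers) can be sketched in plain Python. This is a minimal illustration, not a real training setup: the helper names `init_layers`, `train`, and `grow_by_stacking` are hypothetical, `train` is a placeholder that only nudges the weights, and the sketch assumes the newly added layers start as copies of the already-trained ones, which the definition above does not specify.

```python
import copy
import random


def init_layers(num_layers, width, rng):
    """Create `num_layers` hidden layers, each a width x width weight matrix."""
    return [
        [[rng.gauss(0.0, 0.1) for _ in range(width)] for _ in range(width)]
        for _ in range(num_layers)
    ]


def train(layers, steps):
    """Hypothetical placeholder for a real training loop.

    It only scales every weight slightly, so trained layers differ
    from freshly initialized ones.
    """
    for _ in range(steps):
        for layer in layers:
            for row in layer:
                for j in range(len(row)):
                    row[j] *= 0.99
    return layers


def grow_by_stacking(layers):
    """Double the depth: keep the trained layers and stack a copy on top.

    Assumption: the new layers are initialized as copies of the trained ones.
    """
    return layers + copy.deepcopy(layers)


rng = random.Random(0)
layers = init_layers(num_layers=3, width=4, rng=rng)  # stage 1 starts with 3 layers
depths = []

for stage in range(1, 4):
    if stage > 1:
        # Later stages begin from the weights learned in the previous stage.
        layers = grow_by_stacking(layers)
    layers = train(layers, steps=10)
    depths.append(len(layers))
    print(f"stage {stage}: {len(layers)} hidden layers")
```

Each stage reuses the weights from the previous one instead of starting from scratch, which is the property the definition highlights. In a real system, `train` would be replaced by an actual optimization loop.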
Example
Practitioners refer to staged training when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- accelerator chip
A category of specialized hardware components designed to perform key computations needed for deep learning algorithms.
- activation function
A function that enables neural networks to learn nonlinear (complex) relationships between features and the label.
- AdaGrad
A sophisticated gradient descent algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate.
- attention
A mechanism that lets a model focus on the most relevant parts of its input when producing an output, weighting what matters most in context.
- auto-regressive model
A model that infers a prediction based on its own previous predictions.
- autoencoder
A system that learns to extract the most important information from the input.
- auxiliary loss
A loss function—used in conjunction with a neural network model's main loss function—that helps accelerate training during the early iterations when weights are randomly initialized.
- backpropagation
The process that tells a neural network which internal settings caused an error and how to adjust them, working backwards through layers.
- batch
The set of examples used in one training iteration.
- batch normalization
Normalizing the input or output of the activation functions in a hidden layer.