What is batch inference?
The process of inferring predictions on multiple unlabeled examples divided into smaller subsets ("batches").
Batch inference explained in plain English
Batch inference can take advantage of the parallelization features of accelerator chips. That is, multiple accelerators can simultaneously infer predictions on different batches of unlabeled examples, dramatically increasing the number of inferences per second. For more information, see "Production ML systems: Static versus dynamic inference" in Machine Learning Crash Course.
Example
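For example, a system that must generate predictions for a large set of unlabeled examples can split that set into fixed-size batches and send each batch to a different accelerator, rather than running the model on one example at a time. Practitioners use the term when building, training, or evaluating machine learning systems, and it appears in research papers, product documentation, and technical discussions.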
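The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: `make_batches`, `predict_batch`, and `batch_inference` are hypothetical names, the "model" is a stand-in threshold rule, and a thread pool plays the role of multiple accelerators working on different batches at the same time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def make_batches(examples: Sequence[float], batch_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most batch_size examples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def predict_batch(batch: Sequence[float]) -> List[int]:
    """Hypothetical stand-in model: predict 1 if the example is >= 0.5, else 0."""
    return [1 if x >= 0.5 else 0 for x in batch]


def batch_inference(examples: Sequence[float], batch_size: int, num_workers: int = 2) -> List[int]:
    """Split the examples into batches and infer on the batches concurrently."""
    batches = list(make_batches(examples, batch_size))
    # Assumption: worker threads stand in for separate accelerators.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in the same order as the input batches.
        per_batch = list(pool.map(predict_batch, batches))
    # Flatten the per-batch predictions back into one list.
    return [p for batch_preds in per_batch for p in batch_preds]


unlabeled = [0.1, 0.9, 0.4, 0.7, 0.5, 0.2, 0.8]
predictions = batch_inference(unlabeled, batch_size=3)
print(predictions)  # [0, 1, 0, 1, 1, 0, 1]
```

Here seven unlabeled examples are divided into three batches (sizes 3, 3, and 1), and the predictions come back in the original example order. With a real model, `predict_batch` would typically run one forward pass over the whole batch on an accelerator.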
People also read
- accelerator chip
A category of specialized hardware components designed to perform key computations needed for deep learning algorithms.
- checkpoint
Data that captures the state of a model's parameters either during training or after training is completed.
- compute
(Noun) The computational resources used by a model or system, such as processing power, memory, and storage.
- perceptron
A system (either hardware or software) that takes in one or more input values, runs a function on the weighted sum of the inputs, and computes a single output value.
- pure function
A function whose outputs are based only on its inputs, and that has no side effects.
- quantization
An overloaded term with several meanings, one of which is reducing the number of bits used to store a model's parameters.
- shard
A logical division of the training set or the model.
- batch size
The number of examples in a batch.
- mini-batch
A small, randomly selected subset of a batch processed in one iteration.
- mini-batch stochastic gradient descent
A gradient descent algorithm that uses mini-batches.