online inference

Generating predictions on demand.

Plain English Explanation

In online inference, a model computes each prediction at the moment it is requested rather than ahead of time. For example, suppose an app passes input to a model and requests a prediction. A system using online inference responds by running the model on that input and returning the prediction to the app. Contrast with offline inference, in which predictions are generated in advance and cached. See "Production ML systems: Static versus dynamic inference" in Machine Learning Crash Course for more information.
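The request-and-response flow can be illustrated with a minimal Python sketch. Everything here is hypothetical and chosen only for illustration: `toy_model` stands in for a real trained model, and `handle_request` and `precompute_predictions` are invented names, not part of any library. The sketch shows online inference running the model when a request arrives, alongside a precomputed lookup table as one way the offline alternative might look.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type alias: a "model" is any function from features to a score.
Model = Callable[[List[float]], float]


def toy_model(features: List[float]) -> float:
    """Stand-in for a trained model: a fixed weighted sum (illustrative only)."""
    weights = [0.5, -0.25, 2.0]
    return sum(w * x for w, x in zip(weights, features))


def handle_request(model: Model, features: List[float]) -> float:
    """Online inference: the model runs at request time, on the given input."""
    return model(features)


def precompute_predictions(
    model: Model, known_inputs: List[List[float]]
) -> Dict[Tuple[float, ...], float]:
    """Offline-style contrast: run the model ahead of time and cache the results."""
    return {tuple(x): model(x) for x in known_inputs}


# Online: any input can be served, at the cost of running the model per request.
prediction = handle_request(toy_model, [1.0, 2.0, 3.0])
print(prediction)  # 6.0

# Offline-style: only inputs that were anticipated in advance can be looked up.
cache = precompute_predictions(toy_model, [[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]])
print(cache.get((0.0, 0.0, 1.0)))  # 2.0
print(cache.get((9.0, 9.0, 9.0)))  # None: this input was never precomputed
```

In this sketch, the online path can answer any request but pays the model's compute cost each time, while the cached path answers instantly but only for inputs it saw beforehand.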

How is it used?

Practitioners use the term when designing or discussing how a trained model serves predictions, typically to distinguish predictions computed on demand from predictions computed in advance. It appears in research papers, product documentation, and technical discussions about machine learning systems.