offline inference

The process of a model generating a batch of predictions and then caching (saving) those predictions.

Plain English Explanation

Once the predictions are cached, apps read them from the cache rather than rerunning the model. For example, consider a model that generates local weather forecasts (predictions) once every four hours. After each model run, the system caches all the local weather forecasts, and weather apps retrieve the forecasts from the cache. Offline inference is also called static inference. Contrast with online inference. See Production ML systems: Static versus dynamic inference in Machine Learning Crash Course for more information.
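The weather example can be sketched in Python under a few assumptions: `forecast_model` is a hypothetical toy function standing in for a trained model, and the cache is an in-memory dictionary rather than a real datastore. The batch step runs the model once per known location and stores each result. The serving step only reads the cache, which the call counter makes visible.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

model_calls = 0  # counts how often the (hypothetical) model actually runs

def forecast_model(location: str) -> float:
    """Hypothetical stand-in for a trained model: returns a toy temperature in °C."""
    global model_calls
    model_calls += 1
    return 15.0 + (len(location) % 10)

# In-memory prediction cache: location -> (prediction, time the batch ran).
forecast_cache: Dict[str, Tuple[float, datetime]] = {}

def run_batch_inference(locations: List[str]) -> None:
    """Offline step: run the model once over every known input and cache each result."""
    computed_at = datetime.now(timezone.utc)
    for location in locations:
        forecast_cache[location] = (forecast_model(location), computed_at)

def get_forecast(location: str) -> Optional[float]:
    """Serving step: return the cached prediction without rerunning the model."""
    entry = forecast_cache.get(location)
    return entry[0] if entry is not None else None

run_batch_inference(["Zurich", "Nairobi", "Lima"])
print(get_forecast("Zurich"))  # 21.0
print(get_forecast("Zurich"))  # 21.0, served from the cache again
print(model_calls)             # 3: one model run per location in the batch
```

In a real system the batch step would run on a schedule (every four hours in the example) and write to shared storage that the weather apps can read.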

How is it used?

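Offline inference suits cases where the set of inputs is known ahead of time and predictions do not need to reflect the very latest data, as with the weather forecasts described above. Because predictions are computed before any request arrives, serving is only a cache lookup, so the cost of running the model per request is not a concern, and predictions can be checked before they are served. The tradeoffs are that the system can only return predictions for inputs that were in the batch, so uncommon inputs may go unanswered, and cached predictions can be hours or days old until the next batch run.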
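Both limitations of offline inference (inputs missing from the batch and stale predictions) can be sketched as a cache lookup that declines to answer in either case. The `lookup` function, the four-hour `max_age`, and returning `None` are illustrative assumptions; a real system might instead fall back to online inference or knowingly serve the stale value.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple

# A cache produced by an earlier batch run: key -> (prediction, computed_at).
Cache = Dict[str, Tuple[float, datetime]]

def lookup(cache: Cache, key: str, max_age: timedelta, now: datetime) -> Optional[float]:
    """Return a cached prediction, or None on a cache miss or a stale entry."""
    entry = cache.get(key)
    if entry is None:
        # The input was not in the batch, so there is no prediction to serve.
        return None
    prediction, computed_at = entry
    if now - computed_at > max_age:
        # The prediction is older than this hypothetical freshness limit.
        return None
    return prediction

batch_time = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
cache: Cache = {"Zurich": (21.0, batch_time)}
limit = timedelta(hours=4)

print(lookup(cache, "Zurich", limit, batch_time + timedelta(hours=1)))  # 21.0
print(lookup(cache, "Zurich", limit, batch_time + timedelta(hours=5)))  # None (stale)
print(lookup(cache, "Oslo", limit, batch_time + timedelta(hours=1)))    # None (miss)
```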