AIExplainer

What is Inference?


Inference is the phase when a trained model is actually used: it takes new input and produces a prediction or response. Training is learning; inference is performing.

Because inference runs every time the model is used, optimising its speed and its cost per request is a major focus for production AI systems.

Inference is like the moment a musician plays a piece in concert after years of practice. The learning happened in the rehearsal room; the performance is inference.

A hospital runs inference on a trained model to score patient risk in seconds; the model was trained once, but inference runs continuously.
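The "train once, infer many times" pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is made up, the single-feature logistic model is far simpler than anything a hospital would use, and the names train and infer are hypothetical.

```python
import math

def train(examples, epochs=500, lr=0.1):
    """Training: fit a weight and bias by gradient descent. Runs once."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # current prediction
            w -= lr * (p - y) * x                     # adjust the weight
            b -= lr * (p - y)                         # adjust the bias
    return w, b

def infer(model, x):
    """Inference: apply the already-learned parameters to a new input."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Made-up toy data: (feature value, 1 if high risk else 0)
examples = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]

model = train(examples)                         # done once
scores = [infer(model, x) for x in (0.5, 4.5)]  # done for every new case
```

The call to train is the expensive, one-off step; infer is cheap and is what runs again and again on new inputs.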

Every message you send to ChatGPT, every photo your phone tags automatically, and every search result ranked by relevance is inference happening in real time.

Inference is not learning — the model's weights typically stay fixed unless you deliberately retrain or fine-tune.
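A small Python sketch can make the point about fixed weights concrete. It assumes a hypothetical one-weight model called TinyModel (not a real library class): calling predict any number of times leaves the parameters untouched, while a deliberate fine_tune step changes them.

```python
class TinyModel:
    """A hypothetical one-weight linear model, for illustration only."""

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def predict(self, x):
        # Inference: reads the parameters, never writes them.
        return self.weight * x + self.bias

    def fine_tune(self, x, target, lr=0.1):
        # Learning: one gradient step on squared error, which
        # deliberately changes the parameters.
        error = self.predict(x) - target
        self.weight -= lr * error * x
        self.bias -= lr * error

model = TinyModel(weight=2.0, bias=1.0)
before = (model.weight, model.bias)

# Run inference many times; the parameters are only read.
outputs = [model.predict(x) for x in range(1000)]
after_inference = (model.weight, model.bias)

# Only an explicit fine-tuning step updates the parameters.
model.fine_tune(x=1.0, target=5.0)
after_fine_tune = (model.weight, model.bias)
```

Here after_inference equals before, while after_fine_tune differs from both, which mirrors the distinction between using a model and updating it.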