Inference
The phase when a trained model is actually used — taking new input and producing a prediction or response.
Plain English Explanation
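During training, a model adjusts its internal parameters to fit example data. During inference, those parameters are put to work on input the model has not seen before. Training is learning; inference is performing.
Because inference runs every time the model is used, while training happens comparatively rarely, optimising inference speed and cost is a major focus for production AI systems.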
Analogy
Inference is the moment a musician plays a piece in concert after years of practice. The learning happened in the rehearsal room; the performance is inference.
How is it used?
Sending a message to ChatGPT, having your phone tag a photo automatically, and getting search results ranked by relevance all involve inference happening in real time.
Real-world Example
A hospital runs inference on a trained model to score patient risk in seconds; the model was trained once, but inference runs continuously.
Common Misconceptions
Inference is not learning — the model's weights typically stay fixed unless you deliberately retrain or fine-tune.
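To make the point about fixed weights concrete, here is a minimal sketch in Python. It assumes a tiny logistic model; the weight values, the bias, and the `predict` function are hypothetical stand-ins for whatever a real training run would produce, not taken from any particular system.

```python
import math

# Hypothetical parameters, standing in for the output of an earlier training run.
WEIGHTS = (0.8, -0.5, 1.2)
BIAS = -0.3


def predict(features):
    """Inference: a forward pass that reads the weights and never updates them."""
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))  # logistic score between 0 and 1


# Snapshot the weights, run inference on several new inputs, then snapshot again.
weights_before = tuple(WEIGHTS)
new_inputs = [(1.0, 0.0, 0.5), (0.2, 1.5, 0.0), (0.0, 0.0, 0.0)]
scores = [predict(x) for x in new_inputs]
weights_after = tuple(WEIGHTS)

print(scores)
print("weights unchanged:", weights_before == weights_after)
```

However many inputs are scored, the two snapshots stay equal: nothing in `predict` writes to the parameters. Changing them would take a separate training or fine-tuning step.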