
Inference

The phase in which a trained model is put to use: it takes new input and produces a prediction or response.

Plain English Explanation

Inference is what happens after training: the model's learned parameters are applied to input it has not seen before to produce a prediction or response. Training is learning; inference is performing.

Because inference runs on every request, its speed (latency) and cost per request are a major optimisation focus for production AI systems.
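
The split between training and inference can be sketched with a toy model. This is only an illustration under simple assumptions: the model is a straight line y = w * x + b, and the function names `train` and `infer` are hypothetical, not taken from any library.

```python
# Toy model: y = w * x + b, fitted by ordinary least squares.

def train(xs, ys):
    """Training: learn w and b from labelled examples (done once)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def infer(params, x):
    """Inference: apply the already-learned parameters to new input."""
    w, b = params
    return w * x + b


# Training happens once, on examples that follow y = 2x + 1.
params = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Inference can then run as often as needed, on inputs never seen in training.
predictions = [infer(params, x) for x in [10.0, 20.0]]
print(predictions)
```

Note that `train` is called once while `infer` is called per input, which is why inference usually accounts for most of the running cost of a deployed model.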

Analogy

Inference is the moment a musician plays a piece in concert after years of practice. The learning happened in the rehearsal room; the performance is inference.

How is it used?

Each message you send to ChatGPT, each photo your phone tags automatically, and each search result ranked by relevance is produced by inference, usually in real time.

Real-world Example

A hospital runs inference on a trained model to score patient risk in seconds; the model was trained once, but inference runs continuously.

Common Misconceptions

Inference is not learning: the model's weights typically stay fixed unless you deliberately retrain or fine-tune.
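
A minimal sketch of that point, assuming a hypothetical two-weight linear model (the class `TinyModel` and its methods are made up for illustration): running predictions leaves the weights untouched, while a deliberate fine-tuning step changes them.

```python
class TinyModel:
    """Hypothetical linear model used only to illustrate the idea."""

    def __init__(self, weights):
        self.weights = list(weights)

    def predict(self, features):
        # Inference: read the weights, never write them.
        return sum(w * f for w, f in zip(self.weights, features))

    def fine_tune_step(self, features, target, lr=0.1):
        # Learning: one gradient step on half the squared error
        # overwrites the weights.
        error = self.predict(features) - target
        self.weights = [
            w - lr * error * f for w, f in zip(self.weights, features)
        ]


model = TinyModel([0.5, -1.0])
before = list(model.weights)

# Any number of inference calls leaves the weights as they were.
for features in ([1.0, 2.0], [3.0, 0.5], [0.0, 4.0]):
    model.predict(features)
unchanged_after_inference = model.weights == before

# Only an explicit training or fine-tuning step modifies them.
model.fine_tune_step([1.0, 2.0], target=1.0)
changed_after_fine_tuning = model.weights != before

print(unchanged_after_inference, changed_after_fine_tuning)
```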