latency

The time it takes for a model to process input and generate a response.

Plain English Explanation

The time it takes for a model to process input and generate a response. A high latency response takes takes longer to generate than a low latency response. Factors that influence latency of large language models include: - Input and output token lengths - Model complexity - The infrastructure the model runs on Optimizing for latency is crucial for creating responsive and user-friendly applications.

How is it used?

Practitioners refer to latency when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.