AIExplainer

latency

The time it takes for a model to process input and generate a response.

A high-latency response takes longer to generate than a low-latency response. Factors that influence the latency of large language models include:

- Input and output token lengths
- Model complexity
- The infrastructure the model runs on

Optimizing for latency is crucial for creating responsive and user-friendly applications.
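As a rough illustration, latency can be measured as the wall-clock time around a single model call. The sketch below is a minimal example under stated assumptions: `measure_latency` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in that simply sleeps in proportion to the number of input words, not a real model or API.

```python
import time
from typing import Callable, Tuple


def measure_latency(generate: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Call generate(prompt) once and return (response, elapsed seconds)."""
    start = time.perf_counter()
    response = generate(prompt)
    elapsed = time.perf_counter() - start
    return response, elapsed


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a model call (assumption, not a real API):
    # it sleeps 1 ms per input word to mimic longer inputs taking longer.
    time.sleep(0.001 * len(prompt.split()))
    return "ok"


short_reply, short_latency = measure_latency(fake_generate, "hello world")
long_reply, long_latency = measure_latency(fake_generate, "hello world " * 50)

print(f"short prompt: {short_latency:.4f} s")
print(f"long prompt:  {long_latency:.4f} s")
```

With a real model, the same wrapper could time an actual request. In practice it is common to repeat the measurement and report a median or a high percentile, since a single call can be noisy.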

Practitioners refer to latency when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.