What is latency?

The time it takes for a model to process input and generate a response.

Latency explained in plain English

A high-latency response takes longer to generate than a low-latency one. Factors that influence the latency of large language models include:

- Input and output token lengths
- Model complexity
- The infrastructure the model runs on

Optimizing for latency is crucial for creating responsive and user-friendly applications.
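As a rough illustration, latency can be measured by timing a call from the moment the input is sent until the full response comes back. The sketch below is a minimal example under stated assumptions: `fake_generate` is a hypothetical stand-in for a real model call, and its per-token delays are invented purely to show that longer inputs take longer to process.

```python
import time
from typing import Callable, Tuple


def measure_latency(generate: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Return the response and the wall-clock seconds the call took."""
    start = time.perf_counter()
    response = generate(prompt)
    elapsed = time.perf_counter() - start
    return response, elapsed


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a model call (not a real API).

    It sleeps in proportion to the input and output token counts;
    the per-token delays are made-up numbers for illustration only.
    """
    input_tokens = len(prompt.split())
    output_tokens = 20
    time.sleep(0.0005 * input_tokens + 0.002 * output_tokens)
    return "token " * output_tokens


short_reply, short_latency = measure_latency(fake_generate, "hello world")
long_reply, long_latency = measure_latency(fake_generate, "hello " * 200)
print(f"short prompt: {short_latency:.3f}s, long prompt: {long_latency:.3f}s")
```

To time a real system, `fake_generate` could be swapped for a function that calls the model in question. In practice latency is usually sampled over many requests rather than a single call, since individual measurements vary.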

Example

Practitioners refer to latency when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
