AIExplainer

What is an evaluation?

The process of measuring a model's quality or comparing different models against each other.

To evaluate a supervised machine learning model, you typically judge it against a validation set and a test set, both held out from training. Evaluating an LLM typically involves broader quality and safety assessments.
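As a rough illustration of the validation-set and test-set idea, the sketch below compares a few candidate models on a validation set, then reports the chosen model's accuracy once on a test set. Everything in it is an assumption made for the example: the synthetic data, the one-parameter threshold classifier, and the helper names `accuracy` and `make_threshold_model`. It is a minimal sketch, not a recommended evaluation pipeline.

```python
def accuracy(predict, examples):
    """Fraction of (feature, label) pairs that predict() labels correctly."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)


def make_threshold_model(threshold):
    """Hypothetical classifier: predicts 1 when x >= threshold, else 0."""
    return lambda x: 1 if x >= threshold else 0


# Synthetic, deterministic data (assumed for illustration):
# 100 distinct features in [0, 1), labeled 1 when x >= 0.5.
features = [((i * 37) % 100) / 100 for i in range(100)]
data = [(x, 1 if x >= 0.5 else 0) for x in features]

# Keep two disjoint held-out sets.
validation, test = data[:50], data[50:]

# Compare candidate models against each other on the validation set.
candidates = [0.3, 0.5, 0.7]
best_threshold = max(
    candidates,
    key=lambda t: accuracy(make_threshold_model(t), validation),
)

# Measure the selected model's quality once on the untouched test set.
test_accuracy = accuracy(make_threshold_model(best_threshold), test)
print(best_threshold, test_accuracy)
```

Keeping the test set out of model selection is the point of the split: a score on data that influenced the choice of model tends to overstate quality.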

Practitioners use the term evaluation when building, training, or comparing machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.