AIExplainer

What is a side-by-side evaluation?

Comparing the quality of two models by judging their responses to the same prompt.

For example, suppose the following prompt is given to two different models: "Create an image of a cute dog juggling three balls." In a side-by-side evaluation, a rater would compare the two resulting images and pick which one was "better" (More accurate? More beautiful? Cuter?).
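Rater choices from many prompts are often summarized as a preference rate per model. The following is a minimal sketch of that tally, not a standard implementation: the labels "A", "B", and "tie", the function name `preference_rates`, and the sample judgments are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical labels: "A" = rater preferred model A's response,
# "B" = rater preferred model B's response, "tie" = no preference.
VALID_LABELS = ("A", "B", "tie")


def preference_rates(judgments):
    """Return the fraction of judgments falling under each label."""
    counts = Counter(judgments)
    unknown = set(counts) - set(VALID_LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments to summarize")
    return {label: counts[label] / total for label in VALID_LABELS}


# Illustrative data only: one judgment per prompt shown to both models.
judgments = ["A", "B", "A", "tie", "A"]
rates = preference_rates(judgments)
print(rates)
```

Ties are kept as their own category here so that a model's rate is not inflated by prompts where the rater saw no difference. Whether to count ties, split them evenly, or discard them is a design choice that an evaluation should state explicitly.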

Practitioners use side-by-side evaluation when building, training, or evaluating machine learning systems. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.