AIExplainer

What is BLEU?

Stands for: Bilingual Evaluation Understudy

A metric between 0.0 and 1.0 for evaluating machine translations, for example, from Spanish to Japanese. A higher score indicates closer agreement with the reference. To calculate a score, BLEU typically compares an ML model's translation (generated text) to a human expert's translation (reference text). The score is determined by how closely the N-grams (sequences of N consecutive words) in the generated text match those in the reference text. The original paper on this metric is "BLEU: a Method for Automatic Evaluation of Machine Translation" (Papineni et al., 2002). See also BLEURT.
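
The N-gram matching described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names ngrams and simple_bleu are hypothetical, and the sketch assumes whitespace tokenization, a single reference, uniform weights over N-gram orders, and no smoothing. Production tools typically handle tokenization, multiple references, and corpus-level aggregation differently, so their scores will not necessarily match this one.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference.

    Assumptions (not part of any library API): whitespace tokenization,
    uniform weights over n-gram orders, and no smoothing.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            # Without smoothing, one zero precision makes the whole score 0.
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat is on the mat"
    print(simple_bleu(generated, reference, max_n=2))
    print(simple_bleu(reference, reference))
```

Because this sketch applies no smoothing, a short sentence with no matching 4-grams scores 0.0 at the default max_n=4, which is one reason BLEU is usually reported over a whole corpus rather than a single sentence.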

Practitioners refer to BLEU when building, training, or evaluating machine learning systems, particularly machine translation models. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.