AIExplainer

What is BLEURT?

A metric for evaluating machine translations from one language to another, particularly to and from English.

Stands for: Bilingual Evaluation Understudy with Representations from Transformers

For translations to and from English, BLEURT aligns more closely with human ratings than BLEU does. Unlike BLEU, which counts overlapping word sequences, BLEURT emphasizes semantic (meaning) similarity and can accommodate paraphrasing. BLEURT relies on a pre-trained language model (specifically, BERT) that is then fine-tuned on human ratings of translation quality. The original paper on this metric is BLEURT: Learning Robust Metrics for Text Generation.
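To see why a meaning-based metric helps with paraphrases, consider the following minimal sketch. It is not BLEURT, and it is not full BLEU either: it computes only a clipped unigram precision, the simplest form of the word-overlap counting that BLEU builds on. The function name unigram_precision and the example sentences are assumptions chosen for illustration.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate words that also occur in the reference (clipped).

    Illustrative stand-in for surface-overlap metrics; NOT full BLEU
    (no higher-order n-grams, no brevity penalty) and NOT BLEURT.
    """
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    # A candidate word is credited at most as often as it appears in the reference.
    matched = sum(min(count, ref_counts[token]) for token, count in cand_counts.items())
    return matched / len(cand_tokens)


# Hypothetical example sentences (not taken from any dataset).
reference = "the cat sat on the mat"
near_copy = "the cat sat on a mat"                # almost the same wording
paraphrase = "a feline was resting on the rug"    # same meaning, different words

near_copy_score = unigram_precision(near_copy, reference)
paraphrase_score = unigram_precision(paraphrase, reference)

print(f"near copy:  {near_copy_score:.2f}")
print(f"paraphrase: {paraphrase_score:.2f}")
```

The paraphrase scores far lower than the near copy even though a human reader would likely judge both to be acceptable translations. A learned metric such as BLEURT is designed to narrow that gap by scoring meaning rather than shared words. Running BLEURT itself requires downloading a trained checkpoint, so it is not reproduced here.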

Practitioners use BLEURT when evaluating machine translation and other text generation systems. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.