What is BLEURT?
A metric for evaluating machine translations from one language to another, particularly to and from English.
Stands for: Bilingual Evaluation Understudy with Representations from Transformers
BLEURT explained in plain English
For translations to and from English, BLEURT aligns more closely with human ratings than BLEU does. Unlike BLEU, which counts overlapping N-grams, BLEURT emphasizes semantic (meaning) similarity and can accommodate paraphrasing. BLEURT builds on a pre-trained language model (BERT, to be exact) that is then fine-tuned on human ratings of translation quality. The metric was introduced in the paper "BLEURT: Learning Robust Metrics for Text Generation" (Sellam, Das, and Parikh, 2020).
Example
Practitioners refer to BLEURT when building, training, or evaluating machine learning systems, especially machine translation models. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
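To make the contrast with BLEU concrete, here is a minimal Python sketch of clipped N-gram precision, the surface-overlap quantity that BLEU is built from. It does not compute BLEURT, which requires a trained model checkpoint. The function names and the three sentences are illustrative assumptions, not part of any library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it appears in the
    reference (the "clipping" used by BLEU-style metrics).
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total


# Hypothetical sentences chosen for illustration only.
reference = "the cat sat on the mat"
paraphrase = "a feline was sitting on the rug"  # same meaning, different words
swapped = "the mat sat on the cat"              # same words, different meaning

for name, cand in [("paraphrase", paraphrase), ("swapped", swapped)]:
    p1 = clipped_precision(cand, reference, 1)
    p2 = clipped_precision(cand, reference, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

Running the sketch prints `paraphrase: unigram=0.29 bigram=0.17` and `swapped: unigram=1.00 bigram=0.80`. The faithful paraphrase scores low on word overlap, while the sentence with a different meaning scores high. A learned metric such as BLEURT is intended to narrow this gap by scoring meaning instead of shared words.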
People also read
- BERT
A model architecture for text representation.
- BLEU
A metric between 0.0 and 1.0 for evaluating machine translations.
- ROUGE
A family of metrics that evaluate automatic summarization and machine translation models.
- automatic evaluation
Using software to judge the quality of a model's output.
- bag of words
A representation of the words in a phrase or passage, irrespective of order.
- bigram
An N-gram in which N=2.
- Character N-gram F-score
A metric to evaluate machine translation models.
- constituency parsing
Dividing a sentence into smaller grammatical structures ("constituents").
- crash blossom
A sentence or phrase with an ambiguous meaning.
- decoder
In general, any ML system that converts from a processed, dense, or internal representation to a more raw, sparse, or external representation.