ROUGE-N

A set of metrics within the ROUGE family that compares the shared N-grams of a certain size in the reference text and generated text.

Plain English Explanation

The N in ROUGE-N is the size of the N-grams being compared. For example:

- ROUGE-1 measures the number of shared tokens in the reference text and generated text.
- ROUGE-2 measures the number of shared bigrams (2-grams) in the reference text and generated text.
- ROUGE-3 measures the number of shared trigrams (3-grams) in the reference text and generated text.

You can use the following formulas to calculate ROUGE-N recall and ROUGE-N precision for any member of the ROUGE-N family:

$$\text{ROUGE-N recall} = \frac{\text{number of matching N-grams}}{\text{number of N-grams in the reference text}}$$

$$\text{ROUGE-N precision} = \frac{\text{number of matching N-grams}}{\text{number of N-grams in the generated text}}$$

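You can then use F1 to roll up ROUGE-N recall and ROUGE-N precision into a single metric:

$$\text{ROUGE-N F1} = \frac{2 \cdot \text{ROUGE-N recall} \cdot \text{ROUGE-N precision}}{\text{ROUGE-N recall} + \text{ROUGE-N precision}}$$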

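Suppose you decide to use ROUGE-2 to measure the effectiveness of an ML model's translation compared to a human translator's. The human translation is the reference text and the model's translation is the generated text.

| Source | Text | Bigrams |
| --- | --- | --- |
| Reference text (human translation) | I want to understand a wide variety of things. | I want, want to, to understand, understand a, a wide, wide variety, variety of, of things |
| Generated text (ML translation) | I want to learn plenty of things. | I want, want to, to learn, learn plenty, plenty of, of things |

Therefore:

- The number of matching 2-grams is 3 (I want, want to, and of things).
- The number of 2-grams in the reference text is 8.
- The number of 2-grams in the generated text is 6.

Consequently:

- ROUGE-2 recall = 3/8 = 0.375
- ROUGE-2 precision = 3/6 = 0.5
- ROUGE-2 F1 = (2 × 0.375 × 0.5) / (0.375 + 0.5) ≈ 0.43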
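The ROUGE-2 calculation for this example can be reproduced with a few lines of Python. This is a minimal sketch, not a reference implementation: the function names `tokenize`, `ngram_counts`, and `rouge_n` are hypothetical, and the tokenizer (lowercase, split on whitespace, strip surrounding punctuation) is an assumption that established ROUGE toolkits may not share.

```python
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return [w for w in words if w]


def ngram_counts(tokens, n):
    """Return a Counter (multiset) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, generated, n=2):
    """Return (recall, precision, f1) for ROUGE-N on two strings."""
    ref_counts = ngram_counts(tokenize(reference), n)
    gen_counts = ngram_counts(tokenize(generated), n)

    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is credited only as often as it occurs in both texts.
    matches = sum((ref_counts & gen_counts).values())
    ref_total = sum(ref_counts.values())
    gen_total = sum(gen_counts.values())

    recall = matches / ref_total if ref_total else 0.0
    precision = matches / gen_total if gen_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matches else 0.0
    return recall, precision, f1


recall, precision, f1 = rouge_n(
    "I want to understand a wide variety of things.",
    "I want to learn plenty of things.",
    n=2,
)
print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
# prints: recall=0.375 precision=0.500 f1=0.429
```

Passing `n=1` or `n=3` to the same function gives ROUGE-1 or ROUGE-3 under the same tokenization assumption.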

How is it used?

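Practitioners use ROUGE-N to score generated text, such as summaries and machine translations, against human-written reference texts when evaluating machine learning systems. ROUGE-N scores, most often ROUGE-1 and ROUGE-2, appear in research papers, product documentation, and technical discussions about AI capabilities and limitations.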