What is ROUGE-N?
A set of metrics within the ROUGE family that measure how many N-grams of a given size the reference text and the generated text have in common.
ROUGE-N explained in plain English
Each ROUGE-N metric fixes the N-gram size N and counts the N-grams of that size that appear in both the reference text and the generated text. For example:

- ROUGE-1 measures the number of shared tokens (unigrams) in the reference text and generated text.
- ROUGE-2 measures the number of shared bigrams (2-grams) in the reference text and generated text.
- ROUGE-3 measures the number of shared trigrams (3-grams) in the reference text and generated text.

You can use the following formulas to calculate ROUGE-N recall and ROUGE-N precision for any member of the ROUGE-N family:

$$\text{ROUGE-N recall} = \frac{\text{number of matching N-grams}}{\text{number of N-grams in the reference text}}$$

$$\text{ROUGE-N precision} = \frac{\text{number of matching N-grams}}{\text{number of N-grams in the generated text}}$$
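To make the N-gram sizes concrete, here is a minimal Python sketch that extracts unigrams, bigrams, and trigrams from a sentence with a sliding window. The helper name `ngrams` and the tokenization rule (lowercase, punctuation dropped) are assumptions made for illustration; the glossary entry does not prescribe a tokenizer.

```python
import re


def ngrams(text, n):
    """Return the n-grams in `text` as a list of token tuples."""
    # Assumed tokenization: lowercase, keep runs of letters, digits, apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of width n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "I want to learn plenty of things."
unigrams = ngrams(sentence, 1)  # what ROUGE-1 counts: 7 tokens
bigrams = ngrams(sentence, 2)   # what ROUGE-2 counts: 6 bigrams
trigrams = ngrams(sentence, 3)  # what ROUGE-3 counts: 5 trigrams
```

A sentence with T tokens yields T - N + 1 N-grams, so larger N gives fewer, more specific units to match.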
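You can then use F1 to roll up ROUGE-N recall and ROUGE-N precision into a single metric:

$$\text{ROUGE-N F}_1 = \frac{2 \cdot \text{ROUGE-N recall} \cdot \text{ROUGE-N precision}}{\text{ROUGE-N recall} + \text{ROUGE-N precision}}$$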
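Suppose you decide to use ROUGE-2 to measure the effectiveness of an ML model's translation compared to a human translator's. The human translation is the reference text and the ML translation is the generated text.

| Source | Text | Bigrams |
| --- | --- | --- |
| Reference (human translation) | I want to understand a wide variety of things. | I want, want to, to understand, understand a, a wide, wide variety, variety of, of things |
| Generated (ML translation) | I want to learn plenty of things. | I want, want to, to learn, learn plenty, plenty of, of things |

Therefore:

- The number of matching 2-grams is 3 (I want, want to, and of things).
- The number of 2-grams in the reference text is 8.
- The number of 2-grams in the generated text is 6.

Consequently:

$$\text{ROUGE-2 recall} = \frac{3}{8} = 0.375$$

$$\text{ROUGE-2 precision} = \frac{3}{6} = 0.5$$

$$\text{ROUGE-2 F}_1 = \frac{2 \cdot 0.375 \cdot 0.5}{0.375 + 0.5} \approx 0.429$$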
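The worked ROUGE-2 example can be checked with a short Python sketch. The function names `ngram_counts` and `rouge_n` are hypothetical, the tokenization (lowercase, punctuation dropped) is an assumption, and repeated N-grams are matched with clipped counts, a common convention that this entry does not itself specify. It is an illustration, not a reference implementation.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count the n-grams in `text` (assumed tokenization: lowercase words)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, generated, n):
    """Return ROUGE-N recall, precision, and F1 for one reference/generated pair."""
    ref = ngram_counts(reference, n)
    gen = ngram_counts(generated, n)
    # Counter intersection keeps the minimum count of each shared n-gram
    # (clipped matching, an assumption for handling repeats).
    matches = sum((ref & gen).values())
    ref_total = sum(ref.values())
    gen_total = sum(gen.values())
    recall = matches / ref_total if ref_total else 0.0
    precision = matches / gen_total if gen_total else 0.0
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "I want to understand a wide variety of things."
generated = "I want to learn plenty of things."
scores = rouge_n(reference, generated, 2)
# recall is 3/8 = 0.375, precision is 3/6 = 0.5, F1 is about 0.429
print(scores)
```

Calling `rouge_n(reference, generated, 1)` or `rouge_n(reference, generated, 3)` with the same pair gives the corresponding ROUGE-1 and ROUGE-3 scores.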
Example
Practitioners use ROUGE-N when evaluating machine learning systems that generate text, such as automatic summarization and machine translation models. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- BERT
A model architecture for text representation.
- Character N-gram F-score
A metric to evaluate machine translation models.
- Embedding
A numerical representation of text, images, or other data that captures semantic meaning.
- encoder
In general, any ML system that converts from a raw, sparse, or external representation into a more processed, denser, or more internal representation.
- language model
A model that estimates the probability of a token or sequence of tokens occurring in a longer sequence of tokens.
- rotational invariance
In an image classification problem, an algorithm's ability to successfully classify images even when the orientation of the image changes.
- ROUGE
A family of metrics that evaluate automatic summarization and machine translation models.
- ROUGE-L
A member of the ROUGE family focused on the length of the longest common subsequence in the reference text and generated text.
- ROUGE-S
A forgiving form of ROUGE-N that enables skip-gram matching.
- sentiment analysis
Using statistical or machine learning algorithms to determine a group's overall attitude—positive or negative—toward a service, product, organization, or topic.