What is reporting bias?
The fact that the frequency with which people write about actions, outcomes, or properties is not a reflection of their real-world frequencies or the degree to which a property is characteristic of a class of individuals.
Reporting bias explained in plain English
Reporting bias can influence the composition of the data that machine learning systems learn from. For example, in books, the word "laughed" is more prevalent than "breathed", even though people breathe far more often than they laugh. A machine learning model that estimates the relative frequency of laughing and breathing from a book corpus would probably determine that laughing is more common than breathing. See Fairness: Types of bias in Machine Learning Crash Course for more information.
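The laughed/breathed effect can be sketched with a simple word count. This is a minimal illustration, not a real measurement: the three-sentence corpus and the helper name `relative_frequency` are made up for this example, and a real book corpus would be far larger.

```python
import re
from collections import Counter

# Hypothetical toy corpus, invented purely for illustration.
TOY_CORPUS = [
    "She laughed at the joke, and then he laughed too.",
    "The whole room laughed when the cat fell off the chair.",
    "He breathed in slowly before stepping on stage.",
]


def relative_frequency(sentences, words):
    """Return each target word's share of all tokens in the corpus."""
    tokens = []
    for sentence in sentences:
        # Lowercase and keep runs of letters/apostrophes as tokens.
        tokens.extend(re.findall(r"[a-z']+", sentence.lower()))
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in words}
    return {word: counts[word] / total for word in words}


freqs = relative_frequency(TOY_CORPUS, ["laughed", "breathed"])
for word, share in freqs.items():
    print(f"{word}: {share:.3f}")
```

In this toy text "laughed" appears three times and "breathed" once, so a model that learned only from these counts would rank laughing as the more common activity. The counts reflect what writers chose to mention, not how often each action happens.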
Usage
Practitioners refer to reporting bias when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- preprocessing
Processing data before it's used to train a model.
- automatic evaluation
Using software to judge the quality of a model's output.
- bag of words
A representation of the words in a phrase or passage, irrespective of order.
- BERT
A model architecture for text representation.
- bigram
An N-gram in which N=2.
- BLEU
A metric between 0.0 and 1.0 for evaluating machine translations.
- BLEURT
A metric for evaluating machine translations from one language to another, particularly to and from English.
- Character N-gram F-score
A metric to evaluate machine translation models.
- Confabulation
When an AI produces a confident, fluent answer that sounds true but is factually wrong — generating plausible language without a reliable link to reality.
- constituency parsing
Dividing a sentence into smaller grammatical structures ("constituents").