inter-rater agreement

A measurement of how often human raters agree when doing a task.

Plain English Explanation

Inter-rater agreement measures how often human raters give the same answer when they perform the same task. If raters disagree, the task instructions may need to be improved. The measurement is also sometimes called inter-annotator agreement or inter-rater reliability. See also Cohen's kappa, which is one of the most popular inter-rater agreement measurements. For more information, see "Categorical data: Common issues" in Machine Learning Crash Course.
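
To make the idea concrete, here is a minimal Python sketch for two raters who label the same items. It computes simple percent agreement and Cohen's kappa, which adjusts the observed agreement for the agreement expected by chance. The function names, the "spam"/"ham" labels, and the sample ratings are hypothetical and chosen only for illustration; they do not come from any particular library or dataset.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Fraction of items on which the two raters chose the same label."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(ratings_a)
    p_o = percent_agreement(ratings_a, ratings_b)  # observed agreement

    # Chance agreement: for each label, multiply the two raters'
    # marginal proportions, then sum over labels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label for everything,
        # so the ratio is 0/0 and kappa is undefined.
        raise ValueError("Cohen's kappa is undefined when chance agreement is 1.")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten items by two raters.
rater_a = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "spam", "ham", "ham"]
rater_b = ["spam", "ham", "ham", "ham", "spam", "ham", "spam", "spam", "ham", "ham"]

print(f"Percent agreement: {percent_agreement(rater_a, rater_b):.2f}")  # 0.80
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")       # 0.58
```

In this example the raters agree on 8 of 10 items, but because each rater uses "ham" for most items, a fair amount of that agreement would be expected by chance, which is why kappa is lower than the raw percentage.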

How is it used?

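Practitioners measure inter-rater agreement when they collect human labels to build, train, or evaluate machine learning systems. Low agreement suggests that the task instructions may need to be improved before the labels are relied on. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.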