Machine Learning Ethics & Safety Beginner

counterfactual fairness

Plain English Explanation

A fairness metric that checks whether a classification model produces the same result for one individual as it does for another individual who is identical to the first, except with respect to one or more sensitive attributes. Evaluating a classification model for counterfactual fairness is one method for surfacing potential sources of bias in a model.

See either of the following for more information:

- Fairness: Counterfactual fairness in Machine Learning Crash Course.
- When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness

How is it used?

Practitioners use counterfactual fairness when building, training, or evaluating classification models, typically as one of several checks for bias tied to sensitive attributes. The term also appears in research papers, product documentation, and technical discussions about the limitations of AI systems.
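
As a rough illustration of the definition, the sketch below changes only the sensitive attribute of each record and counts how often the model's prediction changes. Everything in it is hypothetical and chosen for illustration: the function name `counterfactual_flip_rate`, the two toy models, and the `income` and `group` fields. It is also a simplification. It does not account for other features that may depend on the sensitive attribute, so a score of zero here is a necessary signal at best, not proof of fairness.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def counterfactual_flip_rate(
    predict: Callable[[Record], int],
    records: List[Record],
    sensitive_attr: str,
    values: List[object],
) -> float:
    """Return the fraction of records whose prediction changes when
    only the sensitive attribute is set to a different value."""
    if not records:
        return 0.0
    changed = 0
    for record in records:
        baseline = predict(record)
        for value in values:
            if value == record[sensitive_attr]:
                continue  # skip the record's own value
            # Copy the record and alter only the sensitive attribute.
            counterfactual = {**record, sensitive_attr: value}
            if predict(counterfactual) != baseline:
                changed += 1
                break  # count each record at most once
    return changed / len(records)


# Hypothetical toy models: one uses the sensitive attribute, one does not.
def biased_model(r: Record) -> int:
    return int(r["income"] >= 50 or r["group"] == "A")


def neutral_model(r: Record) -> int:
    return int(r["income"] >= 50)


# Hypothetical records, made up for this example.
applicants = [
    {"income": 60, "group": "A"},
    {"income": 40, "group": "A"},
    {"income": 40, "group": "B"},
    {"income": 70, "group": "B"},
]

biased_rate = counterfactual_flip_rate(biased_model, applicants, "group", ["A", "B"])
neutral_rate = counterfactual_flip_rate(neutral_model, applicants, "group", ["A", "B"])
print(biased_rate, neutral_rate)  # 0.5 0.0
```

In this toy data, the two applicants with an income of 40 receive different outcomes from `biased_model` depending on their group, so half of the records flip. `neutral_model` never reads `group`, so none do.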