label leakage
A model design flaw in which a feature is a proxy for the label.
Plain English Explanation
Label leakage happens when one of a model's features effectively gives away the label. For example, consider a binary classification model that predicts whether a prospective customer will purchase a particular product. Suppose that one of the model's features is a Boolean named `SpokeToCustomerAgent`, and that a customer agent is assigned only after the prospective customer has actually purchased the product. During training, the model will quickly learn the association between `SpokeToCustomerAgent` and the label. At prediction time, however, no purchase has happened yet, so the feature carries no such signal and the model performs far worse than its training metrics suggest. See Monitoring pipelines in Machine Learning Crash Course for more information.
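One rough way to spot this kind of leak is to check how often each Boolean feature simply equals the label. The sketch below uses synthetic data in plain Python and is illustrative only. Apart from `SpokeToCustomerAgent`, every name in it (`make_customers`, `VisitedPricingPage`, `Purchased`, `agreement_with_label`, `flag_suspected_leaks`) and the 0.99 threshold are assumptions made up for the example, not part of any library or of the original definition.

```python
import random


def make_customers(n, seed=0):
    """Build synthetic customer rows (hypothetical data, for illustration)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        visited_pricing_page = rng.random() < 0.5
        # Assumption: visiting the pricing page makes a purchase more likely.
        purchased = rng.random() < (0.6 if visited_pricing_page else 0.2)
        rows.append({
            "VisitedPricingPage": visited_pricing_page,
            # An agent is assigned only after a purchase, so in the
            # training data this feature is a copy of the label.
            "SpokeToCustomerAgent": purchased,
            "Purchased": purchased,
        })
    return rows


def agreement_with_label(rows, feature, label="Purchased"):
    """Return the fraction of rows where the Boolean feature equals the label."""
    return sum(row[feature] == row[label] for row in rows) / len(rows)


def flag_suspected_leaks(rows, features, label="Purchased", threshold=0.99):
    """Flag features that match (or exactly invert) the label almost always."""
    suspects = []
    for feature in features:
        score = agreement_with_label(rows, feature, label)
        if score >= threshold or score <= 1 - threshold:
            suspects.append(feature)
    return suspects


rows = make_customers(1000)
suspects = flag_suspected_leaks(
    rows, ["VisitedPricingPage", "SpokeToCustomerAgent"]
)
print(suspects)  # → ['SpokeToCustomerAgent']
```

A near-perfect match is only a warning sign, not proof of leakage. The deciding question is whether the feature's value would actually be known at the moment the model has to make its prediction.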
How is it used?
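Practitioners use the term when building, training, or evaluating machine learning systems, most often to explain evaluation metrics that look too good to be true or to justify removing a feature that would not be available at prediction time. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.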