What is selection bias?
Errors in conclusions drawn from sampled data due to a selection process that generates systematic differences between samples observed in the data and those not observed.
Selection bias explained in plain English
The following forms of selection bias exist:
- coverage bias: The population represented in the dataset doesn't match the population that the machine learning model is making predictions about.
- sampling bias: Data is not collected randomly from the target group.
- non-response bias (also called participation bias): Users from certain groups opt out of surveys at different rates than users from other groups.

For example, suppose you are creating a machine learning model that predicts people's enjoyment of a movie. To collect training data, you hand out a survey to everyone in the front row of a theater showing the movie. Offhand, this may sound like a reasonable way to gather a dataset; however, this form of data collection may introduce the following forms of selection bias:
- coverage bias: By sampling from a population who chose to see the movie, your model's predictions may not generalize to people who did not already express that level of interest in the movie.
- sampling bias: Rather than randomly sampling from the intended population (all the people at the movie), you sampled only the people in the front row. It is possible that the people sitting in the front row were more interested in the movie than those in other rows.
- non-response bias: In general, people with strong opinions tend to respond to optional surveys more frequently than people with mild opinions. Since the movie survey is optional, the responses are more likely to form a bimodal distribution than a normal (bell-shaped) distribution.
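To make the movie-survey scenario concrete, here is a minimal simulation sketch of two of these effects, sampling bias and non-response bias. Everything numeric in it is an illustrative assumption rather than real data: the audience size (50 rows of 40 seats), the 1-to-5 enjoyment scale, the premise that front-row viewers average a higher score, and the response probabilities. The names `audience`, `responds`, and `mild_share` are hypothetical helpers introduced only for this example.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def clamp(x, lo=1.0, hi=5.0):
    """Keep a score inside the assumed 1-5 enjoyment scale."""
    return max(lo, min(hi, x))


# Hypothetical audience: 50 rows of 40 seats.
# Assumption (not from any real data): front-row viewers enjoy the
# movie more on average (4.0) than everyone else (3.0).
audience = []
for row in range(1, 51):
    center = 4.0 if row == 1 else 3.0
    for _ in range(40):
        audience.append({"row": row, "score": clamp(rng.gauss(center, 1.0))})

true_mean = mean(p["score"] for p in audience)

# Sampling bias: only the front row is surveyed.
front_row = [p for p in audience if p["row"] == 1]
front_row_mean = mean(p["score"] for p in front_row)


# Non-response bias: assume people with strong opinions (at least one
# point away from the neutral score 3) answer far more often.
def responds(person):
    strong = abs(person["score"] - 3.0) >= 1.0
    return rng.random() < (0.9 if strong else 0.2)


respondents = [p for p in audience if responds(p)]


def mild_share(people):
    """Fraction of people whose score is within one point of neutral."""
    return sum(abs(p["score"] - 3.0) < 1.0 for p in people) / len(people)


print(f"mean score, whole audience: {true_mean:.2f}")
print(f"mean score, front row only: {front_row_mean:.2f}")
print(f"mild opinions, whole audience: {mild_share(audience):.0%}")
print(f"mild opinions, respondents:    {mild_share(respondents):.0%}")
```

Under these assumptions, the front-row average overstates the audience's enjoyment, and mild opinions are underrepresented among respondents, which is what pushes the observed responses toward a bimodal shape. Coverage bias is not simulated here, because it concerns people who never entered the theater and therefore never appear in `audience` at all.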
Example
Practitioners refer to selection bias when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- attribute
Synonym for feature.
- automation bias
When a human decision maker favors recommendations made by an automated decision-making system over information made without automation, even when the automated decision-making system makes errors.
- bias
1.
- bias (math) or bias term
An intercept or offset from an origin.
- calibration layer
A post-prediction adjustment, typically to account for prediction bias.
- confabulation
When an AI produces a confident, fluent answer that sounds true but is factually wrong, generating plausible language without a reliable link to reality.
- confirmation bias
The tendency to search for, interpret, favor, and recall information in a way that confirms one's pre-existing beliefs or hypotheses.
- counterfactual fairness
A fairness metric that checks whether a classification model produces the same result for one individual as it does for another individual who is identical to the first, except with respect to one or more sensitive attributes.
- coverage bias
See selection bias.
- demographic parity
A fairness metric that is satisfied if the results of a model's classification are not dependent on a given sensitive attribute.