What is preprocessing?
Processing data before it's used to train a model.
Preprocessing explained in plain English
Preprocessing can be as simple as removing words from an English text corpus that don't occur in the English dictionary, or as complex as re-expressing data points so that as many attributes correlated with sensitive attributes as possible are eliminated. Preprocessing can help satisfy fairness constraints.
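The fairness-oriented case can be illustrated with a deliberately crude sketch. Instead of re-expressing the data, the snippet below simply drops any feature whose Pearson correlation with a sensitive attribute exceeds a cutoff. The function names, the 0.8 threshold, and the toy data are all assumptions made for illustration; real fairness preprocessing is usually more involved than this.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def drop_correlated_features(columns, sensitive, threshold=0.8):
    """Keep only columns whose |correlation| with `sensitive` is <= threshold.

    `threshold` is an arbitrary illustrative cutoff, not a recommended value.
    """
    return {
        name: values
        for name, values in columns.items()
        if abs(pearson(values, sensitive)) <= threshold
    }


# Toy data, invented for illustration only.
sensitive = [0, 0, 1, 1, 0, 1]
columns = {
    "proxy_feature": [1, 2, 9, 8, 1, 9],   # tracks the sensitive attribute closely
    "other_feature": [5, 3, 4, 5, 4, 4],   # only weakly related
}

kept = drop_correlated_features(columns, sensitive)
print(sorted(kept))  # ['other_feature']
```

Dropping columns only removes linear, one-feature-at-a-time proxies; combinations of the remaining features can still encode the sensitive attribute, which is why the more complex re-expression approaches exist.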
Example
Practitioners refer to preprocessing when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
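As a concrete illustration of the simplest case in the definition, the sketch below removes words that are not in a dictionary before the text would be passed to training. The tiny `DICTIONARY` set and the `remove_unknown_words` helper are hypothetical stand-ins; a real pipeline would load a full word list and likely use a more careful tokenizer.

```python
import re

# Hypothetical stand-in for an English dictionary; a real pipeline would
# load a much larger word list.
DICTIONARY = {"the", "cat", "sat", "on", "mat", "a", "dog", "barked"}


def remove_unknown_words(text, dictionary):
    """Return the tokens of `text` that appear in `dictionary`."""
    # Lowercase, then split into runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token in dictionary]


corpus = ["The cat sat on the mat.", "A dog barked, qwzx!"]
cleaned = [remove_unknown_words(doc, DICTIONARY) for doc in corpus]
print(cleaned)
# [['the', 'cat', 'sat', 'on', 'the', 'mat'], ['a', 'dog', 'barked']]
```

The non-word `qwzx` is filtered out, and punctuation and casing are normalized along the way, so the model only ever sees tokens from the chosen vocabulary.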
People also read
- reporting bias
The fact that the frequency with which people write about actions, outcomes, or properties is not a reflection of their real-world frequencies or the degree to which a property is characteristic of a class of individuals.
- automatic evaluation
Using software to judge the quality of a model's output.
- bag of words
A representation of the words in a phrase or passage, irrespective of order.
- BERT
A model architecture for text representation.
- bigram
An N-gram in which N=2.
- BLEU
A metric between 0.0 and 1.0 for evaluating machine translations.
- BLEURT
A metric for evaluating machine translations from one language to another, particularly to and from English.
- Character N-gram F-score
A metric to evaluate machine translation models.
- Confabulation
When an AI produces a confident, fluent answer that sounds true but is factually wrong — generating plausible language without a reliable link to reality.
- constituency parsing
Dividing a sentence into smaller grammatical structures ("constituents").