Data Augmentation
Creating additional training examples by slightly modifying existing ones — flipping, cropping, or rephrasing — to increase variety without new data collection.
Plain English Explanation
Instead of collecting entirely new data, you make small changes to the examples you already have, such as flipping or cropping an image or rephrasing a sentence, while keeping their meaning intact. The system sees more variety from the same underlying data.
It helps models generalise when real-world data is limited or expensive.
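As a rough illustration of flipping and cropping, the sketch below treats an image as a plain list of pixel rows and uses only the Python standard library. The function names (horizontal_flip, random_crop, augment), the 50% flip probability, and the tiny 4x4 "image" are assumptions made for this example, not part of any particular library.

```python
import random

def horizontal_flip(image):
    """Mirror a 2D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window from a random position in the image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def augment(image, crop_h, crop_w, rng):
    """Build one variant: flip half the time (assumed rate), then crop."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return random_crop(image, crop_h, crop_w, rng)

# Hypothetical 4x4 grayscale image used only for illustration.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

rng = random.Random(0)  # seeded so the run is repeatable
variants = [augment(image, 3, 3, rng) for _ in range(5)]
```

Each call produces a new 3x3 variant and leaves the original image untouched, so one collected example can yield several slightly different training examples.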
Analogy
Data augmentation is like a musician practising a piece in different keys and tempos. The song is the same, but the varied conditions build flexibility and resilience.
How is it used?
Self-driving car systems train on rotated and shifted images of roads. Language models benefit from paraphrased sentences. Medical AI uses augmented scans when real patient data is limited.
Real-world Example
A small photo dataset of defects on a factory line can be augmented with rotations and brightness changes to train a more robust inspector model.
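The defect-inspection case can be sketched in a few lines of standard-library Python. Everything specific here is an assumption for illustration: the function names, the 0 to 255 pixel range, the brightness factor range of 0.8 to 1.2, the "scratch" label, and the idea that a defect keeps its label when the photo is rotated.

```python
import random

def rotate_90(image):
    """Rotate a 2D image (a list of pixel rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def adjust_brightness(image, factor):
    """Scale every pixel by factor, clamped to the assumed 0..255 range."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in image]

def augment_defect_image(image, rng):
    """Apply a random number of quarter turns and a mild brightness change."""
    for _ in range(rng.randrange(4)):      # 0, 1, 2, or 3 quarter turns
        image = rotate_90(image)
    factor = rng.uniform(0.8, 1.2)         # assumed modest range
    return adjust_brightness(image, factor)

# Hypothetical 2x3 grayscale photo of a scratched part.
original = [
    [10, 20, 30],
    [40, 50, 60],
]

rng = random.Random(42)  # seeded so the run is repeatable
# Each variant keeps the original label, assuming rotation and mild
# brightness changes do not alter what kind of defect is shown.
augmented = [(augment_defect_image(original, rng), "scratch") for _ in range(4)]
```

The brightness range is kept narrow on purpose so the variants still look like plausible photos from the same factory line.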
Common Misconceptions
More augmentation is not always better. Augmentation must stay realistic: extreme transformations can distort an example or change what it represents, introducing noise that hurts rather than helps.