What is Data Augmentation?
Creating additional training examples by slightly modifying existing ones — flipping, cropping, or rephrasing — to increase variety without new data collection.
Data Augmentation explained in plain English
In practice, you take the examples you already have and produce altered copies of them (a flipped image, a cropped photo, a reworded sentence), so the system sees more variety than the raw dataset contains. Each altered copy typically keeps the label of the original.
It helps models generalise when real-world data is limited or expensive.
Analogy
Data augmentation is like a musician practising a piece in different keys and tempos. The song is the same, but the varied conditions build flexibility and resilience.
Example
A small photo dataset of defects on a factory line can be augmented with rotations and brightness changes to train a more robust inspector model.
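The factory-line example can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the `augment` function, the 32 x 32 random array standing in for a defect photo, and the brightness range of 0.8 to 1.2 are all hypothetical choices, not part of any particular inspection system.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly modified copy of an image with values in [0, 1].

    Works on H x W or H x W x C arrays. For non-square images, a 90 or 270
    degree rotation swaps height and width.
    """
    out = image.copy()
    # Flip left-to-right half of the time.
    if rng.random() < 0.5:
        out = np.fliplr(out)
    # Rotate by a random multiple of 90 degrees (0, 90, 180 or 270).
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k)
    # Scale brightness by a modest factor, then clip back into [0, 1].
    factor = rng.uniform(0.8, 1.2)  # assumed range, chosen to stay realistic
    return np.clip(out * factor, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32))  # stand-in for one grayscale defect photo
augmented = [augment(image, rng) for _ in range(8)]
```

Each call yields a different variant of the same photo, and every variant would keep the original's label (for example, "scratch" or "no defect").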
How is Data Augmentation used?
Self-driving car systems train on rotated and shifted images of roads. Language models benefit from paraphrased sentences. Medical AI uses augmented scans when real patient data is limited.
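For text, a very rough stand-in for paraphrasing is to swap some words for synonyms. The sketch below assumes a tiny hand-written synonym table (`SYNONYMS`) and a hypothetical helper `synonym_swap`. Real paraphrasing pipelines are considerably more sophisticated, so treat this as an illustration of the idea rather than a recommended method.

```python
import random

# Hypothetical, hand-written synonym table for illustration only.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
    "big": ["large", "huge"],
}

def synonym_swap(sentence, rng, swap_prob=0.5):
    """Replace some words that appear in SYNONYMS with a random synonym."""
    out = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < swap_prob:
            out.append(rng.choice(options))
        else:
            out.append(word)  # leave unknown or unselected words untouched
    return " ".join(out)

rng = random.Random(0)
original = "the quick dog looks happy"
variants = [synonym_swap(original, rng) for _ in range(5)]
```

Words outside the table are never changed, so the sentence structure stays intact. Whether the meaning survives depends entirely on how well the synonyms fit the context.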
Common misconceptions about Data Augmentation
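A common misconception is that more augmentation is always better. Augmentation must stay realistic: extreme transformations produce examples the model would never meet in practice, and that added noise can hurt rather than help.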