Transformer
Pronunciation: /trænsˈfɔːmə/
The neural network architecture that revolutionised AI by enabling models to process entire sequences at once.
Plain English Explanation
The Transformer is a type of neural network architecture introduced in 2017 that processes data using a mechanism called "attention." Instead of reading text word by word in order (like earlier models), Transformers can look at all words in a sentence simultaneously and determine which ones are most relevant to each other.

This parallel processing makes Transformers faster to train and more effective at capturing long-range relationships in text, which is why virtually all modern language models are built on this architecture.
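To make "attention" more concrete, here is a minimal sketch in plain Python of scaled dot-product self-attention, the core operation described above. It is an illustration under simplifying assumptions, not a full Transformer: the function names, the three example words, and their 2-dimensional vectors are all hypothetical, and each vector is used directly as query, key, and value, whereas real models first apply learned projection matrices.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention over equal-length vectors.

    Simplification (an assumption of this sketch): each vector serves as
    its own query, key, and value, with no learned projections.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Compare this word with every word in the sequence, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The new representation is a weighted blend of all the vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional "embeddings" for a three-word sequence.
tokens = ["cat", "sat", "kitten"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

outputs, weights = self_attention(embeddings)
for word, row in zip(tokens, weights):
    print(word, [round(w, 2) for w in row])
```

Each printed row shows how strongly one word attends to every word in the sequence. Because the made-up vectors for "cat" and "kitten" point in similar directions, "cat" gives more weight to "kitten" than to "sat", even though "kitten" is further away. No word's row depends on another word's result, which is why a real implementation can compute them all in parallel.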
Analogy
Imagine reading a sentence where you can instantly see connections between any two words, no matter how far apart they are — like having X-ray vision for language structure. That is what attention allows.
How is it used?
Transformers are the foundation of GPT, BERT, Claude, Gemini, and virtually every major language model. They are also used in image generation (DALL-E), protein structure prediction (AlphaFold), and speech recognition.
Real-world Example
When Google Translate produces a natural-sounding translation, or when a chatbot maintains context across a long conversation, Transformer architecture is doing the heavy lifting behind the scenes.
Common Misconceptions
Transformers are not limited to language; the same architecture is also used for images, speech, and protein structures. The name refers to a specific network design, not to transforming one thing into another in a general sense.
History
Introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017) by Google researchers. It replaced recurrent neural networks as the dominant approach for sequence processing.
Related Terms
See Also
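Also known as: Transformer model, Transformer architecture. Attention is the core mechanism inside a Transformer, not another name for it.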