Deep Learning Large Language Models Intermediate 1 min read

What is an Attention?

A mechanism that lets a model focus on the most relevant parts of its input when producing an output, weighting what matters most in context.

Attention explained in plain English

Attention is a mechanism that lets a model focus on the most relevant parts of its input when producing an output. Instead of treating every word or pixel equally, it weighs what matters most in context.

It solved a major bottleneck in processing long sequences and enabled the Transformer architecture.

Analogy

Attention is like reading a sentence and instinctively emphasising the words that carry the meaning, while barely registering filler words. Your focus shifts depending on what the sentence is actually about.

Example

When translating "The bank by the river" versus "The bank approved the loan," an attention-based model focuses on different words to resolve ambiguity.

How is Attention used?

Attention is the foundation of Transformers — the architecture behind GPT, Claude, and most modern language models. It also helps translation systems align words across languages.

Common misconceptions about Attention

Attention is not human-like focus or consciousness — it is a mathematical weighting scheme over inputs.

Also known as

self-attention