What is multi-head self-attention?
An extension of self-attention that applies the self-attention mechanism multiple times for each position in the input sequence.
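Multi-head self-attention explained in plain English
Instead of running self-attention once, the model runs it several times in parallel for each position in the input sequence. Each run, called a head, uses its own learned projections, so different heads can attend to different relationships between positions. The outputs of all heads are concatenated and projected to form the final result. The Transformer architecture introduced multi-head self-attention.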
Example
Practitioners refer to multi-head self-attention when building, training, or evaluating Transformer-based machine learning systems, for example when deciding how many attention heads each layer should use. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
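To make "applies the self-attention mechanism multiple times for each position" concrete, here is a minimal sketch in dependency-free Python. It is an illustration under stated assumptions, not a reference implementation: the function names (`multi_head_self_attention`, `attention_head`, `random_matrix`) and the weight names (`w_q`, `w_k`, `w_v`, `w_o`) are hypothetical, the weights are random rather than learned, and masking, biases, and batching are left out.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q, k, v):
    """Scaled dot-product self-attention for a single head."""
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # one distribution per position
    return matmul(weights, v), weights


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Run self-attention once per head, then concatenate and project."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Queries, keys, and values all come from the same input x (self-attention).
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

    head_outputs, head_weights = [], []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's feature slice
        out, weights = attention_head(
            [row[cols] for row in q],
            [row[cols] for row in k],
            [row[cols] for row in v],
        )
        head_outputs.append(out)
        head_weights.append(weights)

    # Concatenate the heads position by position, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o), head_weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or input embeddings."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, num_heads = 4, 8, 2
x = random_matrix(seq_len, d_model, rng)
w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model, rng) for _ in range(4))

output, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(len(output), len(output[0]))  # 4 8
```

Each head sees only its own slice of the projected queries, keys, and values, so every position in the sequence receives one attention distribution per head. In practice a framework implementation would typically be used instead of hand-written loops like these.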
People also read
- agent orchestration
The centralized management and routing of tasks across multiple sub-agents or LLM calls.
- AI slop
Output from a generative AI system that favors quantity over quality.
- Attention
A mechanism that lets a model focus on the most relevant parts of its input when producing an output, weighting what matters most in context.
- auto-regressive model
A model that infers a prediction based on its own previous predictions.
- autoencoder
A system that learns to extract the most important information from the input.
- automatic evaluation
Using software to judge the quality of a model's output.
- autorater evaluation
A hybrid mechanism for judging the quality of a generative AI model's output that combines human evaluation with automatic evaluation.
- average precision at k
A metric for summarizing a model's performance on a single prompt that generates ranked results, such as a numbered list of book recommendations.
- bag of words
A representation of the words in a phrase or passage, irrespective of order.
- BERT
A model architecture for text representation.