AIExplainer

What is multi-head self-attention?

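An extension of self-attention that runs several self-attention operations ("heads") in parallel over the same input sequence. Each head has its own learned query, key, and value projections, so different heads can focus on different positions and relationships. The heads' outputs are concatenated and projected back to the model dimension.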
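In the notation of the original Transformer paper, with the input sequence X (one row per position) supplying the queries, keys, and values, the computation can be written as follows. The symbol names are that paper's convention, not something this article defines:

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
\mathrm{head}_i &= \mathrm{Attention}\big(X W_i^{Q},\, X W_i^{K},\, X W_i^{V}\big), \qquad i = 1, \dots, h \\
\mathrm{MultiHead}(X) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}
\end{aligned}
```

Here h is the number of heads. The learned projections have shapes W_i^Q, W_i^K: d_model × d_k, W_i^V: d_model × d_v, and W^O: h·d_v × d_model. The original paper sets d_k = d_v = d_model / h.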

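The Transformer architecture ("Attention Is All You Need", Vaswani et al., 2017) introduced multi-head self-attention. Each head typically works in a reduced dimension of d_model / h, where d_model is the model's hidden size and h is the number of heads. As a result, the total cost is similar to that of a single attention head operating at the full model dimension.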
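A minimal, dependency-free sketch of the computation described above may help make it concrete. It rests on several assumptions. The weight matrices are random placeholders standing in for trained parameters. There is no masking, bias, or dropout. The helper names (`attention`, `multi_head_self_attention`, and so on) are hypothetical and illustrative only; they are not any library's API.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Hypothetical stand-in for learned weights.
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x: Matrix, num_heads: int, rng: random.Random) -> Matrix:
    """x has shape (seq_len x d_model); returns a matrix of the same shape."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Each head gets its own query, key, and value projection (random here).
        w_q, w_k, w_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
        # Self-attention: queries, keys, and values all come from the same x.
        heads.append(attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)))
    # Concatenate head outputs along the feature axis: (seq_len x num_heads * d_head).
    concat = [[value for h in heads for value in h[i]] for i in range(len(x))]
    # Output projection back to d_model.
    w_o = random_matrix(num_heads * d_head, d_model, rng)
    return matmul(concat, w_o)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]  # 4 tokens, d_model = 8
    out = multi_head_self_attention(x, num_heads=2, rng=rng)
    print(len(out), len(out[0]))  # 4 8
```

Production libraries such as PyTorch (`torch.nn.MultiheadAttention`) typically fuse the per-head projections into one larger matrix multiplication. The explicit per-head loop here is only meant to make the structure easy to follow.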

Multi-head self-attention is a core building block of Transformer-based models, including large language models. It therefore comes up whenever practitioners build, train, or evaluate these systems, and throughout research papers and technical documentation that discuss their capabilities and limitations.