Large Language Models Intermediate

multi-head self-attention

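An extension of self-attention that runs several attention operations, called heads, in parallel over the same input sequence. Each head has its own learned query, key, and value projections, and the outputs of all heads are concatenated and linearly projected to form the layer's output.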

Plain English Explanation

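Instead of looking at a sequence through a single attention pattern, the model looks at it through several at once. For every position, each head decides on its own which other positions matter, so different heads can pick up different kinds of relationships between tokens. The results from all heads are then merged into one representation per position. Multi-head self-attention was introduced as part of the Transformer architecture.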
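
To make the idea concrete, here is a minimal sketch in plain Python with no external libraries. It assumes scaled dot-product attention inside each head and uses random weights in place of learned ones. The names `attention_head`, `multi_head_self_attention`, `heads`, and `w_o` are illustrative and do not come from any particular library.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for a single head."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0])
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scores = matmul(q, k_t)                       # one row of scores per position
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, v)                     # weighted mix of value vectors

def multi_head_self_attention(x, heads, w_o):
    """Run every head on the same input, concatenate per position, project with w_o."""
    outputs = [attention_head(x, *h) for h in heads]
    concat = [sum((out[i] for out in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)

def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (assumption: real models train these)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, num_heads = 4, 8, 2
d_head = d_model // num_heads                     # each head works in a smaller space

x = random_matrix(seq_len, d_model, rng)          # one vector per input position
heads = [tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
         for _ in range(num_heads)]               # (w_q, w_k, w_v) per head
w_o = random_matrix(num_heads * d_head, d_model, rng)

out = multi_head_self_attention(x, heads, w_o)
print(len(out), len(out[0]))                      # prints: 4 8
```

The output has one vector of size `d_model` per input position, the same shape as the input. Splitting `d_model` evenly across heads is a common convention that this sketch follows, not a requirement of the mechanism.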

How is it used?

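Multi-head self-attention is the core layer of Transformer-based models, including large language models. Practitioners work with it when building, training, or evaluating such systems, where the number of heads is a standard architecture hyperparameter. The term also appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.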