Large Language Models Intermediate
multi-head self-attention
An extension of self-attention that runs several attention operations ("heads") in parallel over the same input sequence, each with its own learned projections, and then combines their outputs.
Plain English Explanation
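Instead of computing a single set of attention weights for each position in the input sequence, the model computes several at once. Each head projects the input into its own smaller subspace and decides independently which other positions to attend to, so different heads can capture different relationships in the same sequence. The outputs of all heads are concatenated and passed through a final linear projection. The Transformer architecture introduced multi-head self-attention.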
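The computation can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function and variable names (`multi_head_self_attention`, `w_q`, `w_k`, `w_v`, `w_o`) are chosen here for readability, the weights are random instead of learned, and batching, masking, biases, and dropout are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x has shape (seq_len, d_model); each weight is (d_model, d_model)."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    # Queries, keys, and values all come from the same input (self-attention).
    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product attention, computed independently for every head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    per_head = weights @ v  # (heads, seq, d_head)

    # Concatenate the heads back to (seq_len, d_model), then mix them with w_o.
    concat = per_head.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


# Toy usage: random values stand in for embeddings and trained parameters.
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))

out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)      # (4, 8)
print(weights.shape)  # (2, 4, 4)
```

The output keeps the input's shape, while `weights` holds one attention matrix per head. Each row of each matrix sums to 1 and describes how strongly that position attends to every other position under that head.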
How is it used?
Multi-head self-attention is a core layer of Transformer-based models, so practitioners work with it when building, training, or evaluating such systems, for example when choosing how many heads a model should use. The term also appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.