BERT

What does it stand for? Bidirectional Encoder Representations from Transformers

A model architecture for text representation.

Plain English Explanation

A trained BERT model can act as part of a larger model for text classification or other ML tasks.

BERT has the following characteristics:

- Uses the Transformer architecture, and therefore relies on self-attention.
- Uses the encoder part of the Transformer. The encoder's job is to produce good text representations, rather than to perform a specific task like classification.
- Is bidirectional: the representation of each token draws on the context to both its left and its right.
- Uses masking for unsupervised training: some input tokens are hidden, and the model learns to predict them from the surrounding context.

BERT's variants include:

- ALBERT, which is an acronym for A Lite BERT.
- LaBSE (Language-agnostic BERT Sentence Embedding).

See "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing" for an overview of BERT.
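The masking idea mentioned above can be illustrated with a minimal sketch. It is a simplified illustration, not BERT's actual pre-training code: the function name `mask_tokens`, the whitespace tokenization, and the default `mask_prob` are assumptions made for this example. The original BERT procedure also sometimes substitutes a random token or leaves the selected token unchanged, which is omitted here.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder string used to hide a token


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens (hypothetical helper for illustration).

    Returns (masked_tokens, labels). labels[i] holds the original token
    at each masked position and None everywhere else, so a model would be
    trained to predict only the hidden tokens.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)  # hide the token from the model
            labels.append(token)       # remember what it should predict
        else:
            masked.append(token)       # leave the token visible
            labels.append(None)        # nothing to predict here
    return masked, labels


if __name__ == "__main__":
    sentence = "the encoder produces a representation for every token".split()
    masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=0)
    print(masked)
    print(labels)
```

Because the encoder is bidirectional, a model trained this way can use the visible tokens on both sides of each `[MASK]` position when predicting the hidden token.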

How is it used?

Practitioners refer to BERT when building, training, or evaluating machine learning systems that work with text. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.