What is a BERT?
A model architecture for text representation.
Stands for: Bidirectional Encoder Representations from Transformers
BERT explained in plain English
A model architecture for text representation. A trained BERT model can act as part of a larger model for text classification or other ML tasks. BERT has the following characteristics: - Uses the Transformer architecture, and therefore relies on self-attention. - Uses the encoder part of the Transformer. The encoder's job is to produce good text representations, rather than to perform a specific task like classification. - Is bidirectional. - Uses masking for unsupervised training. BERT's variants include: - ALBERT, which is an acronym for A Light BERT. - LaBSE. See Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing for an overview of BERT.
Example
Practitioners refer to bert when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- ROUGE
A family of metrics that evaluate automatic summarization and machine translation models.
- BLEU
A metric between 0.
- BLEURT
A metric for evaluating machine translations from one language to another, particularly to and from English.
- Character N-gram F-score
A metric to evaluate machine translation models.
- Embedding
A numerical representation of text, images, or other data that captures semantic meaning.
- encoder
In general, any ML system that converts from a raw, sparse, or external representation into a more processed, denser, or more internal representation.
- language model
A model that estimates the probability of a token or sequence of tokens occurring in a longer sequence of tokens.
- rotational invariance
In an image classification problem, an algorithm's ability to successfully classify images even when the orientation of the image changes.
- ROUGE-L
A member of the ROUGE family focused on the length of the longest common subsequence in the reference text and generated text.
- ROUGE-N
A set of metrics within the ROUGE family that compares the shared N-grams of a certain size in the reference text and generated text.