What is a masked language model?
Masked language model explained in plain English
A language model that predicts the probability of candidate tokens to fill in blanks in a sequence. For example, a masked language model can calculate probabilities for candidate word(s) to replace the underline in the following sentence: The ____ in the hat came back. The literature typically uses the string "MASK" instead of an underline (BERT, for instance, uses the special token [MASK]), so the same sentence becomes: The "MASK" in the hat came back. Most modern masked language models are bidirectional, meaning they use the text both before and after the masked position to make the prediction.
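The fill-in-the-blank idea above can be illustrated with a deliberately tiny sketch. It is not how real masked language models such as BERT work (those are neural networks trained on large corpora that condition on the whole sequence). As a simplifying assumption, this toy version estimates candidate probabilities by counting which words appeared between the same left and right neighbors in a small made-up corpus. The function names `train_context_counts` and `predict_mask`, the corpus, and the `<s>`/`</s>` boundary markers are all hypothetical. It still shows the two key properties: the output is a probability distribution over candidate tokens, and the prediction uses context on both sides of the blank.

```python
from collections import Counter, defaultdict

MASK = "[MASK]"

def train_context_counts(sentences):
    """For every (left neighbor, right neighbor) pair, count the words seen between them.

    Toy stand-in for training: a real masked language model learns a neural
    network instead of a lookup table.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with boundary markers so the first and last words also have neighbors.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            counts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return counts

def predict_mask(counts, masked_sentence):
    """Return {candidate: probability} for the single [MASK] slot, using both neighbors."""
    tokens = ["<s>"] + masked_sentence.split() + ["</s>"]
    i = tokens.index(MASK)
    # Bidirectional context: the word before AND the word after the blank.
    candidates = counts.get((tokens[i - 1], tokens[i + 1]), Counter())
    total = sum(candidates.values())
    if total == 0:
        return {}  # this context never appeared in the toy corpus
    return {word: n / total for word, n in candidates.most_common()}

# Hypothetical toy corpus.
corpus = [
    "the cat in the hat came back",
    "the cat in the hat sat down",
    "the man in the hat came back",
    "the dog in the yard came back",
]
counts = train_context_counts(corpus)
probs = predict_mask(counts, "the [MASK] in the hat came back")
print(probs)  # {'cat': 0.5, 'man': 0.25, 'dog': 0.25}
```

Because the sketch looks at the word after the blank ("in") as well as the word before it ("the"), it never proposes "hat" or "yard", which follow "the" elsewhere in the corpus but never precede "in".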
Example
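Practitioners encounter masked language models when building, training, or evaluating machine learning systems. For example, BERT is pretrained by masking a fraction of its input tokens (15% in the original setup) and learning to predict them from the surrounding text. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.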
People also read
- bag of words
A representation of the words in a phrase or passage, irrespective of order.
- bidirectional language model
A language model that determines the probability that a given token is present at a given location in an excerpt of text based on the preceding and following text.
- cross-entropy
A generalization of Log Loss to multi-class classification problems.
- dimension reduction
Decreasing the number of dimensions used to represent a particular feature in a feature vector, typically by converting to an embedding vector.
- dimensions
Overloaded term with several definitions, including the number of levels of coordinates in a Tensor (for example, a scalar has zero dimensions and a matrix has two).
- distillation
The process of compressing one model (known as the teacher) into a smaller model (known as the student) that emulates the original model's predictions as faithfully as possible.
- embedding layer
A special hidden layer that trains on a high-dimensional categorical feature to gradually learn a lower-dimensional embedding vector.
- embedding space
The d-dimensional vector space that features from a higher-dimensional vector space are mapped to.
- embedding vector
Broadly speaking, an array of floating-point numbers taken from any hidden layer that describe the inputs to that hidden layer.
- encoder
In general, any ML system that converts from a raw, sparse, or external representation into a more processed, denser, or more internal representation.
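Masked language models are commonly trained with cross-entropy, the multi-class generalization of Log Loss listed above: at each masked position, the loss is the negative log of the probability the model assigned to the token that was actually hidden, and the training loss averages this over the masked positions. The minimal sketch below assumes the model's output at a masked position is already available as a dict of probabilities; the function names and all probability values are hypothetical.

```python
import math

def cross_entropy(predicted_probs, true_token):
    """Cross-entropy for one example with a one-hot target: -log p(true class).

    Equivalent to -sum_k y_k * log(p_k), because only the true class has y_k = 1.
    """
    return -math.log(predicted_probs[true_token])

def binary_log_loss(p_positive, label):
    """Log Loss for a binary classifier; label is 0 or 1."""
    return -(label * math.log(p_positive) + (1 - label) * math.log(1 - p_positive))

def masked_lm_loss(predictions, targets):
    """Average cross-entropy over the masked positions only."""
    losses = [cross_entropy(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)

# Hypothetical model output at the blank in "The [MASK] in the hat came back."
probs = {"cat": 0.5, "man": 0.25, "dog": 0.25}
loss = cross_entropy(probs, "cat")  # -log(0.5), about 0.693

# With only two classes, cross-entropy reduces to Log Loss.
two_class = {"yes": 0.8, "no": 0.2}
print(math.isclose(cross_entropy(two_class, "yes"), binary_log_loss(0.8, 1)))  # True
```

A confident correct prediction (probability near 1) gives a loss near 0, while assigning low probability to the hidden token is penalized heavily, which is what pushes the model toward sharp, accurate fill-in-the-blank distributions.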