What is a mixture of experts?
A scheme to increase neural network efficiency by using only a subset of its parameters (known as an expert) to process a given input token or example.
Mixture of experts explained in plain English
A mixture of experts makes a neural network more efficient by processing each input token or example with only a subset of the network's parameters, known as an expert. A gating network routes each input token or example to the appropriate expert(s), so the remaining experts do no work for that input. For details, see either of the following papers:
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- Mixture-of-Experts with Expert Choice Routing
Example
Practitioners use the term mixture of experts when building, training, or evaluating machine learning systems. It also appears in research papers, product documentation, and technical discussions of AI capabilities and limitations.
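To make the routing idea concrete, here is a minimal sketch in plain Python of a gating network sending one input to a few experts. It rests on several assumptions: each expert is a single linear layer, the gating network is a linear scorer, and routing keeps the top k experts by score. The names `make_expert`, `moe_forward`, and `gate_weights` are hypothetical and chosen only for illustration. Real systems typically add trained weights, load balancing, and batching, none of which is shown here.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_expert(weights):
    """Hypothetical expert: one linear layer, y = W x."""
    def expert(x):
        return [dot(row, x) for row in weights]
    return expert


def moe_forward(x, gate_weights, experts, k=2):
    """Route x to the k highest-scoring experts and mix their outputs."""
    # Gating network (assumed linear): one score per expert.
    scores = [dot(row, x) for row in gate_weights]
    # Keep the k best experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    probs = softmax([scores[i] for i in top])
    outputs = [experts[i](x) for i in top]
    dim = len(outputs[0])
    mixed = [sum(p * o[d] for p, o in zip(probs, outputs)) for d in range(dim)]
    return mixed, top


# Toy setup with random (untrained) weights, purely for illustration.
rng = random.Random(0)
dim, num_experts = 4, 8
experts = [
    make_expert([[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)])
    for _ in range(num_experts)
]
gate_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [rng.gauss(0, 1) for _ in range(dim)]

output, chosen = moe_forward(token, gate_weights, experts, k=2)
print("experts used:", chosen)
print("output:", output)
```

In this toy setup only 2 of the 8 experts run for the token, which is where the efficiency gain in the definition comes from: the model holds many parameters but uses a fraction of them per input.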
People also read
- generative AI
An emerging transformative field with no formal definition.
- Long Short-Term Memory
A type of cell in a recurrent neural network used to process sequences of data in applications such as handwriting recognition, machine translation, and image captioning.
- Neural Architecture Search
A technique for automatically designing the architecture of a neural network.
- pooling
Reducing a matrix (or matrixes) created by an earlier convolutional layer to a smaller matrix.
- Attention
A mechanism that lets a model focus on the most relevant parts of its input when producing an output, weighting what matters most in context.
- auto-regressive model
A model that infers a prediction based on its own previous predictions.
- autoencoder
A system that learns to extract the most important information from the input.
- automatic evaluation
Using software to judge the quality of a model's output.
- bag of words
A representation of the words in a phrase or passage, irrespective of order.
- BERT
A model architecture for text representation.