What is model parallelism?
A way of scaling training or inference that puts different parts of one model on different devices.
Model parallelism explained in plain English
Model parallelism makes it possible to train or serve models that are too big to fit on a single device. To implement model parallelism, a system typically does the following:
1. Shards (divides) the model into smaller parts.
2. Distributes the training of those smaller parts across multiple processors. Each processor trains its own part of the model.
3. Combines the results to create a single model.
Model parallelism slows training, because the devices must pass intermediate results to one another and can sit idle while they wait. See also data parallelism.
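The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `SimulatedDevice` class and the `shard`, `parallel_forward`, and `combine` helpers are hypothetical names invented for this sketch, the "devices" are ordinary Python objects rather than real accelerators, and only a forward pass is shown instead of full training.

```python
# Minimal simulation of model parallelism. Plain Python objects stand in for
# devices; all names here are hypothetical and not taken from any library.

def matvec(weights, x):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class SimulatedDevice:
    """Stand-in for one accelerator that holds one shard of the model."""

    def __init__(self, name, layers):
        self.name = name
        self.layers = layers  # only this shard's weight matrices live here

    def forward(self, x):
        for weights in self.layers:
            x = matvec(weights, x)
        return x


def shard(layers, num_devices):
    """Step 1: split the layers into contiguous shards, one per device."""
    per_device = -(-len(layers) // num_devices)  # ceiling division
    return [
        SimulatedDevice(f"device{i}", layers[i * per_device:(i + 1) * per_device])
        for i in range(num_devices)
    ]


def parallel_forward(devices, x):
    """Step 2: each device runs its own shard; activations hop between devices."""
    for device in devices:
        x = device.forward(x)  # on real hardware this hop is a data transfer
    return x


def combine(devices):
    """Step 3: gather every shard's layers back into one full model."""
    return [weights for device in devices for weights in device.layers]


full_model = [
    [[1.0, 2.0], [0.0, 1.0]],  # layer 0
    [[2.0, 0.0], [1.0, 1.0]],  # layer 1
    [[1.0, -1.0]],             # layer 2 (maps two values to one)
]
devices = shard(full_model, num_devices=2)
output = parallel_forward(devices, [1.0, 1.0])
print(output)
```

Because each device must wait for the previous device's activations, the second device is idle while the first one works. That waiting is one reason model parallelism tends to slow training.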
Example
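For example, a neural network with many layers might place the first half of its layers on one device and the second half on another. The first device computes its layers and sends the resulting activations to the second device, which computes the rest. Practitioners refer to model parallelism when building, training, or evaluating machine learning systems, and the term appears in research papers, product documentation, and technical discussions about training and serving large models.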
People also read
- A/B testing
A statistical way of comparing two (or more) techniques—the A and the B.
- ablation
A technique for evaluating the importance of a feature or component by temporarily removing it from a model.
- accuracy
The number of correct classification predictions divided by the total number of predictions.
- activation function
A function that enables neural networks to learn nonlinear (complex) relationships between features and the label.
- active learning
A training approach in which the algorithm chooses some of the data it learns from.
- adaptation
Synonym for tuning or fine-tuning.
- agglomerative clustering
See hierarchical clustering.
- anomaly detection
The process of identifying outliers.
- area under the PR curve
See PR AUC (Area under the PR Curve).
- area under the ROC curve
See AUC (Area under the ROC curve).