model parallelism

A way of scaling training or inference that puts different parts of one model on different devices.

Plain English Explanation

Model parallelism enables models that are too big to fit on a single device. To implement model parallelism, a system typically does the following:

1. Shards (divides) the model into smaller parts.
2. Distributes the training of those smaller parts across multiple processors. Each processor trains its own part of the model.
3. Combines the results to create a single model.

Model parallelism typically slows training, since the devices must exchange intermediate results and may sit idle while waiting on one another. See also data parallelism.
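The shard, distribute, and combine steps can be illustrated with a minimal sketch. It is not a real multi-device implementation: the "devices" are simulated with plain Python lists, only a forward pass through one linear layer is shown (not training), and the helper names (`matvec`, `shard_columns`, `parallel_forward`) are hypothetical ones chosen for this example. The layer's weight matrix is split column-wise, each simulated device computes its slice of the output, and the slices are concatenated.

```python
# Simulated model parallelism for one linear layer, y = x @ W.
# Assumption: "devices" are just list entries; no real hardware is used.

def matvec(x, weight_columns):
    """Compute x @ W, where W is stored as a list of columns."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in weight_columns]

def shard_columns(weight_columns, num_devices):
    """Step 1: split the columns of W into contiguous shards."""
    per_device = -(-len(weight_columns) // num_devices)  # ceiling division
    return [weight_columns[i:i + per_device]
            for i in range(0, len(weight_columns), per_device)]

def parallel_forward(x, shards):
    """Steps 2 and 3: each shard computes its output slice, then slices are joined."""
    partial_outputs = [matvec(x, shard) for shard in shards]  # one entry per device
    return [y for part in partial_outputs for y in part]      # concatenate slices

# W has 3 input rows and 4 output columns, stored column by column.
W_COLUMNS = [[1, 0, 2], [0, 1, 0], [3, 1, 1], [2, 2, 2]]
x = [1, 2, 3]

shards = shard_columns(W_COLUMNS, num_devices=2)
sharded_result = parallel_forward(x, shards)
full_result = matvec(x, W_COLUMNS)

print(sharded_result == full_result)  # the sharded and unsharded outputs agree
```

In a real system each shard would live in a separate device's memory, and the concatenation step would require communication between devices, which is one source of the slowdown noted above.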

How is it used?

Practitioners turn to model parallelism when a model is too large to fit on a single device, whether for training or for inference. The term appears in research papers, product documentation, and technical discussions about scaling machine learning systems.