Machine Learning Intermediate 1 min read

What is a multimodal model?

A model whose inputs, outputs, or both include more than one modality.

multimodal model explained in plain English

A model whose inputs, outputs, or both include more than one modality. For example, consider a model that takes both an image and a text caption (two modalities) as features, and outputs a score indicating how appropriate the text caption is for the image. So, this model's inputs are multimodal and the output is unimodal.

Example

Practitioners refer to multimodal model when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.

multimodal model explained in plain English

Example

People also read