AIExplainer
Machine Learning Intermediate

multimodal model

A model whose inputs, outputs, or both include more than one modality.

A modality is a category of data, such as text, images, audio, or video. For example, consider a model that takes both an image and a text caption (two modalities) as features and outputs a score indicating how appropriate the caption is for the image. This model's inputs are multimodal, and its output is unimodal.
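
The image-and-caption example can be sketched in a few lines of Python. This is a toy illustration, not a real model: the encoders `encode_image` and `encode_caption`, the scoring function `caption_score`, and the "bright"/"dark" vocabulary are all hypothetical names and choices made up for this sketch. It assumes a grayscale image given as rows of pixel values between 0 and 1. The point is only the shape of the interface: two modalities go in, one number comes out.

```python
def encode_image(pixels):
    """Toy image encoder (hypothetical): map a grayscale image to
    [brightness, darkness], where brightness is the mean pixel value."""
    flat = [p for row in pixels for p in row]
    brightness = sum(flat) / len(flat)
    return [brightness, 1.0 - brightness]


def encode_caption(caption):
    """Toy text encoder (hypothetical): map a caption to
    [mentions 'bright', mentions 'dark'] as 0.0 or 1.0 flags."""
    words = caption.lower().split()
    return [float("bright" in words), float("dark" in words)]


def caption_score(pixels, caption):
    """Multimodal input (image + text), unimodal output (one score).

    The score is the dot product of the two toy embeddings, so it is
    higher when the caption's wording matches the image's brightness.
    """
    image_vec = encode_image(pixels)
    text_vec = encode_caption(caption)
    return sum(i * t for i, t in zip(image_vec, text_vec))


if __name__ == "__main__":
    bright_image = [[0.9, 0.8], [1.0, 0.9]]
    print(caption_score(bright_image, "a bright room"))
    print(caption_score(bright_image, "a dark room"))
```

In this sketch the matching caption scores higher than the mismatched one. A production system would replace the hand-written encoders with learned ones, but the input and output modalities would be the same.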

Practitioners use the term "multimodal model" when building, training, or evaluating machine learning systems. The term appears in research papers, product documentation, and technical discussions of AI capabilities and limitations.