AIExplainer
Machine Learning Intermediate

multimodal instruction-tuned

An instruction-tuned model that can process input beyond text, such as images, video, and audio.

Practitioners use the term "multimodal instruction-tuned" when building, training, or evaluating machine learning systems that follow instructions over inputs other than text alone. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
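
To make the definition concrete, here is a minimal sketch of how a single training or evaluation example for such a model might be represented: an instruction made of mixed-modality parts, paired with a target response. The names Part, InstructionExample, modalities_used, and is_multimodal, and the file name chart.png, are hypothetical and chosen only for illustration; real datasets and libraries use their own schemas.

```python
from dataclasses import dataclass
from typing import List, Literal

# Input types named in the definition above, plus plain text.
Modality = Literal["text", "image", "audio", "video"]


@dataclass
class Part:
    """One piece of the user's input (hypothetical schema)."""
    modality: Modality
    # The text itself, or a path/URI for non-text media (an assumed convention).
    content: str


@dataclass
class InstructionExample:
    """An instruction made of one or more parts, plus the desired response."""
    parts: List[Part]
    response: str


def modalities_used(example: InstructionExample) -> List[str]:
    """Return the distinct modalities in the instruction, sorted by name."""
    return sorted({part.modality for part in example.parts})


def is_multimodal(example: InstructionExample) -> bool:
    """True if the instruction contains at least one non-text part."""
    return any(part.modality != "text" for part in example.parts)


# An instruction that combines an image with a text request.
example = InstructionExample(
    parts=[
        Part("image", "chart.png"),
        Part("text", "Summarize the trend shown in this chart."),
    ],
    response="(target response text goes here)",
)

# A text-only instruction, for contrast.
text_only = InstructionExample(
    parts=[Part("text", "Translate 'hello' into French.")],
    response="(target response text goes here)",
)
```

In this sketch, is_multimodal(example) is true because the instruction includes an image part, while is_multimodal(text_only) is false. A text-only instruction-tuned model would be trained only on examples of the second kind.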