Multimodal AI
Multimodal AI works with more than one data modality, such as text, images, audio, video, documents, or structured data.
Expanded definition
Multimodal AI systems can understand, generate, or transform information across modalities. Examples include visual question answering, image generation, speech transcription, document understanding, video analysis, and assistants that combine text with screenshots or files. What a given model can do depends on the input and output modalities it supports, its context length, and its tool integrations.
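One way to make the dependence on supported inputs and outputs concrete is to describe each model and each task by the modalities involved, then check whether they match. The sketch below is illustrative only: `Modality`, `ModelProfile`, `Task`, `can_run`, and `is_multimodal`, along with the example model names, are hypothetical and do not correspond to any real library or product.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical description of what a model accepts and produces."""
    name: str
    inputs: frozenset
    outputs: frozenset
    context_tokens: int


@dataclass(frozen=True)
class Task:
    """Hypothetical task, described by the modalities it consumes and produces."""
    name: str
    needs_inputs: frozenset
    needs_outputs: frozenset


def can_run(model: ModelProfile, task: Task) -> bool:
    # A model fits a task only if it covers every required input and output modality.
    return task.needs_inputs <= model.inputs and task.needs_outputs <= model.outputs


def is_multimodal(model: ModelProfile) -> bool:
    # Multimodal here means more than one distinct modality across inputs and outputs.
    return len(model.inputs | model.outputs) > 1


# Made-up example models (assumptions, not real products).
vision_chat = ModelProfile(
    name="vision-chat",
    inputs=frozenset({Modality.TEXT, Modality.IMAGE}),
    outputs=frozenset({Modality.TEXT}),
    context_tokens=32_000,
)
text_only = ModelProfile(
    name="text-only",
    inputs=frozenset({Modality.TEXT}),
    outputs=frozenset({Modality.TEXT}),
    context_tokens=8_000,
)

# Tasks taken from the examples in the definition above.
vqa = Task("visual question answering",
           frozenset({Modality.TEXT, Modality.IMAGE}), frozenset({Modality.TEXT}))
transcription = Task("speech transcription",
                     frozenset({Modality.AUDIO}), frozenset({Modality.TEXT}))
```

Under these assumptions, `can_run(vision_chat, vqa)` holds while `can_run(text_only, vqa)` and `can_run(vision_chat, transcription)` do not, because a required input modality is missing. Context length and tool integrations are left out of the check to keep the sketch short; a fuller version would compare them as well.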