GENAIWIKI

Meta

MetaVision-2024

multimodal · Released Aug 12, 2024 · commercial

A multimodal model from Meta that accepts text and images as input and produces text and images as output, covering tasks such as image-text alignment, visual question answering, and content generation.

multimodal, vision, language

Modalities

What goes in and what comes out.

Inputs

text, image

Outputs

text, image

Capabilities

image-text alignment, visual question answering, content generation

Benchmarks snapshot

Structured JSON for reproducible comparisons.

{}
