Meta

MetaVision-2024

multimodal · Release Aug 12, 2024 · commercial

A multimodal AI model that combines vision and language: it takes text and images as input and returns text and images.

multimodal · vision · language

Modalities

What goes in and what comes out.

Inputs

text, image

Outputs

text, image

Capabilities

image-text alignment, visual question answering, content generation

Benchmarks snapshot

Structured JSON for reproducible comparisons.

{}

Related on GenAIWiki

Same provider, tooling that cites the model, or prompts tuned for it.

Meta

LLaMA 3 70B

LLaMA 3 70B has 70 billion parameters and a context window of 8k tokens, and is built for high-performance text generation and understanding across diverse tasks.

Meta

LLaMA 3 8B

LLaMA 3 8B is a compact model with 8 billion parameters, designed for efficient text generation and understanding with a context window of 8k tokens.

Meta

Llama 3.1 405B Instruct

Large open-weights instruct model, competitive on reasoning and coding benchmarks, released under a community license that permits customization.