Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
config.json	config.json
deploy.yaml	deploy.yaml

Llama 3.2 90B-Vision-Instruct Model

The Llama 3.2 90B-Vision-Instruct model is a vision-based version of the Llama 3.2 model, designed to be highly capable with visual reasoning and instruction following abilities. This model is ideal for building personalized, on-device agentic applications with strong privacy, where data never leaves the device.

Features

Highly capable with visual reasoning and instruction following abilities
Supports image understanding and visual grounding tasks
Optimized for edge and mobile devices
Supports context length of 128K tokens
Available for fine-tuning and deployment on a variety of platforms
Part of the Llama 3.2 ecosystem, providing seamless integration with other Llama models

Technical Specifications

Model size: 90B parameters
Context length: 128K tokens
Input type: Text and image
Output type: Text and image
Pre-trained on: Large-scale noisy (text, image) pair data
Fine-tuned on: Medium-scale high-quality in-domain and knowledge-enhanced (text, image) pair data
Weights: Based on BFloat16 numerics
Quantized variants: Currently in development

Performance Metrics

Competitive with leading foundation models on image recognition and visual understanding tasks
Outperforms Gemma 2 2.6B and Phi 3.5-mini models on tasks such as following instructions, visual grounding, and image captioning
Competitive with Gemma 2 2.6B model on tasks such as visual reasoning and image captioning

Use Cases

Personalized on-device agentic applications with strong privacy
Visual reasoning and instruction following
Image understanding and visual grounding
Image captioning and generation
Multimodal text and image generation

Serve with vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving.

vLLM is fast with:

State-of-the-art serving throughput
Efficient management of attention key and value memory with PagedAttention
Continuous batching of incoming requests
Fast model execution with CUDA/HIP graph
Quantization: GPTQ, AWQ, SqueezeLLM, FP8 KV Cache
Optimized CUDA kernels

vLLM is flexible and easy to use with:

Seamless integration with popular Hugging Face models
High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
Tensor parallelism support for distributed inference
Streaming outputs
OpenAI-compatible API server
Support NVIDIA GPUs and AMD GPUs
(Experimental) Prefix caching support
(Experimental) Multi-lora support

vLLM seamlessly supports many Hugging Face models, including the following architectures:

Aquila & Aquila2 (BAAI/AquilaChat2-7B, BAAI/AquilaChat2-34B, BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc.)
Baichuan & Baichuan2 (baichuan-inc/Baichuan2-13B-Chat, baichuan-inc/Baichuan-7B, etc.)
BLOOM (bigscience/bloom, bigscience/bloomz, etc.)
ChatGLM (THUDM/chatglm2-6b, THUDM/chatglm3-6b, etc.)
Command-R (CohereForAI/c4ai-command-r-v01, etc.)
DBRX (databricks/dbrx-base, databricks/dbrx-instruct etc.)
DeciLM (Deci/DeciLM-7B, Deci/DeciLM-7B-instruct, etc.)
Falcon (tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc.)
Gemma (google/gemma-2b, google/gemma-7b, etc.)
GPT-2 (gpt2, gpt2-xl, etc.)
GPT BigCode (bigcode/starcoder, bigcode/gpt_bigcode-santacoder, etc.)
GPT-J (EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc.)
GPT-NeoX (EleutherAI/gpt-neox-20b, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc.)
InternLM (internlm/internlm-7b, internlm/internlm-chat-7b, etc.)
InternLM2 (internlm/internlm2-7b, internlm/internlm2-chat-7b, etc.)
Jais (core42/jais-13b, core42/jais-13b-chat, core42/jais-30b-v3, core42/jais-30b-chat-v3, etc.)
LLaMA, Llama 2, and Meta Llama 3 (meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, lmsys/vicuna-13b-v1.3, young-geng/koala, openlm-research/open_llama_13b, etc.)
MiniCPM (openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, etc.)
Mistral (mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc.)
Mixtral (mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, mistral-community/Mixtral-8x22B-v0.1, etc.)
MPT (mosaicml/mpt-7b, mosaicml/mpt-30b, etc.)
OLMo (allenai/OLMo-1B-hf, allenai/OLMo-7B-hf, etc.)
OPT (facebook/opt-66b, facebook/opt-iml-max-30b, etc.)
Orion (OrionStarAI/Orion-14B-Base, OrionStarAI/Orion-14B-Chat, etc.)
Phi (microsoft/phi-1_5, microsoft/phi-2, etc.)
Phi-3 (microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, etc.)
Qwen (Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.)
Qwen2 (Qwen/Qwen1.5-7B, Qwen/Qwen1.5-7B-Chat, etc.)
Qwen2MoE (Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc.)
StableLM(stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc.)
Starcoder2(bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc.)
Xverse (xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc.)
Yi (01-ai/Yi-6B, 01-ai/Yi-34B, etc.)

Getting Started

Visit our documentation to get started.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Llama 3.2 90B-Vision-Instruct Model

Features

Technical Specifications

Performance Metrics

Use Cases

Serve with vLLM

Getting Started

FilesExpand file tree

Llama-3.2-90B-Vision-Instruct

Directory actions

More options

Directory actions

More options

Latest commit

History

Llama-3.2-90B-Vision-Instruct

Folders and files

parent directory

README.md

Llama 3.2 90B-Vision-Instruct Model

Features

Technical Specifications

Performance Metrics

Use Cases

Serve with vLLM

Getting Started