VITA-MLLM

9 repositories

VITA-QinYu
Public
Python
•
Other
•0•2•0•0•Updated Mar 24, 2026
Omni-Diffusion
Public
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
Python
•2•112•2•0•Updated Mar 12, 2026
Freeze-Omni
Public
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
speech speech-synthesis speech-recognition speech-to-speech large-language-models multimodal-large-language-models
Python
•
Other
•25•374•14•2•Updated May 27, 2025
VITA-Audio
Public
✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Python
•
Other
•61•677•28•0•Updated May 24, 2025
Long-VITA
Public
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
long-context mllm vision-language-model
Python
•
Other
•29•306•6•0•Updated May 14, 2025
LUCY
Public
LUCY: Linguistic Understanding and Control Yielding Early Stage of Her
Python
•
Other
•3•60•12•0•Updated Apr 14, 2025
Sparrow
Public
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
Jupyter Notebook
•
Apache License 2.0
•1•31•0•0•Updated Mar 28, 2025
VITA
Public
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
multimodal-large-language-models large-multimodal-models omni-modal-video-understanding omni-language-model omni-model
Python
•
Other
•183•2.5k•58•1•Updated Mar 28, 2025
Woodpecker
Public
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
multimodality hallucination hallucinations large-language-models llm mllm multimodal-large-language-models
Python
•30•649•2•0•Updated Dec 23, 2024