OpenGVLab repositories

Vlaser

Public

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning

Python • MIT License • Forks: 0 • Stars: 41 • Issues: 0 • Pull requests: 0 • Updated Feb 16, 2026

UMMEvalKit

Public

A unified, efficient, and extensible evaluation toolkit for unified multimodal models

Jupyter Notebook • MIT License • Forks: 1 • Stars: 5 • Issues: 0 • Pull requests: 0 • Updated Feb 12, 2026

VKnowU

Public

Python • Forks: 1 • Stars: 11 • Issues: 0 • Pull requests: 0 • Updated Feb 3, 2026

GenExam

Public

GenExam: A Multidisciplinary Text-to-Image Exam

benchmark image-generation text-to-image-generation

Python • MIT License • Forks: 4 • Stars: 56 • Issues: 0 • Pull requests: 0 • Updated Jan 29, 2026

MetaCaptioner

Public

Python • Forks: 3 • Stars: 44 • Issues: 1 • Pull requests: 0 • Updated Jan 27, 2026

ScaleCUA

Public

ScaleCUA provides open-source computer use agents that can operate in cross-platform environments (Windows, macOS, Ubuntu, Android).

data models gui-agents computer-use-agents scalecua online-evaluation-suite

Python • Apache License 2.0 • Forks: 74 • Stars: 1.1k • Issues: 14 • Pull requests: 0 • Updated Jan 7, 2026

GUI-Odyssey

Public

[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile d…

Python • Forks: 8 • Stars: 147 • Issues: 10 • Pull requests: 0 • Updated Jan 3, 2026

SDLM

Public

Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-ca…

gpt language-model diffusion-models llm

Python • MIT License • Forks: 3 • Stars: 90 • Issues: 0 • Pull requests: 0 • Updated Dec 27, 2025

InternVideo

Public

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

benchmark action-recognition video-understanding video-data self-supervised multimodal video-dataset open-set-recognition video-retrieval video-question-answering

Python • Apache License 2.0 • Forks: 139 • Stars: 2.2k • Issues: 134 • Pull requests: 4 • Updated Dec 15, 2025

SID-VLN

Public

Official implementation of: Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale

Python • MIT License • Forks: 2 • Stars: 11 • Issues: 0 • Pull requests: 0 • Updated Nov 29, 2025

vinci

Public

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model

Python • Forks: 2 • Stars: 81 • Issues: 2 • Pull requests: 0 • Updated Nov 27, 2025

OmniQuant

Public

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

quantization large-language-models llm

Python • MIT License • Forks: 76 • Stars: 887 • Issues: 29 • Pull requests: 1 • Updated Nov 26, 2025

EfficientQAT

Public

[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Python • Forks: 27 • Stars: 327 • Issues: 13 • Pull requests: 0 • Updated Nov 26, 2025

VideoChat-Flash

Public

[ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Python • MIT License • Forks: 16 • Stars: 507 • Issues: 11 • Pull requests: 0 • Updated Nov 18, 2025

ExpVid

Public

Forks: 0 • Stars: 8 • Issues: 0 • Pull requests: 0 • Updated Oct 28, 2025

VideoChat-R1

Public

[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning

Python • Forks: 10 • Stars: 257 • Issues: 24 • Pull requests: 0 • Updated Oct 18, 2025

NaViL

Public

Python • MIT License • Forks: 7 • Stars: 89 • Issues: 0 • Pull requests: 0 • Updated Oct 10, 2025

PonderV2

Public

[T-PAMI 2025] PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

3d-vision pretraining foundation-models

Python • MIT License • Forks: 8 • Stars: 370 • Issues: 0 • Pull requests: 0 • Updated Sep 30, 2025

InternVL

Public

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

image-classification gpt multi-modal semantic-segmentation video-classification image-text-retrieval llm vision-language-model gpt-4v vit-6b

Python • MIT License • Forks: 757 • Stars: 9.8k • Issues: 297 • Pull requests: 11 • Updated Sep 22, 2025

EgoExoLearn

Public

[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset

Python • MIT License • Forks: 2 • Stars: 79 • Issues: 3 • Pull requests: 0 • Updated Aug 26, 2025

VRBench

Public

[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos

benchmark dataset video-understanding vlm evaluation-kit multi-step-reasoning video-reasoning llm

Python • Apache License 2.0 • Forks: 0 • Stars: 24 • Issues: 1 • Pull requests: 0 • Updated Aug 8, 2025

PIIP

Public

[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)

computer-vision image-classification object-detection semantic-segmentation instance-segmentation vision-transformer multimodal-large-language-models vision-language-models

Python • MIT License • Forks: 5 • Stars: 108 • Issues: 2 • Pull requests: 0 • Updated Aug 5, 2025

LORIS

Public

[ICML2023] Long-Term Rhythmic Video Soundtracker

music-generation pytorch-implementation multi-modality diffusion-models aigc

Python • MIT License • Forks: 1 • Stars: 62 • Issues: 1 • Pull requests: 0 • Updated Jul 28, 2025

TPO

Public

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Jupyter Notebook • Forks: 6 • Stars: 64 • Issues: 1 • Pull requests: 0 • Updated Jul 22, 2025

Docopilot

Public

[CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding

Python • MIT License • Forks: 1 • Stars: 36 • Issues: 2 • Pull requests: 0 • Updated Jul 22, 2025

Mono-InternVL

Public

[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Python • MIT License • Forks: 0 • Stars: 103 • Issues: 7 • Pull requests: 0 • Updated Jul 18, 2025

ZeroGUI

Public

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Python • Apache License 2.0 • Forks: 8 • Stars: 109 • Issues: 0 • Pull requests: 0 • Updated Jul 17, 2025

MUTR

Public

[AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Python • MIT License • Forks: 7 • Stars: 82 • Issues: 3 • Pull requests: 0 • Updated Jun 13, 2025

PVC

Public

[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Python • MIT License • Forks: 2 • Stars: 51 • Issues: 4 • Pull requests: 0 • Updated Jun 12, 2025

FluxViT

Public

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Python • MIT License • Forks: 0 • Stars: 37 • Issues: 1 • Pull requests: 0 • Updated Jun 11, 2025

91 repositories
