streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
An open-source implementation for fine-tuning the Qwen-VL series models by Alibaba Cloud.
Paddle Multimodal Integration and eXploration: supports mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion-model toolbox, with high performance and flexibility.
[ICCV 2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
A higher-performance OpenAI-compatible LLM service than vLLM serve: a pure C++ implementation built on GPRS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
A Python-based CLI tool for captioning images with the WD series, Joy-Caption Pre-Alpha, Meta Llama 3.2 Vision Instruct, and Qwen2-VL Instruct models.
Community-built Qwen AI Provider for Vercel AI SDK - Integrate Alibaba Cloud's Qwen models with Vercel's AI application framework
This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.
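Projects like the one above follow the standard Hugging Face chat-template flow for Qwen2-VL, where each user turn mixes image and text parts. A minimal sketch of that message format (the schema mirrors the public Qwen2-VL model card; the helper name and file paths here are illustrative, and real inference additionally requires the `transformers` and `qwen-vl-utils` packages plus the model weights):

```python
# Sketch: building the chat-format input Qwen2-VL expects for a VQA
# or OCR request. One user turn carries both an image part and a
# text part; the processor's chat template turns this into model input.

def build_vqa_messages(image_path: str, question: str) -> list[dict]:
    """Return the message list for a single image + question turn."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},  # local path or URL
                {"type": "text", "text": question},
            ],
        }
    ]

# Actual inference would then proceed roughly as (not executed here):
#   processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
#   text = processor.apply_chat_template(
#       messages, tokenize=False, add_generation_prompt=True)
#   ...model.generate(...)

messages = build_vqa_messages("receipt.png", "What is the total amount?")
```

The same message structure serves both OCR-style prompts ("Read all text in this image") and free-form VQA; only the text part changes.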
An intelligent search assistant based on multimodal large models, enabling smart information retrieval and knowledge integration on the Xiaohongshu platform.
This repo contains the winning code for Amazon ML Challenge 2024. The challenge was to develop a Machine Learning model to extract product entity details directly from the product images.
This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.
A case study on fine-tuning Qwen2-VL with LLaMA-Factory for the culture and tourism domain (historical literature and museums).
Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
Qwen3-VL-Outpost is a Gradio-based web application for vision-language tasks, leveraging multiple Qwen vision-language models to process images and videos.
An open-source server implementation for inference with Qwen2-VL series models, built with FastAPI.