This repository provides ComfyUI nodes that wrap the latest vision-language and language-only checkpoints from the Qwen family. Both Qwen3 VL and Qwen2.5 VL models are supported for multimodal reasoning, alongside text-only Qwen2.5/Qwen3 models for prompt generation.
- Custom Model Support: Load any Qwen2.5/Qwen3 compatible model from HuggingFace via
custom_models.json - Multi-Image Input: QwenVL node now supports up to 4 image inputs for multi-image reasoning
- Bypass Mode: Skip model inference and pass text directly through for workflow debugging
- Increased Token Limit: Max tokens increased to 8000 for longer outputs
- System Prompt: QwenVL node now exposes system prompt for custom instructions
- Added support for the Qwen3 VL family (
Qwen3-VL-4B-Thinking,Qwen3-VL-8B-Thinking, etc.) - Retained compatibility with existing Qwen2.5 VL models
- Text-only workflows support both Qwen2.5 and Qwen3 instruct checkpoints
Attention Mode Options: auto, sdpa, flash_attention_2, eager
| Mode | Recommendation |
|---|---|
sdpa |
Recommended - Best compatibility and performance on most GPUs |
flash_attention_2 |
Fastest on SM 80-90 GPUs (RTX 30/40 series). Requires pip install flash-attn |
eager |
Fallback for compatibility issues |
auto |
Defaults to SDPA |
Quantization Notes:
- Non-quantized with SDPA: ~17 tokens/sec on RTX 5090
- 4-bit quantization: May be slower on newer GPUs (Blackwell) due to kernel compatibility
- Multimodal workflow example:
workflow/Qwen2VL.json - Text generation workflow example:
workflow/qwen25.json
You can install through ComfyUI Manager (search for Qwen-VL wrapper for ComfyUI) or manually:
-
Clone the repository:
git clone https://github.com/alexcong/ComfyUI_QwenVL.git
-
Change into the project directory:
cd ComfyUI_QwenVL -
Install dependencies (ensure you are inside your ComfyUI virtual environment if you use one):
pip install -r requirements.txt
Note: This only installs a minimal set of dependencies that aren't already provided by ComfyUI:
huggingface_hub- For downloading modelsaccelerate- For efficient model loading with quantizationqwen-vl-utils- Qwen vision-language utilitiesbitsandbytes- Quantization support (4bit/8bit)
Core dependencies like PyTorch, Transformers, NumPy, and Pillow are already included with ComfyUI.
Multimodal generation with Qwen3 VL and Qwen2.5 VL checkpoints.
Inputs:
system– System prompt for model instructions (default: "You are a helpful assistant.")text– User prompt/questionimage1,image2,image3,image4– Up to 4 optional image inputs (processed in order)video_path– Optional video inputmodel– Select from built-in or custom modelsquantization– none/4-bit/8-bit for memory optimizationkeep_model_loaded– Cache model between runsbypass– Pass text directly to output without inferencetemperature– Sampling temperature (0-1)max_new_tokens– Maximum output tokens (up to 8000)seed– Manual seed for reproducibility (-1 for random)
Text-only generation backed by Qwen2.5/Qwen3 instruct models.
Inputs:
system– System promptprompt– User promptmodel– Select from built-in or custom modelsquantization– none/4-bit/8-bitkeep_model_loaded– Cache model between runsbypass– Pass prompt directly to output without inferencetemperature,max_new_tokens,seed– Generation parameters
You can add any Qwen2.5 or Qwen3 compatible model from HuggingFace by editing the custom_models.json file in this directory.
Edit custom_models.json with the following structure:
{
"hf_models": {
"My-Custom-Model-Name": {
"repo_id": "username/model-repo-name",
"model_class": "Qwen3",
"default": false,
"quantized": false,
"vram_requirement": {
"4bit": 4,
"8bit": 6,
"full": 12
}
}
}
}| Field | Required | Description |
|---|---|---|
repo_id |
Yes | HuggingFace repository ID (e.g., "coder3101/Qwen3-VL-2B-Instruct-heretic") |
model_class |
No | Force model architecture: "Qwen3" or "Qwen2.5". Auto-detected from name if omitted. |
default |
No | Reserved for future use |
quantized |
No | Informational: whether the model is pre-quantized |
vram_requirement |
No | Informational: estimated VRAM usage in GB for each quantization level |
- VL (Vision-Language) models: Names containing
-VL-are added to the QwenVL node dropdown - Text-only models: All other models are added to the Qwen node dropdown
- Model class: Auto-detected from name (
Qwen3if name contains "Qwen3", otherwiseQwen2.5)
{
"hf_models": {
"Qwen3-VL-2B-Instruct-Heretic": {
"repo_id": "coder3101/Qwen3-VL-2B-Instruct-heretic"
},
"My-Qwen3-8B-Finetune": {
"repo_id": "myusername/my-qwen3-8b-finetune",
"model_class": "Qwen3"
}
}
}Models are automatically downloaded from HuggingFace on first use and stored in ComfyUI/models/LLM/.
Downloaded models are stored under ComfyUI/models/LLM/.
Both nodes include a bypass toggle. When enabled:
- QwenVL: The
textinput is passed directly to the output without model inference - Qwen: The
promptinput is passed directly to the output without model inference
This is useful for workflow debugging or when you want to conditionally skip expensive model calls.

