Replies: 1 comment
Vision LoRA with vLLM is tricky: the vision encoder and the language model have different adapter requirements. The issue is that vLLM's runtime LoRA loading only targets the language-model layers, so vision adapter weights cannot be applied on the fly.
**Workarounds:**

1. **Two-stage merge**

```python
# Merge the text-side LoRA first
text_merged = model.merge_and_unload(text_lora)
# Then merge the vision adapter separately
full_merged = merge_vision_adapter(text_merged, vision_lora)
```

2. **Serve without `fast_inference`**

```python
model = FastModel.from_pretrained(
    base_model,
    fast_inference=False,  # Slower, but works with unmerged adapters
    vision_lora=vision_adapter_path,
    text_lora=text_adapter_path,
)
```

3. **Export to separate checkpoints**
**Root cause:** vLLM's adapter loading assumes a single LoRA, not a multimodal split.

We've deployed VLMs with LoRA at RevolutionAI, and the full-merge approach is the most reliable for production. What's your serving setup: vLLM directly, or through another layer?
Hi everyone,
I am currently moving from SFT to Reinforcement Learning (GRPO) on Qwen3-VL-8B-Instruct using Unsloth. I have a specific constraint regarding vLLM compatibility and Vision LoRAs.
I successfully performed SFT using 16-bit LoRA with vision layers enabled. Here is my SFT configuration:
I am now initializing the GRPO trainer starting from my best SFT checkpoint (`step-175`). I want to leverage vLLM (`fast_inference=True`) to speed up the generation phase of GRPO.

I understand that vLLM does not currently support LoRA for vision/encoder layers. However, since I finetuned those layers during SFT, they are part of my adapter.
I explicitly do not want to merge the LoRA adapter into the base model yet, as I have observed precision discrepancies after merging.
My Questions:
If I enable `fast_inference=True` with my current checkpoint, will vLLM simply ignore the vision weights in the adapter (and fall back to the base vision weights), or will it crash?

Any guidance on how to configure the `PeftModel` or Unsloth for this specific scenario would be greatly appreciated. Thanks!