Fix: Enable /v1/models endpoint for pure diffusion mode (#805)
majiayu000 wants to merge 4 commits into vllm-project:main
Conversation
Force-pushed from 09b8c80 to 8a9ea75.
gcanlin
left a comment
I think it's better to move the original OpenAIServingModels up.
vllm-omni/vllm_omni/entrypoints/openai/api_server.py
Lines 418 to 422 in 8a9ea75
Please add test plan and results. Thanks
Force-pushed from 8a9ea75 to 9dfea2d.
Updated the PR to address reviewer feedback:
Changes Made
The implementation properly fixes issue #751 by ensuring the …
@fake0fan PTAL
Pull request overview
This PR fixes issue #751 by enabling the /v1/models endpoint for pure diffusion mode. Previously, OpenAIServingModels was only initialized in LLM mode, causing the /v1/models endpoint to be unavailable when running pure diffusion models like Qwen-Image.
Changes:
- Initialize OpenAIServingModels before the pure diffusion mode check to ensure it's available for both diffusion and LLM modes
- Add state.args assignment for potential future use
- Refactor lora_modules handling to support early initialization
```python
# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
state.openai_serving_models = OpenAIServingModels(
    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,
)
```
In LLM mode with default_mm_loras, OpenAIServingModels is initialized twice - once at lines 286-290 and again at lines 374-378. This creates an unnecessary object that will be immediately discarded. Consider initializing OpenAIServingModels only once after determining the final lora_modules value. For example, move the first initialization to after the lora_modules processing logic (after line 378) and use conditional logic to determine whether to use args.lora_modules or the merged lora_modules.
Suggested change:
```diff
-# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
-state.openai_serving_models = OpenAIServingModels(
-    engine_client=engine_client,
-    base_model_paths=base_model_paths,
-    lora_modules=lora_modules,
-)
+# Initialize OpenAIServingModels early only for pure diffusion mode.
+# In LLM/multi-stage modes, OpenAIServingModels will be initialized later
+# after any lora_modules processing/merging is complete.
+if is_pure_diffusion:
+    state.openai_serving_models = OpenAIServingModels(
+        engine_client=engine_client,
+        base_model_paths=base_model_paths,
+        lora_modules=lora_modules,
+    )
```
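The single-initialization refactor the reviewer proposes, resolving `lora_modules` once before constructing anything, could look like the helper below. This is a hedged sketch: `default_mm_loras` merging is reduced to a plain list extend, and the names are illustrative rather than the real merging logic.

```python
# Sketch of resolving lora_modules exactly once, before constructing
# OpenAIServingModels, instead of initializing the object twice.
# default_mm_loras and the merge rule are illustrative assumptions.
from types import SimpleNamespace


def resolve_lora_modules(args, vllm_config):
    """Return the final lora_modules value in one place.

    In pure diffusion mode vllm_config is None, so args.lora_modules
    is used directly; in LLM mode any default_mm_loras are merged in.
    """
    lora_modules = args.lora_modules
    if vllm_config is not None and getattr(vllm_config, "default_mm_loras", None):
        merged = list(lora_modules or [])
        merged.extend(vllm_config.default_mm_loras)
        lora_modules = merged
    return lora_modules


args = SimpleNamespace(lora_modules=["adapter-a"])
cfg = SimpleNamespace(default_mm_loras=["mm-adapter"])
print(resolve_lora_modules(args, None))  # pure diffusion → ['adapter-a']
print(resolve_lora_modules(args, cfg))   # LLM mode → ['adapter-a', 'mm-adapter']
```

With this helper, `OpenAIServingModels(...)` is constructed a single time with the returned value, and no throwaway instance is created on the `default_mm_loras` path.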
```python
base_model_paths = [BaseModelPath(name=name, model_path=args.model) for name in served_model_names]
state.engine_client = engine_client
state.log_stats = not args.disable_log_stats
state.args = args
```
The assignment state.args = args appears to be unused. There are no other references to state.args in the codebase. If this is intended for future use or debugging purposes, consider adding a comment explaining why it's being stored. Otherwise, this line can be removed.
Suggested change:
```diff
-state.args = args
```
```python
# Process lora_modules early for OpenAIServingModels initialization
# In pure diffusion mode, vllm_config will be None, so we use args.lora_modules directly
lora_modules = args.lora_modules

# ...

# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
state.openai_serving_models = OpenAIServingModels(
    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,
```
The variable lora_modules is assigned to args.lora_modules at line 283, but in LLM mode it's reassigned to the same value at line 359. The assignment at line 283 is used for pure diffusion mode (line 289) and the assignment at line 359 is used for LLM mode. Consider refactoring to avoid this redundancy - for example, by only setting lora_modules once before the diffusion mode check, or by using a different variable name for the LLM-specific processing.
Suggested change:
```diff
-# Process lora_modules early for OpenAIServingModels initialization
-# In pure diffusion mode, vllm_config will be None, so we use args.lora_modules directly
-lora_modules = args.lora_modules
-# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
-state.openai_serving_models = OpenAIServingModels(
-    engine_client=engine_client,
-    base_model_paths=base_model_paths,
-    lora_modules=lora_modules,
+# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
+# In pure diffusion mode, vllm_config will be None, so we pass args.lora_modules directly
+state.openai_serving_models = OpenAIServingModels(
+    engine_client=engine_client,
+    base_model_paths=base_model_paths,
+    lora_modules=args.lora_modules,
```
```python
    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,
)
```
I don't understand why we need to reinitialize OpenAIServingModels with merged lora_modules. Can we solve this by moving lora_modules up as well?
To clarify, this is not a re-initialization. Since we use omni_run_server_worker as the entry point, the upstream vLLM init_app_state is never called.
@majiayu000 Hey, this fixes #751 and it's only 20 lines: initializing OpenAIServingModels in pure diffusion mode so /v1/models works. It seems like a straightforward fix. Is there anything blocking this?
Force-pushed from 9dfea2d to 3aab7be.
Please fix pre-commit. Thanks
Thanks for the PR. I've formatted the code, resolved the conflicts with PR #454, and restored and adapted the tests for the unified …
Please resolve the CI failure.
Please fix the DCO.
…roject#751) Signed-off-by: majiayu000 <1835304752@qq.com>
Force-pushed from e650c47 to c4d6e9f.
The previous commit removed the custom /v1/models handler from the omni router but the upstream vLLM route was still being removed in init_app, leaving no /v1/models endpoint at all. Re-add a simplified handler that delegates to state.openai_serving_models (either OpenAIServingModels for LLM mode or _DiffusionServingModels for pure diffusion mode). Signed-off-by: majiayu000 <1835304752@qq.com>
Force-pushed from c4d6e9f to c2bf6b5.
Fixed
Fixes #751. Initializes OpenAIServingModels in pure diffusion mode to ensure the /v1/models endpoint is correctly populated.