
Fix: Enable /v1/models endpoint for pure diffusion mode #805

Open
majiayu000 wants to merge 4 commits into vllm-project:main from majiayu000:fix/issue-751-omni-v1-models

Conversation

@majiayu000
Contributor

Fixes #751. Initializes OpenAIServingModels in pure diffusion mode to ensure the /v1/models endpoint is correctly populated.
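For context, a rough sketch of the change (the class and argument names follow the snippets quoted later in this review, but the surrounding structure is an assumption, not the actual vllm-project source):

```python
# Minimal sketch of the fix, assuming a simplified init_app_state.
# Previously OpenAIServingModels was created only on the LLM branch, so in
# pure diffusion mode /v1/models had no backing object to report models from.

class BaseModelPath:
    def __init__(self, name, model_path):
        self.name = name
        self.model_path = model_path

class OpenAIServingModels:
    def __init__(self, engine_client, base_model_paths, lora_modules=None):
        self.engine_client = engine_client
        self.base_model_paths = base_model_paths
        self.lora_modules = lora_modules

    def model_names(self):
        # What /v1/models ultimately reports.
        return [p.name for p in self.base_model_paths]

def init_app_state(state, engine_client, args, is_pure_diffusion):
    base_model_paths = [BaseModelPath(name=n, model_path=args.model)
                        for n in args.served_model_names]
    # The fix: initialize before the pure-diffusion early return, so the
    # object exists in both diffusion and LLM modes.
    state.openai_serving_models = OpenAIServingModels(
        engine_client=engine_client,
        base_model_paths=base_model_paths,
        lora_modules=args.lora_modules,
    )
    if is_pure_diffusion:
        return  # diffusion-only setup stops here; /v1/models still works
    # ... LLM-specific serving setup would continue here ...
```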

@majiayu000 majiayu000 force-pushed the fix/issue-751-omni-v1-models branch from 09b8c80 to 8a9ea75 Compare January 15, 2026 16:34
Contributor

@gcanlin gcanlin left a comment


I think it's better to move the original OpenAIServingModels up.

state.openai_serving_models = OpenAIServingModels(
    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,
)

@tzhouam
Collaborator

tzhouam commented Jan 16, 2026

Please add a test plan and results. Thanks

@tzhouam tzhouam added the ready label to trigger buildkite CI label Jan 16, 2026
@majiayu000 majiayu000 force-pushed the fix/issue-751-omni-v1-models branch from 8a9ea75 to 9dfea2d Compare January 28, 2026 16:45
@majiayu000
Contributor Author

Updated the PR to address reviewer feedback:

  1. @gcanlin's suggestion: Consolidated initialization to eliminate code duplication. The initialization is now shared between pure diffusion mode and LLM mode, improving code maintainability.

  2. @tzhouam's request for test plan: Added a comprehensive test plan in the commit message. Manual testing is required with a diffusion model to verify the /v1/models endpoint returns the model information correctly.

Changes Made

  • Moved OpenAIServingModels initialization before the is_pure_diffusion check
  • Both diffusion and LLM modes now use the same initialization code path
  • Properly handle lora_modules processing for both modes
  • DCO signoff included

The implementation properly fixes issue #751 by ensuring the /v1/models endpoint works in pure diffusion mode.
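The lora_modules handling could be sketched with a hypothetical helper like the one below (`default_mm_loras` and the merge behavior are assumptions drawn from the review comments in this thread, not confirmed source):

```python
# Hypothetical helper sketching the shared lora_modules handling.
# In pure diffusion mode vllm_config is None, so only CLI-supplied
# lora_modules apply; in LLM mode, default multimodal LoRAs from the
# model config would be merged in as well (assumed behavior).

def resolve_lora_modules(args, vllm_config=None):
    lora_modules = list(args.lora_modules or [])
    if vllm_config is not None:
        lora_modules += list(getattr(vllm_config, "default_mm_loras", []) or [])
    return lora_modules
```

With a helper along these lines, OpenAIServingModels could be constructed once with the fully resolved list rather than per mode.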

@hsliuustc0106 hsliuustc0106 removed the ready label to trigger buildkite CI label Jan 28, 2026
@hsliuustc0106
Collaborator

@fake0fan PTAL

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes issue #751 by enabling the /v1/models endpoint for pure diffusion mode. Previously, OpenAIServingModels was only initialized in LLM mode, causing the /v1/models endpoint to be unavailable when running pure diffusion models like Qwen-Image.

Changes:

  • Initialize OpenAIServingModels before the pure diffusion mode check to ensure it's available for both diffusion and LLM modes
  • Add state.args assignment for potential future use
  • Refactor lora_modules handling to support early initialization


Comment on lines +285 to +290
# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
state.openai_serving_models = OpenAIServingModels(
    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,
)

Copilot AI Jan 28, 2026


In LLM mode with default_mm_loras, OpenAIServingModels is initialized twice - once at lines 286-290 and again at lines 374-378. This creates an unnecessary object that will be immediately discarded. Consider initializing OpenAIServingModels only once after determining the final lora_modules value. For example, move the first initialization to after the lora_modules processing logic (after line 378) and use conditional logic to determine whether to use args.lora_modules or the merged lora_modules.

Suggested change

Before:

    # Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
    state.openai_serving_models = OpenAIServingModels(
        engine_client=engine_client,
        base_model_paths=base_model_paths,
        lora_modules=lora_modules,
    )

After:

    # Initialize OpenAIServingModels early only for pure diffusion mode.
    # In LLM/multi-stage modes, OpenAIServingModels will be initialized later
    # after any lora_modules processing/merging is complete.
    if is_pure_diffusion:
        state.openai_serving_models = OpenAIServingModels(
            engine_client=engine_client,
            base_model_paths=base_model_paths,
            lora_modules=lora_modules,
        )

base_model_paths = [BaseModelPath(name=name, model_path=args.model) for name in served_model_names]
state.engine_client = engine_client
state.log_stats = not args.disable_log_stats
state.args = args

Copilot AI Jan 28, 2026


The assignment state.args = args appears to be unused. There are no other references to state.args in the codebase. If this is intended for future use or debugging purposes, consider adding a comment explaining why it's being stored. Otherwise, this line can be removed.

Suggested change (delete the line):

    state.args = args
Comment on lines +281 to +289
# Process lora_modules early for OpenAIServingModels initialization
# In pure diffusion mode, vllm_config will be None, so we use args.lora_modules directly
lora_modules = args.lora_modules

# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
state.openai_serving_models = OpenAIServingModels(
    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,

Copilot AI Jan 28, 2026


The variable lora_modules is assigned to args.lora_modules at line 283, but in LLM mode it's reassigned to the same value at line 359. The assignment at line 283 is used for pure diffusion mode (line 289) and the assignment at line 359 is used for LLM mode. Consider refactoring to avoid this redundancy - for example, by only setting lora_modules once before the diffusion mode check, or by using a different variable name for the LLM-specific processing.

Suggested change

Before:

    # Process lora_modules early for OpenAIServingModels initialization
    # In pure diffusion mode, vllm_config will be None, so we use args.lora_modules directly
    lora_modules = args.lora_modules
    # Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
    state.openai_serving_models = OpenAIServingModels(
        engine_client=engine_client,
        base_model_paths=base_model_paths,
        lora_modules=lora_modules,

After:

    # Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
    # In pure diffusion mode, vllm_config will be None, so we pass args.lora_modules directly
    state.openai_serving_models = OpenAIServingModels(
        engine_client=engine_client,
        base_model_paths=base_model_paths,
        lora_modules=args.lora_modules,

Contributor

@fake0fan fake0fan left a comment


#454 has already been merged, and I see that it also contains /v1/models API tests for diffusion models. What is the relationship between that PR and this one? Could you explain briefly?

    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,
)
Contributor


I don't understand why we need to reinitialize OpenAIServingModels with the merged lora_modules. Can we solve this by moving the lora_modules processing up as well?

Contributor Author


To clarify, this is not a re-initialization. Since we use omni_run_server_worker as the entry point, the upstream vLLM init_app_state is never called.

@lishunyang12
Contributor

@majiayu000 Hey, this fixes #751 and it's only 20 lines — initializing OpenAIServingModels in pure diffusion mode so /v1/models works. Seems like a straightforward fix. Is there anything blocking this?

@majiayu000 majiayu000 force-pushed the fix/issue-751-omni-v1-models branch from 9dfea2d to 3aab7be Compare February 22, 2026 09:04
@Gaohan123
Collaborator

Please fix pre-commit. Thanks

@majiayu000
Contributor Author

Thanks for the review. I've formatted the code, resolved the conflicts with PR #454, and restored and adapted the tests for the unified OpenAIServingModels in pure diffusion mode. The changes have been pushed to this branch.

@Gaohan123 Gaohan123 added the ready label to trigger buildkite CI label Feb 26, 2026
@Gaohan123
Collaborator

Please resolve the CI failure

@Gaohan123
Collaborator

Please fix DCO

@majiayu000 majiayu000 force-pushed the fix/issue-751-omni-v1-models branch from e650c47 to c4d6e9f Compare February 27, 2026 16:45
The previous commit removed the custom /v1/models handler from the omni
router but the upstream vLLM route was still being removed in init_app,
leaving no /v1/models endpoint at all.

Re-add a simplified handler that delegates to state.openai_serving_models
(either OpenAIServingModels for LLM mode or _DiffusionServingModels for
pure diffusion mode).

Signed-off-by: majiayu000 <1835304752@qq.com>
@majiayu000 majiayu000 force-pushed the fix/issue-751-omni-v1-models branch from c4d6e9f to c2bf6b5 Compare February 28, 2026 01:38
@majiayu000
Contributor Author

Fixed


Labels

ready label to trigger buildkite CI


Development

Successfully merging this pull request may close these issues.

[Feature]: Qwen-Image NPU online inference , Not supported /v1/models now

8 participants