Fix: Enable /v1/models endpoint for pure diffusion mode (#805)
majiayu000 wants to merge 4 commits into vllm-project:main
Conversation
Force-pushed from 09b8c80 to 8a9ea75.
gcanlin
left a comment
I think it's better to move the original OpenAIServingModels up.
vllm-omni/vllm_omni/entrypoints/openai/api_server.py
Lines 418 to 422 in 8a9ea75
Please add test plan and results. Thanks
Force-pushed from 8a9ea75 to 9dfea2d.
Updated the PR to address reviewer feedback:
Changes Made
The implementation properly fixes issue #751 by ensuring the …
@fake0fan PTAL
Pull request overview
This PR fixes issue #751 by enabling the /v1/models endpoint for pure diffusion mode. Previously, OpenAIServingModels was only initialized in LLM mode, causing the /v1/models endpoint to be unavailable when running pure diffusion models like Qwen-Image.
Changes:
- Initialize OpenAIServingModels before the pure diffusion mode check to ensure it's available for both diffusion and LLM modes
- Add state.args assignment for potential future use
- Refactor lora_modules handling to support early initialization
```python
# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
state.openai_serving_models = OpenAIServingModels(
    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,
)
```
In LLM mode with default_mm_loras, OpenAIServingModels is initialized twice - once at lines 286-290 and again at lines 374-378. This creates an unnecessary object that will be immediately discarded. Consider initializing OpenAIServingModels only once after determining the final lora_modules value. For example, move the first initialization to after the lora_modules processing logic (after line 378) and use conditional logic to determine whether to use args.lora_modules or the merged lora_modules.
Suggested change:
```diff
-# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
-state.openai_serving_models = OpenAIServingModels(
-    engine_client=engine_client,
-    base_model_paths=base_model_paths,
-    lora_modules=lora_modules,
-)
+# Initialize OpenAIServingModels early only for pure diffusion mode.
+# In LLM/multi-stage modes, OpenAIServingModels will be initialized later
+# after any lora_modules processing/merging is complete.
+if is_pure_diffusion:
+    state.openai_serving_models = OpenAIServingModels(
+        engine_client=engine_client,
+        base_model_paths=base_model_paths,
+        lora_modules=lora_modules,
+    )
```
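The single-initialization refactor the reviewer proposes, resolving `lora_modules` once before constructing anything, could look like the helper below. This is a hedged sketch: `default_mm_loras` merging is reduced to a plain list extend, and the names are illustrative rather than the real merging logic.

```python
# Sketch of resolving lora_modules exactly once, before constructing
# OpenAIServingModels, instead of initializing the object twice.
# default_mm_loras and the merge rule are illustrative assumptions.
from types import SimpleNamespace


def resolve_lora_modules(args, vllm_config):
    """Return the final lora_modules value in one place.

    In pure diffusion mode vllm_config is None, so args.lora_modules
    is used directly; in LLM mode any default_mm_loras are merged in.
    """
    lora_modules = args.lora_modules
    if vllm_config is not None and getattr(vllm_config, "default_mm_loras", None):
        merged = list(lora_modules or [])
        merged.extend(vllm_config.default_mm_loras)
        lora_modules = merged
    return lora_modules


args = SimpleNamespace(lora_modules=["adapter-a"])
cfg = SimpleNamespace(default_mm_loras=["mm-adapter"])
print(resolve_lora_modules(args, None))  # pure diffusion → ['adapter-a']
print(resolve_lora_modules(args, cfg))   # LLM mode → ['adapter-a', 'mm-adapter']
```

With this helper, `OpenAIServingModels(...)` is constructed a single time with the returned value, and no throwaway instance is created on the `default_mm_loras` path.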
```python
base_model_paths = [BaseModelPath(name=name, model_path=args.model) for name in served_model_names]
state.engine_client = engine_client
state.log_stats = not args.disable_log_stats
state.args = args
```
The assignment state.args = args appears to be unused. There are no other references to state.args in the codebase. If this is intended for future use or debugging purposes, consider adding a comment explaining why it's being stored. Otherwise, this line can be removed.
Suggested change:
```diff
-state.args = args
```
```python
# Process lora_modules early for OpenAIServingModels initialization
# In pure diffusion mode, vllm_config will be None, so we use args.lora_modules directly
lora_modules = args.lora_modules

# ...

# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
state.openai_serving_models = OpenAIServingModels(
    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,
```
The variable lora_modules is assigned to args.lora_modules at line 283, but in LLM mode it's reassigned to the same value at line 359. The assignment at line 283 is used for pure diffusion mode (line 289) and the assignment at line 359 is used for LLM mode. Consider refactoring to avoid this redundancy - for example, by only setting lora_modules once before the diffusion mode check, or by using a different variable name for the LLM-specific processing.
Suggested change:
```diff
-# Process lora_modules early for OpenAIServingModels initialization
-# In pure diffusion mode, vllm_config will be None, so we use args.lora_modules directly
-lora_modules = args.lora_modules
-# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
-state.openai_serving_models = OpenAIServingModels(
-    engine_client=engine_client,
-    base_model_paths=base_model_paths,
-    lora_modules=lora_modules,
+# Initialize OpenAIServingModels (shared by both diffusion and LLM modes)
+# In pure diffusion mode, vllm_config will be None, so we pass args.lora_modules directly
+state.openai_serving_models = OpenAIServingModels(
+    engine_client=engine_client,
+    base_model_paths=base_model_paths,
+    lora_modules=args.lora_modules,
```
```python
    engine_client=engine_client,
    base_model_paths=base_model_paths,
    lora_modules=lora_modules,
)
```
I don't understand why we need to reinitialize OpenAIServingModels with merged lora_modules. Can we solve this by moving lora_modules up as well?
To clarify, this is not a re-initialization. Since we use omni_run_server_worker as the entry point, the upstream vLLM init_app_state is never called.
@majiayu000 Hey, this fixes #751 and it's only 20 lines: initializing OpenAIServingModels in pure diffusion mode so /v1/models works. It seems like a straightforward fix. Is there anything blocking this?
Force-pushed from 9dfea2d to 3aab7be.
Please fix pre-commit. Thanks
Thanks for the PR. I've formatted the code, resolved the conflicts with PR #454, and restored and adapted the tests for the unified …
Please resolve the CI failure.
Please fix the DCO.
…roject#751) Signed-off-by: majiayu000 <1835304752@qq.com>
Force-pushed from e650c47 to c4d6e9f.
The previous commit removed the custom /v1/models handler from the omni router but the upstream vLLM route was still being removed in init_app, leaving no /v1/models endpoint at all. Re-add a simplified handler that delegates to state.openai_serving_models (either OpenAIServingModels for LLM mode or _DiffusionServingModels for pure diffusion mode). Signed-off-by: majiayu000 <1835304752@qq.com>
Force-pushed from c4d6e9f to c2bf6b5.
Fixed
Fixes #751. Initializes OpenAIServingModels in pure diffusion mode to ensure the /v1/models endpoint is correctly populated.