[Frontend] Create OmniOpenAIServeImage Class and move image api to it#1383
[Frontend] Create OmniOpenAIServeImage Class and move image api to it#1383zhcn000000 wants to merge 20 commits intovllm-project:mainfrom
Conversation
…image Signed-off-by: bash000000 <m2588953@outlook.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2e7ea2734b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This pull request refactors image generation and editing functionality by creating a new OmniOpenAIServingImage class and moving image API logic into it. The goal is to align with patterns used in OmniOpenAIServingVideo and OmniOpenAIServingSpeech classes, improving code organization and maintainability.
Changes:
- Created
OmniOpenAIServingImageclass in a newserving_image.pyfile to handle image generation and editing - Added
DiffusionServingModelsclass to provide a minimal OpenAIServingModels implementation for diffusion-only servers - Introduced
ImageEditRequestandImageEditResponseprotocol models - Updated
OmniOpenAIServingVideoandOmniOpenAIServingSpeechto acceptOpenAIServingModelsparameter - Refactored API endpoints in
api_server.pyto delegate to the new handler classes - Added utility function
apply_stage_default_sampling_paramstoimage_api_utils.py
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| vllm_omni/entrypoints/openai/serving_image.py | New file containing OmniOpenAIServingImage class with image generation and editing methods |
| vllm_omni/entrypoints/openai/serving_video.py | Updated constructor to accept OpenAIServingModels, added DiffusionServingModels usage |
| vllm_omni/entrypoints/openai/serving_speech.py | Updated constructor to explicitly declare parameters |
| vllm_omni/entrypoints/openai/protocol/images.py | Added ImageEditRequest and ImageEditResponse models |
| vllm_omni/entrypoints/openai/protocol/init.py | Exported new ImageEdit models |
| vllm_omni/entrypoints/openai/image_api_utils.py | Added apply_stage_default_sampling_params utility function |
| vllm_omni/entrypoints/openai/diffusion_models.py | New file with DiffusionServingModels class |
| vllm_omni/entrypoints/openai/api_server.py | Refactored endpoints to use new serving classes, removed inline implementations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…image Signed-off-by: bash000000 <m2588953@outlook.com>
…image Signed-off-by: bash000000 <m2588953@outlook.com>
…image Signed-off-by: bash000000 <m2588953@outlook.com>
…image Signed-off-by: bash000000 <m2588953@outlook.com>
…image Signed-off-by: bash000000 <m2588953@outlook.com>
Signed-off-by: bash000000 <m2588953@outlook.com>
…image Signed-off-by: bash000000 <m2588953@outlook.com>
…image Signed-off-by: bash000000 <m2588953@outlook.com>
Signed-off-by: bash000000 <m2588953@outlook.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hsliuustc0106
left a comment
There was a problem hiding this comment.
Review: [Frontend] Create OmniOpenAIServeImage Class
The refactoring direction is sound — extracting image-serving logic into OmniOpenAIServingImage and consolidating shared utilities into VisionMixin brings the image API in line with the video and speech patterns. However, there are several issues that need to be addressed before merging.
Critical
ImageEditRequestusesfastapi.UploadFilein a PydanticBaseModelwithoutarbitrary_types_allowed = True, which will cause a runtime Pydantic validation error.- The removal of
output_formatandsizefields fromImageGenerationResponseis a breaking API change for existing clients.
Notable
- Filename typo:
vision_utils_mexin.pyshould bevision_utils_mixin.py. generate_imagedoes not applydefault_sampling_paramsthe wayedit_imagesdoes — inconsistent behavior between the two endpoints.DiffusionServingModels.model_nameis a regular method but should be a@propertyfor consistency withVisionMixin.model_name.
|
@vllm-omni-reviewer |
🤖 VLLM-Omni PR ReviewCode Review: [Frontend] Create OmniOpenAIServeImage Class and move image api to it1. OverviewThis PR refactors the image API code by:
Overall Assessment: The refactoring direction is good and follows the existing patterns in the codebase (similar to 2. Code QualityCritical Issue - Import TypoThere's a typo in the import statement that will cause the application to fail at runtime: serving_image.py:36 and serving_video.py:23: from vllm_omni.entrypoints.openai.vision_utils_mexin import VisionMixinShould be: from vllm_omni.entrypoints.openai.vision_utils_mixin import VisionMixinMinor Issuesapi_server.py:702 - Inconsistent naming convention: def Omniimage(request: Request) -> OmniOpenAIServingImage | None:Should follow PascalCase like def OmniImage(request: Request) -> OmniOpenAIServingImage | None:serving_image.py:51-52 - The @property
def engine_client(self) -> Any:
return getattr(self, "_engine_client")Could simply be: @property
def engine_client(self) -> Any:
return self._engine_clientserving_image.py:55-56 - Same issue with @property
def model_name(self) -> str | None:
return getattr(self, "_model_name")vision_utils_mixin.py:99-100 - The if serving_models and getattr(serving_models, "base_model_paths", None):The 3. Architecture & DesignPositive Aspects
Suggestionsvision_utils_mixin.py - Consider making this an abstract base class or protocol instead of a mixin, since all methods are from typing import Protocol
class VisionProtocol(Protocol):
@property
def engine_client(self) -> Any: ...
@property
def model_name(self) -> str | None: ...serving_image.py - The 4. Security & SafetyInput Validationprotocol/images.py:131-220 - The
serving_image.py:276-281 - Good validation for max image size: if max_generated_image_size is not None and (width * height > max_generated_image_size):
raise HTTPException(...)Resource Managementserving_image.py:168-175 - The async with httpx.AsyncClient(timeout=60) as client:Potential Issue - In 5. Testing & DocumentationTest Coverage ConsiderationsThe PR description states "The logic of the function moved after the class should be exactly the same as that of the original function." However:
Documentation
6. Specific SuggestionsCritical Fix Requiredvision_utils_mixin.py - Fix the filename or the imports (file is # In serving_image.py and serving_video.py, change:
from vllm_omni.entrypoints.openai.vision_utils_mexin import VisionMixin
# To:
from vllm_omni.entrypoints.openai.vision_utils_mixin import VisionMixinCode Improvementsapi_server.py:702: # Change from:
def Omniimage(request: Request) -> OmniOpenAIServingImage | None:
# To:
def OmniImage(request: Request) -> OmniOpenAIServingImage | None:serving_image.py:51-58 - Simplify property access: @property
def engine_client(self) -> Any:
return self._engine_client
@property
def model_name(self) -> str | None:
return self._model_nameserving_image.py:336-337 - Consider using # The created timestamp is generated twice (once in the return, once for request_id)
# Consider storing it once:
created = int(time.time())
request_id = f"img_edit_{created}_{self._base_request_id(raw_request)}"protocol/images.py:136 - The image: list[UploadFile] | None = Field(default=None, description="Image file to edit")This might need custom validation since 7. Approval StatusChanges Requested - The critical import typo ( Required Changes:
Optional Improvements:
This review was generated automatically by the VLLM-Omni PR Reviewer Bot |
hsliuustc0106
left a comment
There was a problem hiding this comment.
Summary
This PR refactors image API handling by creating a dedicated OmniOpenAIServingImage class, extracting shared vision utilities into VisionMixin, and moving DiffusionServingModels to a separate module. The refactoring follows the established pattern used by OmniOpenAIServingVideo and OmniOpenAIServingSpeech.
Pros:
- Good separation of concerns - moves 500+ lines out of the monolithic api_server.py
- Consistent architecture with other serving classes
- Shared
VisionMixinreduces code duplication between image and video serving - Clean extraction of
DiffusionServingModelsto its own module - Maintains backward compatibility
Cons:
- Inconsistent naming convention (
OmniimagevsOmnispeech) - Missing test coverage for the refactored code
- Some unnecessary use of
getattrfor simple property access - Large PR with multiple concerns (refactoring + new mixin + new protocol classes)
Recommendation: Approve with minor naming fix.
| ModelCard, | ||
| ModelList, | ||
| ModelPermission, | ||
| ) |
There was a problem hiding this comment.
Good: Module extraction
Moving DiffusionServingModels to its own module improves organization. The change from _DiffusionServingModels (private) to DiffusionServingModels (public) and from _base_model_paths to base_model_paths makes it properly accessible.
lishunyang12
left a comment
There was a problem hiding this comment.
Looks like the earlier issues from my review were addressed. Left a couple more things on the latest revision.
…image
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
Move the functions of edit_images and generate_image to the OmniOpenAIServingImage class,
Test Plan
The logic of the function moved after the class should be exactly the same as that of the original function.
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)