[Frontend] Create OmniOpenAIServeImage Class and move image api to it #1383

Open
zhcn000000 wants to merge 20 commits into vllm-project:main from zhcn000000:main

Conversation

@zhcn000000

…image

Purpose

Move the edit_images and generate_image functions into the OmniOpenAIServingImage class.

  • This keeps the image API consistent with the other interface-handling classes, such as OmniOpenAIServingVideo and OmniOpenAIServingSpeech.
  • It will make it easier to add a /v1/images/variations interface in the future.
  • It adds a for_diffusion function for instantiating the diffusion model.
  • It relocates the related interfaces to openai/images/service, openai/images/protocol, and openai/images/api_server.

Test Plan

After the move, the logic of each function in the class should be exactly the same as that of the original function.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste a before/after comparison of the results, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

Copilot AI review requested due to automatic review settings February 15, 2026 08:37
…image

Signed-off-by: bash000000 <m2588953@outlook.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e7ea2734b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Contributor

Copilot AI left a comment

Pull request overview

This pull request refactors image generation and editing functionality by creating a new OmniOpenAIServingImage class and moving image API logic into it. The goal is to align with patterns used in OmniOpenAIServingVideo and OmniOpenAIServingSpeech classes, improving code organization and maintainability.

Changes:

  • Created OmniOpenAIServingImage class in a new serving_image.py file to handle image generation and editing
  • Added DiffusionServingModels class to provide a minimal OpenAIServingModels implementation for diffusion-only servers
  • Introduced ImageEditRequest and ImageEditResponse protocol models
  • Updated OmniOpenAIServingVideo and OmniOpenAIServingSpeech to accept OpenAIServingModels parameter
  • Refactored API endpoints in api_server.py to delegate to the new handler classes
  • Added utility function apply_stage_default_sampling_params to image_api_utils.py

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Summary per file:

  • vllm_omni/entrypoints/openai/serving_image.py: New file containing the OmniOpenAIServingImage class with image generation and editing methods
  • vllm_omni/entrypoints/openai/serving_video.py: Updated constructor to accept OpenAIServingModels; added DiffusionServingModels usage
  • vllm_omni/entrypoints/openai/serving_speech.py: Updated constructor to explicitly declare parameters
  • vllm_omni/entrypoints/openai/protocol/images.py: Added ImageEditRequest and ImageEditResponse models
  • vllm_omni/entrypoints/openai/protocol/__init__.py: Exported the new ImageEdit models
  • vllm_omni/entrypoints/openai/image_api_utils.py: Added the apply_stage_default_sampling_params utility function
  • vllm_omni/entrypoints/openai/diffusion_models.py: New file with the DiffusionServingModels class
  • vllm_omni/entrypoints/openai/api_server.py: Refactored endpoints to use the new serving classes; removed inline implementations

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


Contributor

@lishunyang12 lishunyang12 left a comment

A few things that would break in production — see inline.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.


Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Review: [Frontend] Create OmniOpenAIServeImage Class

The refactoring direction is sound — extracting image-serving logic into OmniOpenAIServingImage and consolidating shared utilities into VisionMixin brings the image API in line with the video and speech patterns. However, there are several issues that need to be addressed before merging.

Critical

  • ImageEditRequest uses fastapi.UploadFile in a Pydantic BaseModel without arbitrary_types_allowed = True, which will cause a runtime Pydantic validation error.
  • The removal of output_format and size fields from ImageGenerationResponse is a breaking API change for existing clients.

Notable

  • Filename typo: vision_utils_mexin.py should be vision_utils_mixin.py.
  • generate_image does not apply default_sampling_params the way edit_images does — inconsistent behavior between the two endpoints.
  • DiffusionServingModels.model_name is a regular method but should be a @property for consistency with VisionMixin.model_name.
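One way to remove the generate_image/edit_images inconsistency noted above is to route both endpoints through a single merge helper. The PR's apply_stage_default_sampling_params presumably plays this role; the sketch below only illustrates the idea, with an assumed name and signature:

```python
from typing import Any


def apply_default_sampling_params(
    request_params: dict[str, Any],
    defaults: dict[str, Any],
) -> dict[str, Any]:
    """Hypothetical helper: fill in sampling params the request left unset
    from the stage defaults, so both endpoints behave identically."""
    merged = dict(defaults)
    # Request values win; None means "not provided" and keeps the default.
    merged.update({k: v for k, v in request_params.items() if v is not None})
    return merged


params = apply_default_sampling_params(
    {"guidance_scale": 7.5, "num_inference_steps": None},
    {"guidance_scale": 4.0, "num_inference_steps": 50},
)
print(params)  # {'guidance_scale': 7.5, 'num_inference_steps': 50}
```

Calling the same helper from both generate_image and edit_images would make the default-handling behavior identical by construction.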

@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@github-actions

🤖 VLLM-Omni PR Review

Code Review: [Frontend] Create OmniOpenAIServeImage Class and move image api to it

1. Overview

This PR refactors the image API code by:

  1. Creating a new OmniOpenAIServingImage class to handle image generation and editing
  2. Extracting common functionality into a VisionMixin class shared with video serving
  3. Creating a DiffusionServingModels class for diffusion-only servers
  4. Adding new protocol classes (ImageEditRequest, ImageEditResponse)

Overall Assessment: The refactoring direction is good and follows the existing patterns in the codebase (similar to OmniOpenAIServingVideo and OmniOpenAIServingSpeech). However, there's a critical typo that will cause import errors.

2. Code Quality

Critical Issue - Import Typo

There's a typo in the import statement that will cause the application to fail at runtime:

serving_image.py:36 and serving_video.py:23:

from vllm_omni.entrypoints.openai.vision_utils_mexin import VisionMixin

Should be:

from vllm_omni.entrypoints.openai.vision_utils_mixin import VisionMixin

Minor Issues

api_server.py:702 - Inconsistent naming convention:

def Omniimage(request: Request) -> OmniOpenAIServingImage | None:

Should follow PascalCase like Omnispeech:

def OmniImage(request: Request) -> OmniOpenAIServingImage | None:

serving_image.py:51-52 - The engine_client property uses getattr unnecessarily:

@property
def engine_client(self) -> Any:
    return getattr(self, "_engine_client")

Could simply be:

@property
def engine_client(self) -> Any:
    return self._engine_client

serving_image.py:55-56 - Same issue with model_name:

@property
def model_name(self) -> str | None:
    return getattr(self, "_model_name")

vision_utils_mixin.py:99-100 - The _resolve_model_name method could be simplified:

if serving_models and getattr(serving_models, "base_model_paths", None):

The getattr with default None is good, but the nested check could be cleaner.

3. Architecture & Design

Positive Aspects

  • Good separation of concerns - moving image logic out of the monolithic api_server.py
  • Consistent with existing patterns (OmniOpenAIServingVideo, OmniOpenAIServingSpeech)
  • The VisionMixin provides good code reuse for shared functionality
  • The for_diffusion class method pattern is consistent with other serving classes

Suggestions

vision_utils_mixin.py - Consider making this an abstract base class or protocol instead of a mixin, since all methods are @staticmethod except for the properties:

from typing import Protocol

class VisionProtocol(Protocol):
    @property
    def engine_client(self) -> Any: ...
    @property
    def model_name(self) -> str | None: ...

serving_image.py - The _generate_with_async_omni method is quite long. Consider extracting the sampling params list construction into a separate helper method.

4. Security & Safety

Input Validation

protocol/images.py:131-220 - The ImageEditRequest model has good validation with Field constraints:

  • n: ge=1, le=10
  • num_inference_steps: ge=1, le=200
  • guidance_scale: ge=0.0, le=20.0

serving_image.py:276-281 - Good validation for max image size:

if max_generated_image_size is not None and (width * height > max_generated_image_size):
    raise HTTPException(...)

Resource Management

serving_image.py:168-175 - The httpx.AsyncClient usage is good with explicit timeout:

async with httpx.AsyncClient(timeout=60) as client:

Potential Issue - In _load_input_images, large images could cause memory issues. Consider adding size limits for uploaded images.
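A size guard of the kind suggested here could be as simple as the following sketch; the 10 MiB limit and the helper name are illustrative, not part of the actual _load_input_images API:

```python
# Hypothetical upper bound on an uploaded image payload.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_upload_size(data: bytes, limit: int = MAX_UPLOAD_BYTES) -> bytes:
    """Reject oversized uploads before decoding them into image objects."""
    if len(data) > limit:
        raise ValueError(
            f"uploaded image is {len(data)} bytes; limit is {limit} bytes"
        )
    return data


print(len(check_upload_size(b"\x89PNG" + b"\x00" * 16)))  # 20
```

Checking the raw byte length before decoding keeps a malicious or accidental multi-gigabyte upload from being materialized in memory.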

5. Testing & Documentation

Test Coverage Considerations

The PR description states "The logic of the function moved after the class should be exactly the same as that of the original function." However:

  • No test scripts are provided
  • Consider adding unit tests for the new VisionMixin methods
  • Integration tests for the image endpoints should verify the refactoring didn't break functionality

Documentation

  • The docstrings are good but could be more comprehensive
  • ImageEditRequest fields have good descriptions

6. Specific Suggestions

Critical Fix Required

vision_utils_mixin.py - Fix the filename or the imports (file is vision_utils_mixin.py but imports use vision_utils_mexin):

# In serving_image.py and serving_video.py, change:
from vllm_omni.entrypoints.openai.vision_utils_mexin import VisionMixin
# To:
from vllm_omni.entrypoints.openai.vision_utils_mixin import VisionMixin

Code Improvements

api_server.py:702:

# Change from:
def Omniimage(request: Request) -> OmniOpenAIServingImage | None:
# To:
def OmniImage(request: Request) -> OmniOpenAIServingImage | None:

serving_image.py:51-58 - Simplify property access:

@property
def engine_client(self) -> Any:
    return self._engine_client

@property
def model_name(self) -> str | None:
    return self._model_name

serving_image.py:336-337 - Consider using time.time() consistently:

# The created timestamp is generated twice (once in the return, once for request_id)
# Consider storing it once:
created = int(time.time())
request_id = f"img_edit_{created}_{self._base_request_id(raw_request)}"

protocol/images.py:136 - The ImageEditRequest model uses UploadFile which is a FastAPI type. Consider if this works correctly with Pydantic validation:

image: list[UploadFile] | None = Field(default=None, description="Image file to edit")

This might need custom validation since UploadFile is not a standard Pydantic type.

7. Approval Status

Changes Requested - The critical import typo (vision_utils_mexin vs vision_utils_mixin) will cause runtime failures. Once this is fixed, the PR should be ready for merge.

Required Changes:

  1. Fix the import typo in serving_image.py and serving_video.py
  2. Consider the naming convention fix for OmniImage

Optional Improvements:

  1. Simplify property accessors
  2. Add unit tests for the refactored code
  3. Consider the VisionProtocol approach instead of mixin

This review was generated automatically by the VLLM-Omni PR Reviewer Bot using glm-5.

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Summary

This PR refactors image API handling by creating a dedicated OmniOpenAIServingImage class, extracting shared vision utilities into VisionMixin, and moving DiffusionServingModels to a separate module. The refactoring follows the established pattern used by OmniOpenAIServingVideo and OmniOpenAIServingSpeech.

Pros:

  • Good separation of concerns - moves 500+ lines out of the monolithic api_server.py
  • Consistent architecture with other serving classes
  • Shared VisionMixin reduces code duplication between image and video serving
  • Clean extraction of DiffusionServingModels to its own module
  • Maintains backward compatibility

Cons:

  • Inconsistent naming convention (Omniimage vs Omnispeech)
  • Missing test coverage for the refactored code
  • Some unnecessary use of getattr for simple property access
  • Large PR with multiple concerns (refactoring + new mixin + new protocol classes)

Recommendation: Approve with minor naming fix.

Collaborator

Good: Module extraction

Moving DiffusionServingModels to its own module improves organization. The change from _DiffusionServingModels (private) to DiffusionServingModels (public) and from _base_model_paths to base_model_paths makes it properly accessible.

Contributor

@lishunyang12 lishunyang12 left a comment

Looks like the earlier issues from my review were addressed. Left a couple more things on the latest revision.
