[Frontend] Create OmniOpenAIServeImage Class and move image api to it #1383

Open
zhcn000000 wants to merge 20 commits into vllm-project:main from zhcn000000:main

Conversation

@zhcn000000

…image

Purpose

Move the edit_images and generate_image functions into the OmniOpenAIServingImage class.

  • This keeps the image API consistent with the other interface-handling classes, such as OmniOpenAIServingVideo and OmniOpenAIServingSpeech.
  • It will make it easier to add a /v1/images/variations interface in the future.
  • It adds a for_diffusion function for instantiating the diffusion model.
  • It relocates the related interfaces to openai/images/service, openai/images/protocol, and openai/images/api_server.

Test Plan

After the move, the logic of each function in the class should be exactly the same as that of the original function.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste a before/after comparison of the results, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

Copilot AI review requested due to automatic review settings February 15, 2026 08:37
…image

Signed-off-by: bash000000 <m2588953@outlook.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e7ea2734b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Contributor

Copilot AI left a comment

Pull request overview

This pull request refactors image generation and editing functionality by creating a new OmniOpenAIServingImage class and moving image API logic into it. The goal is to align with patterns used in OmniOpenAIServingVideo and OmniOpenAIServingSpeech classes, improving code organization and maintainability.

Changes:

  • Created OmniOpenAIServingImage class in a new serving_image.py file to handle image generation and editing
  • Added DiffusionServingModels class to provide a minimal OpenAIServingModels implementation for diffusion-only servers
  • Introduced ImageEditRequest and ImageEditResponse protocol models
  • Updated OmniOpenAIServingVideo and OmniOpenAIServingSpeech to accept OpenAIServingModels parameter
  • Refactored API endpoints in api_server.py to delegate to the new handler classes
  • Added utility function apply_stage_default_sampling_params to image_api_utils.py

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Summary per file:

  • vllm_omni/entrypoints/openai/serving_image.py: New file containing the OmniOpenAIServingImage class with image generation and editing methods
  • vllm_omni/entrypoints/openai/serving_video.py: Updated constructor to accept OpenAIServingModels; added DiffusionServingModels usage
  • vllm_omni/entrypoints/openai/serving_speech.py: Updated constructor to explicitly declare parameters
  • vllm_omni/entrypoints/openai/protocol/images.py: Added ImageEditRequest and ImageEditResponse models
  • vllm_omni/entrypoints/openai/protocol/__init__.py: Exported the new ImageEdit models
  • vllm_omni/entrypoints/openai/image_api_utils.py: Added the apply_stage_default_sampling_params utility function
  • vllm_omni/entrypoints/openai/diffusion_models.py: New file with the DiffusionServingModels class
  • vllm_omni/entrypoints/openai/api_server.py: Refactored endpoints to use the new serving classes; removed inline implementations

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


Contributor

@lishunyang12 lishunyang12 left a comment

A few things that would break in production — see inline.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.


Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Review: [Frontend] Create OmniOpenAIServeImage Class

The refactoring direction is sound — extracting image-serving logic into OmniOpenAIServingImage and consolidating shared utilities into VisionMixin brings the image API in line with the video and speech patterns. However, there are several issues that need to be addressed before merging.

Critical

  • ImageEditRequest uses fastapi.UploadFile in a Pydantic BaseModel without arbitrary_types_allowed = True, which will cause a runtime Pydantic validation error.
  • The removal of output_format and size fields from ImageGenerationResponse is a breaking API change for existing clients.

Notable

  • Filename typo: vision_utils_mexin.py should be vision_utils_mixin.py.
  • generate_image does not apply default_sampling_params the way edit_images does — inconsistent behavior between the two endpoints.
  • DiffusionServingModels.model_name is a regular method but should be a @property for consistency with VisionMixin.model_name.
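One way to remove the generate_image/edit_images inconsistency noted above is to route both endpoints through a single merge helper. The PR's apply_stage_default_sampling_params presumably plays this role; the sketch below only illustrates the idea, with an assumed name and signature:

```python
from typing import Any


def apply_default_sampling_params(
    request_params: dict[str, Any],
    defaults: dict[str, Any],
) -> dict[str, Any]:
    """Hypothetical helper: fill in sampling params the request left unset
    from the stage defaults, so both endpoints behave identically."""
    merged = dict(defaults)
    # Request values win; None means "not provided" and keeps the default.
    merged.update({k: v for k, v in request_params.items() if v is not None})
    return merged


params = apply_default_sampling_params(
    {"guidance_scale": 7.5, "num_inference_steps": None},
    {"guidance_scale": 4.0, "num_inference_steps": 50},
)
print(params)  # {'guidance_scale': 7.5, 'num_inference_steps': 50}
```

Calling the same helper from both generate_image and edit_images would make the default-handling behavior identical by construction.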

@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@github-actions

🤖 VLLM-Omni PR Review

Code Review: [Frontend] Create OmniOpenAIServeImage Class and move image api to it

1. Overview

This PR refactors the image API code by:

  1. Creating a new OmniOpenAIServingImage class to handle image generation and editing
  2. Extracting common functionality into a VisionMixin class shared with video serving
  3. Creating a DiffusionServingModels class for diffusion-only servers
  4. Adding new protocol classes (ImageEditRequest, ImageEditResponse)

Overall Assessment: The refactoring direction is good and follows the existing patterns in the codebase (similar to OmniOpenAIServingVideo and OmniOpenAIServingSpeech). However, there's a critical typo that will cause import errors.

2. Code Quality

Critical Issue - Import Typo

There's a typo in the import statement that will cause the application to fail at runtime:

serving_image.py:36 and serving_video.py:23:

from vllm_omni.entrypoints.openai.vision_utils_mexin import VisionMixin

Should be:

from vllm_omni.entrypoints.openai.vision_utils_mixin import VisionMixin

Minor Issues

api_server.py:702 - Inconsistent naming convention:

def Omniimage(request: Request) -> OmniOpenAIServingImage | None:

Should follow PascalCase like Omnispeech:

def OmniImage(request: Request) -> OmniOpenAIServingImage | None:

serving_image.py:51-52 - The engine_client property uses getattr unnecessarily:

@property
def engine_client(self) -> Any:
    return getattr(self, "_engine_client")

Could simply be:

@property
def engine_client(self) -> Any:
    return self._engine_client

serving_image.py:55-56 - Same issue with model_name:

@property
def model_name(self) -> str | None:
    return getattr(self, "_model_name")

vision_utils_mixin.py:99-100 - The _resolve_model_name method could be simplified:

if serving_models and getattr(serving_models, "base_model_paths", None):

The getattr with default None is good, but the nested check could be cleaner.

3. Architecture & Design

Positive Aspects

  • Good separation of concerns - moving image logic out of the monolithic api_server.py
  • Consistent with existing patterns (OmniOpenAIServingVideo, OmniOpenAIServingSpeech)
  • The VisionMixin provides good code reuse for shared functionality
  • The for_diffusion class method pattern is consistent with other serving classes

Suggestions

vision_utils_mixin.py - Consider making this an abstract base class or protocol instead of a mixin, since all methods are @staticmethod except for the properties:

from typing import Protocol

class VisionProtocol(Protocol):
    @property
    def engine_client(self) -> Any: ...
    @property
    def model_name(self) -> str | None: ...

serving_image.py - The _generate_with_async_omni method is quite long. Consider extracting the sampling params list construction into a separate helper method.

4. Security & Safety

Input Validation

protocol/images.py:131-220 - The ImageEditRequest model has good validation with Field constraints:

  • n: ge=1, le=10
  • num_inference_steps: ge=1, le=200
  • guidance_scale: ge=0.0, le=20.0

serving_image.py:276-281 - Good validation for max image size:

if max_generated_image_size is not None and (width * height > max_generated_image_size):
    raise HTTPException(...)

Resource Management

serving_image.py:168-175 - The httpx.AsyncClient usage is good with explicit timeout:

async with httpx.AsyncClient(timeout=60) as client:

Potential Issue - In _load_input_images, large images could cause memory issues. Consider adding size limits for uploaded images.
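A size guard of the kind suggested here could be as simple as the following sketch; the 10 MiB limit and the helper name are illustrative, not part of the actual _load_input_images API:

```python
# Hypothetical upper bound on an uploaded image payload.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_upload_size(data: bytes, limit: int = MAX_UPLOAD_BYTES) -> bytes:
    """Reject oversized uploads before decoding them into image objects."""
    if len(data) > limit:
        raise ValueError(
            f"uploaded image is {len(data)} bytes; limit is {limit} bytes"
        )
    return data


print(len(check_upload_size(b"\x89PNG" + b"\x00" * 16)))  # 20
```

Checking the raw byte length before decoding keeps a malicious or accidental multi-gigabyte upload from being materialized in memory.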

5. Testing & Documentation

Test Coverage Considerations

The PR description states "The logic of the function moved after the class should be exactly the same as that of the original function." However:

  • No test scripts are provided
  • Consider adding unit tests for the new VisionMixin methods
  • Integration tests for the image endpoints should verify the refactoring didn't break functionality

Documentation

  • The docstrings are good but could be more comprehensive
  • ImageEditRequest fields have good descriptions

6. Specific Suggestions

Critical Fix Required

vision_utils_mixin.py - Fix the filename or the imports (file is vision_utils_mixin.py but imports use vision_utils_mexin):

# In serving_image.py and serving_video.py, change:
from vllm_omni.entrypoints.openai.vision_utils_mexin import VisionMixin
# To:
from vllm_omni.entrypoints.openai.vision_utils_mixin import VisionMixin

Code Improvements

api_server.py:702:

# Change from:
def Omniimage(request: Request) -> OmniOpenAIServingImage | None:
# To:
def OmniImage(request: Request) -> OmniOpenAIServingImage | None:

serving_image.py:51-58 - Simplify property access:

@property
def engine_client(self) -> Any:
    return self._engine_client

@property
def model_name(self) -> str | None:
    return self._model_name

serving_image.py:336-337 - Consider using time.time() consistently:

# The created timestamp is generated twice (once in the return, once for request_id)
# Consider storing it once:
created = int(time.time())
request_id = f"img_edit_{created}_{self._base_request_id(raw_request)}"

protocol/images.py:136 - The ImageEditRequest model uses UploadFile which is a FastAPI type. Consider if this works correctly with Pydantic validation:

image: list[UploadFile] | None = Field(default=None, description="Image file to edit")

This might need custom validation since UploadFile is not a standard Pydantic type.

7. Approval Status

Changes Requested - The critical import typo (vision_utils_mexin vs vision_utils_mixin) will cause runtime failures. Once this is fixed, the PR should be ready for merge.

Required Changes:

  1. Fix the import typo in serving_image.py and serving_video.py
  2. Consider the naming convention fix for OmniImage

Optional Improvements:

  1. Simplify property accessors
  2. Add unit tests for the refactored code
  3. Consider the VisionProtocol approach instead of mixin

This review was generated automatically by the VLLM-Omni PR Reviewer Bot using glm-5.

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Summary

This PR refactors image API handling by creating a dedicated OmniOpenAIServingImage class, extracting shared vision utilities into VisionMixin, and moving DiffusionServingModels to a separate module. The refactoring follows the established pattern used by OmniOpenAIServingVideo and OmniOpenAIServingSpeech.

Pros:

  • Good separation of concerns - moves 500+ lines out of the monolithic api_server.py
  • Consistent architecture with other serving classes
  • Shared VisionMixin reduces code duplication between image and video serving
  • Clean extraction of DiffusionServingModels to its own module
  • Maintains backward compatibility

Cons:

  • Inconsistent naming convention (Omniimage vs Omnispeech)
  • Missing test coverage for the refactored code
  • Some unnecessary use of getattr for simple property access
  • Large PR with multiple concerns (refactoring + new mixin + new protocol classes)

Recommendation: Approve with minor naming fix.

Collaborator

Good: Module extraction

Moving DiffusionServingModels to its own module improves organization. The change from _DiffusionServingModels (private) to DiffusionServingModels (public) and from _base_model_paths to base_model_paths makes it properly accessible.

Contributor

@lishunyang12 lishunyang12 left a comment

Looks like the earlier issues from my review were addressed. Left a couple more things on the latest revision.
