
Conversation

Jialin
Contributor

@Jialin Jialin commented Jul 21, 2025

…thread

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

In the engine core thread, we spend 18us converting EngineCoreRequest to Request, which is on the model forward critical path.

Ideally, we should move the conversion from the engine core thread to the input request thread to take this logic off the critical path.

There's an extra benefit to making the change: Request becomes available in the input processing threads, which would significantly simplify the pending changes for the block hashing optimization mentioned in #21247.
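The idea above can be sketched as follows. This is a minimal illustration, not the actual vLLM code: the class fields, `convert` helper, and `input_thread` function are all hypothetical stand-ins for the real EngineCoreRequest-to-Request conversion being moved off the engine core loop.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class EngineCoreRequest:
    # Wire-level request (illustrative fields only, not vLLM's definition).
    request_id: str
    prompt_token_ids: list


@dataclass
class Request:
    # Scheduler-side request; cheap to consume on the critical path.
    request_id: str
    num_tokens: int


def convert(ecr: EngineCoreRequest) -> Request:
    # Stand-in for the ~18us EngineCoreRequest -> Request conversion.
    return Request(ecr.request_id, len(ecr.prompt_token_ids))


ready: queue.Queue = queue.Queue()


def input_thread(raw_requests):
    # After this change, the conversion runs here, overlapping with
    # model forward instead of blocking the engine core loop.
    for ecr in raw_requests:
        ready.put(convert(ecr))


t = threading.Thread(target=input_thread,
                     args=([EngineCoreRequest("r0", [1, 2, 3])],))
t.start()
t.join()

# The engine core loop now only dequeues fully formed Request objects.
req = ready.get_nowait()
print(req.request_id, req.num_tokens)  # r0 3
```

The engine core thread's per-request work shrinks to a queue pop, which is what the profiles below demonstrate.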

[Screenshots: profiler traces of the engine core thread before the change]

Test Plan

Profile with the change

Test Result

With the change, handle_client_request in the engine core thread dropped from 35us to 7us.
[Screenshot: profiler trace of handle_client_request after the change]

As expected, the Request conversion now executes in parallel with model forward.
[Screenshot: profiler trace showing Request conversion overlapping model forward]

(Optional) Documentation Update


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Jul 21, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the request processing pipeline to improve performance by moving the conversion of EngineCoreRequest to Request off the main engine thread's critical path. The changes are well-structured and the performance gains are clearly demonstrated.

I've identified one potential high-severity issue: a race condition on the multi-modal input cache (mm_input_cache_server) introduced by accessing it from multiple threads without synchronization. I've provided details in a specific comment. Addressing this will ensure the thread safety of the new implementation.
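To illustrate the kind of fix the review is asking for: the sketch below is a hypothetical stand-in for mm_input_cache_server (the real vLLM class differs), showing one standard way to make a shared cache safe for access from multiple threads, holding a lock around the check-then-insert.

```python
import threading


class MMInputCache:
    """Illustrative stand-in for mm_input_cache_server; not vLLM's class.

    Shows a lock-guarded cache so concurrent threads cannot race on
    the same key.
    """

    def __init__(self):
        self._cache = {}
        self._lock = threading.Lock()  # serializes all cache access

    def get_or_insert(self, key, value):
        # Without the lock, two threads could both miss on `key` and
        # race to insert; setdefault under the lock makes the
        # check-then-insert atomic, so the first insert always wins.
        with self._lock:
            return self._cache.setdefault(key, value)


cache = MMInputCache()
first = cache.get_or_insert("img-0", "decoded-A")
second = cache.get_or_insert("img-0", "decoded-B")  # first insert wins
```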

@robertgshaw2-redhat
Collaborator

nice idea! I wonder what else we can put into that thread...

@Jialin
Contributor Author

Jialin commented Jul 21, 2025

nice idea! I wonder what else we can put into that thread...

block hashes for sure, as we discussed in a separate PR.

Personally, the main benefit of this change is to come up with a centralized place to put all these future ideas :)

Member


Instead of adding this to the internal Request, maybe convert EngineCoreRequest to a tuple[Request, int]?

Contributor Author


Hmmm, I might lean toward adding it in Request instead.

  1. current_wave is an existing field in EngineCoreRequest
  2. If we go with tuple[Request, int], I'm afraid we might end up having tuple[Request, A, B, C, D, ...] in the future :/

WDYT?

Member


current_wave is logically separate from the request, it's only used for coordination purposes at the point that the request is received. Request is the scheduler's state for the request so it doesn't really belong in there. So I don't think what you mentioned will be a concern.

Contributor Author


How about wrapping current_wave in a new class (e.g. RequestEnv)? The interface would then become tuple[Request, RequestEnv].

In this way,

  • non-request data (i.e. current_wave) goes into RequestEnv
  • when similar new data comes in, we have a place for it without touching the interface

WDYT?
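For concreteness, the two return shapes under discussion can be sketched like this. RequestEnv is hypothetical (proposed in this thread, not existing vLLM code), and the converter functions and fields are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Minimal stand-in for the scheduler-side Request.
    request_id: str


# Shape A: a plain tuple carrying current_wave alongside the Request.
# Cheap to allocate, but positional: a new field changes the arity.
def convert_as_tuple(request_id: str, current_wave: int) -> tuple:
    return Request(request_id), current_wave


# Shape B: the hypothetical RequestEnv wrapper proposed above. Non-request
# data gets a named home, so new fields never change the tuple shape.
@dataclass
class RequestEnv:
    current_wave: int


def convert_as_env(request_id: str, current_wave: int) -> tuple:
    return Request(request_id), RequestEnv(current_wave)


req, wave = convert_as_tuple("r0", 2)
req2, env = convert_as_env("r1", 3)
```

The trade-off debated below is allocation cost (tuples reuse small allocs) versus extensibility of the named wrapper.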

Contributor Author


Ditto, gentle nudge @njhill for your thoughts :)

Member


A new class would be more overhead; tuples are very cheap (small allocs are reused). I don't think we have to worry about a place for other values, since it's unlikely there will be any, and it's better to do that if/when needed in the future. This isn't an external interface.

Contributor Author


@njhill Appreciate your input.

I've handed over the idea to @linzebing, and most of your comments should have been addressed. Let's move the discussion there: #21627

Collaborator


where else are we calling add_request such that we need to keep a union of 2 types here?

Contributor Author


At least 2 places I noticed. Especially for the latter one, I'm not fully sure if we need to further update the interface upstream.

An alternative approach is to:

  • Add a non-Union API, add_request(self, request: Request)
  • Expose the EngineCoreRequest to Request conversion as an API (e.g. preprocess_add_request(EngineCoreRequest) -> Request) and update the logic on the caller side

WDYT?
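The alternative proposed in the two bullets above can be sketched as follows. All names here are hypothetical stand-ins taken from the comment, not actual vLLM code: add_request accepts only the already-converted Request, and the conversion helper is exposed for callers on the sync path.

```python
from dataclasses import dataclass


@dataclass
class EngineCoreRequest:
    # Wire-level request (illustrative fields only).
    request_id: str
    prompt_token_ids: list


@dataclass
class Request:
    request_id: str
    num_tokens: int


def preprocess_add_request(ecr: EngineCoreRequest) -> Request:
    # Conversion exposed as its own API; callers invoke it explicitly
    # before add_request, so add_request itself stays non-Union.
    return Request(ecr.request_id, len(ecr.prompt_token_ids))


class EngineCore:
    def __init__(self):
        self.requests = []

    def add_request(self, request: Request) -> None:
        # Accepts only the already-converted Request type.
        self.requests.append(request)


core = EngineCore()
core.add_request(preprocess_add_request(EngineCoreRequest("r0", [1, 2])))
```

This keeps the type signature narrow at the cost of updating each caller to run the conversion first.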

Collaborator


those should be for the sync engine. we should be able to trigger them directly using the pythonic API or bench throughput. do we see similar benefits?

Jialin added a commit to Jialin/vllm that referenced this pull request Jul 22, 2025

mergify bot commented Jul 24, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Jialin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@Jialin
Contributor Author

Jialin commented Jul 25, 2025

Handed over to @linzebing in #21627

@Jialin Jialin closed this Jul 25, 2025