[Core] Convert EngineCoreRequest to Request before reaching the engine core … #21329
Changes from all commits: bccab82, 71ebc50, 201ad34, c3752dc
@@ -35,10 +35,12 @@ def __init__(
         lora_request: Optional["LoRARequest"] = None,
         structured_output_request: Optional["StructuredOutputRequest"] = None,
         cache_salt: Optional[str] = None,
+        current_wave: int = 0,
         priority: int = 0,
     ) -> None:
         self.request_id = request_id
         self.client_index = client_index
+        self.current_wave = current_wave
         self.priority = priority
         self.sampling_params = sampling_params
         self.pooling_params = pooling_params

@@ -131,6 +133,7 @@ def from_engine_core_request(cls, request: EngineCoreRequest) -> "Request":
             sampling_params=request.sampling_params) \
                 if request.sampling_params else None,
             cache_salt=request.cache_salt,
+            current_wave=request.current_wave,
             priority=request.priority,
         )
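In short, the diff threads `current_wave` from the wire-level `EngineCoreRequest` into the engine-core-side `Request`, so the conversion happens once in `from_engine_core_request`. A minimal sketch of the resulting shape, with both classes reduced to heavily simplified stand-ins (the real ones carry many more fields):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EngineCoreRequest:
    # Wire-level request, heavily simplified for this sketch.
    request_id: str
    client_index: int = 0
    current_wave: int = 0
    priority: int = 0
    cache_salt: Optional[str] = None


class Request:
    # Engine-core-side request, reduced to the fields touched by the diff.
    def __init__(
        self,
        request_id: str,
        client_index: int = 0,
        current_wave: int = 0,  # newly plumbed through
        priority: int = 0,
        cache_salt: Optional[str] = None,
    ) -> None:
        self.request_id = request_id
        self.client_index = client_index
        self.current_wave = current_wave
        self.priority = priority
        self.cache_salt = cache_salt

    @classmethod
    def from_engine_core_request(cls, request: EngineCoreRequest) -> "Request":
        # Copy everything (including current_wave) across once, so the engine
        # core only ever sees Request objects.
        return cls(
            request_id=request.request_id,
            client_index=request.client_index,
            current_wave=request.current_wave,
            priority=request.priority,
            cache_salt=request.cache_salt,
        )
```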
Where else are we calling `add_request` such that we need to keep the union of 2 types here?
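(For context, the "union of 2 types" means an `add_request` that accepts either form and converts on entry, roughly as below; this reuses the simplified types from the sketch above, and `_enqueue` is a hypothetical stand-in for the downstream call, not actual vLLM code.)

```python
from typing import Union


class EngineCore:
    # Simplified stand-in for the engine core; only add_request is sketched.
    def add_request(self, request: Union[EngineCoreRequest, Request]) -> None:
        # Accept either type so existing callers keep working, but normalize
        # to the internal Request representation immediately.
        if isinstance(request, EngineCoreRequest):
            request = Request.from_engine_core_request(request)
        self._enqueue(request)

    def _enqueue(self, request: Request) -> None:
        ...  # hypothetical: hand the Request off to scheduling
```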
At least 2 places I noticed. Especially for the latter one, I'm not fully sure whether we need to further update the interface upstream.

vllm/vllm/v1/engine/llm_engine.py, line 212 (at e7b2042)
vllm/vllm/v1/engine/core_client.py, lines 606 to 609 (at e7b2042)

An alternative approach is to keep `add_request(self, request: Request)`, add a `preprocess_add_request(EngineCoreRequest) -> Request` helper, and update the logic on the caller side. WDYT?
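A rough sketch of that alternative shape, reusing the simplified types from the sketch above (the helper name is just the one proposed here, and `_enqueue` is again a hypothetical stand-in):

```python
class EngineCore:
    # Simplified: add_request now only accepts the internal Request type.
    def add_request(self, request: Request) -> None:
        self._enqueue(request)

    def _enqueue(self, request: Request) -> None:
        ...  # hypothetical downstream call


def preprocess_add_request(request: EngineCoreRequest) -> Request:
    # Conversion happens on the caller side, before the core is touched.
    return Request.from_engine_core_request(request)


# Callers (e.g. the sync engine or the core client) would then do:
# core.add_request(preprocess_add_request(engine_core_request))
```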
Those should be for the sync engine. We should be able to trigger them directly using the Pythonic API or the throughput benchmark. Do we see similar benefits?
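For reference, the sync path can be driven directly with the offline API (or via `benchmarks/benchmark_throughput.py`); the model and arguments below are only illustrative:

```python
from vllm import LLM, SamplingParams

# Offline (sync) engine path; the model and prompt are placeholders.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
for out in outputs:
    print(out.outputs[0].text)
```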