Fix race condition in LoRA streaming requests #2954
Merged
Conversation
HappyAmazonian previously approved these changes (Nov 19, 2025)
xyang16 reviewed (Nov 19, 2025)
xyang16 reviewed (Nov 19, 2025)
Author (Member): "Manual testing confirms adapter is getting utilized"
xyang16 reviewed (Nov 19, 2025)
engines/python/setup/djl_python/lmi_vllm/request_response_utils.py (comment outdated, resolved)
xyang16 approved these changes (Nov 19, 2025)
HappyAmazonian pushed a commit that referenced this pull request (Nov 19, 2025)
HappyAmazonian added a commit that referenced this pull request (Nov 19, 2025)
ksuma2109 pushed a commit that referenced this pull request (Nov 19, 2025)
The Bug
When using the vLLM async service with LoRA adapters, streaming requests were not receiving the `lora_request` parameter. This was due to an async generator timing issue in the original implementation.

Original Implementation (Buggy)
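Below is a minimal, self-contained sketch of the pattern being described; the names (`FakeEngine`, `FakeServing`, `buggy_inference_invoker`, `add_request`) are illustrative stand-ins, not the actual djl_python code. The invoker patches the engine call to inject the adapter and restores it in a `finally` block, which reproduces the timing problem for streaming requests:

```python
import asyncio


class FakeEngine:
    """Stand-in for the engine call that was being wrapped."""

    def add_request(self, prompt, lora_request=None):
        return f"prompt={prompt!r} lora_request={lora_request}"


class FakeServing:
    """Stand-in for the OpenAI serving layer called by inference_invoker."""

    def __init__(self, engine):
        self.engine = engine

    async def create_completion(self, prompt, stream=False):
        if stream:
            # Streaming path: nothing executes until the caller iterates.
            async def generator():
                yield self.engine.add_request(prompt)

            return generator()
        # Non-streaming path: add_request runs before we return.
        return self.engine.add_request(prompt)


async def buggy_inference_invoker(serving, prompt, lora_request, stream):
    original_add_request = serving.engine.add_request

    def patched_add_request(p, **kwargs):
        # The wrapper injects the LoRA adapter.
        return original_add_request(p, lora_request=lora_request, **kwargs)

    serving.engine.add_request = patched_add_request
    try:
        return await serving.create_completion(prompt, stream=stream)
    finally:
        # Restored *before* a streaming generator has run, so the injected
        # lora_request never reaches add_request for streaming requests.
        serving.engine.add_request = original_add_request


async def main():
    serving = FakeServing(FakeEngine())
    # Non-streaming: the adapter arrives as expected.
    print(await buggy_inference_invoker(serving, "hi", "adapter-1", stream=False))
    # Streaming: the generator runs after the finally block; the adapter is lost.
    gen = await buggy_inference_invoker(serving, "hi", "adapter-1", stream=True)
    async for chunk in gen:
        print(chunk)  # prints lora_request=None


asyncio.run(main())
```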
Problem: For streaming requests, `inference_invoker` returns an `AsyncGenerator` that hasn't started executing yet. The wrapper is removed in the `finally` block before the generator actually runs and calls `add_request`.

The Fix
The fix passes `adapter` directly as a parameter to the OpenAI Request object and then sends it to the inference invoker instead of wrapping the method:
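A minimal sketch of that approach, again with illustrative stand-in names (`FakeOpenAIRequest`, `FixedServing`, `fixed_inference_invoker`) rather than the actual repo code: the adapter is set on the request object itself, so it is available whenever the serving method eventually runs, immediately for non-streaming requests or later when the generator is iterated for streaming ones.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeOpenAIRequest:
    """Stand-in for the OpenAI Request object the adapter is attached to."""
    prompt: str
    lora_request: Optional[str] = None


class FixedServing:
    """Stand-in serving layer that reads the adapter from the request."""

    async def create_completion(self, request: FakeOpenAIRequest, stream=False):
        if stream:
            async def generator():
                # The adapter is read from the request when the generator
                # actually runs, so streaming requests now see it too.
                yield f"token lora_request={request.lora_request}"

            return generator()
        return f"response lora_request={request.lora_request}"


async def fixed_inference_invoker(serving, prompt, lora_request, stream):
    # No method wrapping: the adapter simply travels with the request.
    request = FakeOpenAIRequest(prompt=prompt, lora_request=lora_request)
    return await serving.create_completion(request, stream=stream)


async def main():
    serving = FixedServing()
    print(await fixed_inference_invoker(serving, "hi", "adapter-1", stream=False))
    gen = await fixed_inference_invoker(serving, "hi", "adapter-1", stream=True)
    async for chunk in gen:
        print(chunk)  # lora_request=adapter-1 for streaming as well


asyncio.run(main())
```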
This ensures the `lora_request` is properly passed to vLLM's OpenAI serving methods for both streaming and non-streaming requests.

Testing
Manual testing confirms the adapter is getting utilized.