
Fix race condition in LoRA streaming requests#2954

Merged
Lokiiiiii merged 7 commits into master from loki-lora-async-stream-fix
Nov 19, 2025

Conversation

Member

@Lokiiiiii Lokiiiiii commented Nov 19, 2025

The Bug

When using vLLM async service with LoRA adapters, streaming requests were not receiving the lora_request parameter. This was due to an async generator timing issue in the original implementation.

Original Implementation (Buggy)

if processed_request.lora_request:
    original_add_request = self.vllm_engine.add_request
    
    async def add_request_with_lora(*args, **kwargs):
        kwargs['lora_request'] = processed_request.lora_request
        return await original_add_request(*args, **kwargs)
    
    self.vllm_engine.add_request = add_request_with_lora
    try:
        response = await processed_request.inference_invoker(...)
    finally:
        # BUG: for streaming, this restore runs before the returned
        # AsyncGenerator ever executes and calls add_request
        self.vllm_engine.add_request = original_add_request

Problem: For streaming requests, inference_invoker returns an AsyncGenerator that hasn't started executing yet. The wrapper is removed in the finally block before the generator actually runs and calls add_request.
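The timing issue can be reproduced in isolation. The sketch below uses hypothetical stand-in names (`Engine`, `stream`, `patched`) rather than the real handler code; it shows that creating an async generator does not run its body, so a monkey-patch restored in `finally` is already gone by the time the body calls the patched method:

```python
# Minimal sketch (hypothetical names) of the async-generator timing bug:
# the monkey-patch is reverted in `finally` before the generator body
# ever calls the patched method.
import asyncio

class Engine:
    def add_request(self, **kwargs):
        return kwargs  # echoes back what it was called with

engine = Engine()
calls = []

async def stream():  # stands in for the streaming inference path
    # This body runs only when the caller iterates the generator,
    # which happens after the `finally` block has already executed.
    calls.append(engine.add_request(prompt="hi"))
    yield "token"

async def main():
    original = engine.add_request

    def patched(**kwargs):
        kwargs["lora_request"] = "my-adapter"
        return original(**kwargs)

    engine.add_request = patched
    try:
        gen = stream()  # creating the generator does NOT run its body
    finally:
        engine.add_request = original  # wrapper removed too early
    async for _ in gen:  # body executes now, against the original method
        pass
    return calls[0]

result = asyncio.run(main())
print(result)  # {'prompt': 'hi'} -- lora_request was silently dropped
```

A non-streaming path awaits the request inside the `try` block, so it still sees the wrapper; only the deferred generator execution loses it, which is why the bug was streaming-only.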

The Fix

The fix sets the adapter name directly on the OpenAI request object and passes that object to the inference invoker, instead of wrapping the engine method:

processed_request.vllm_request.model = adapter_name
response = await processed_request.inference_invoker(
    processed_request.vllm_request
)
This ensures the lora_request is properly passed to vLLM's OpenAI serving methods for both streaming and non-streaming requests.
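Why this works can be sketched with the same hypothetical stand-ins as before: because the adapter name travels on the request object itself, whichever late point in time the generator body runs, it reads the correct value. Nothing here is the real vLLM API; it only illustrates the state-on-the-request pattern:

```python
# Hedged sketch (hypothetical names) of the fixed flow: the adapter name
# is carried by the request object, so deferred generator execution can
# no longer lose it.
import asyncio

class Request:
    def __init__(self, prompt, model="base"):
        self.prompt = prompt
        self.model = model  # which model/adapter should serve this request

async def inference_invoker(request):
    async def stream():
        # The adapter name is read from the request when the body finally
        # runs, so late execution sees the same value the caller set.
        yield f"served-by:{request.model}"
    return stream()

async def main():
    request = Request("hi")
    request.model = "my-lora-adapter"  # the fix: state lives on the request
    gen = await inference_invoker(request)
    return [tok async for tok in gen]

out = asyncio.run(main())
print(out)  # ['served-by:my-lora-adapter']
```

The design point is general: attaching per-request state to the request object is race-free, while temporarily mutating shared engine state is not, especially once execution is deferred or concurrent.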

Testing

Manual testing confirms the adapter is being utilized.

HappyAmazonian
HappyAmazonian previously approved these changes Nov 19, 2025
@HappyAmazonian HappyAmazonian dismissed their stale review November 19, 2025 03:10

wait for tests

@Lokiiiiii
Member Author

Manual testing confirms the adapter is being utilized.

@Lokiiiiii Lokiiiiii marked this pull request as ready for review November 19, 2025 20:00
@Lokiiiiii Lokiiiiii requested review from a team and zachgk as code owners November 19, 2025 20:00
@Lokiiiiii Lokiiiiii merged commit f2d5b20 into master Nov 19, 2025
7 of 9 checks passed
@Lokiiiiii Lokiiiiii deleted the loki-lora-async-stream-fix branch November 19, 2025 20:49
HappyAmazonian added a commit that referenced this pull request Nov 19, 2025
Co-authored-by: Loki <lokravi@amazon.com>
