Skip to content

Fix worker status for unusable terminal backend responses#615

Draft
ushaket wants to merge 3 commits intovllm-project:mainfrom
ushaket:fix/worker-unusable-terminal-response
Draft

Fix worker status for unusable terminal backend responses#615
ushaket wants to merge 3 commits intovllm-project:mainfrom
ushaket:fix/worker-unusable-terminal-response

Conversation

@ushaket
Copy link
Contributor

@ushaket ushaket commented Feb 26, 2026

Summary

This PR fixes a scheduler correctness bug where requests could be marked as completed even when the backend resolved without a usable terminal response. It adds explicit terminal-response validation in the worker so malformed/empty terminal results are surfaced as errored with a clear diagnostic instead of being counted as successful requests.

Details

  • Added terminal response validation in WorkerProcess to guard final status transitions.
  • Updated request finalization logic so unusable terminal responses set:
    • status: errored
    • error message: [UNUSABLE_BACKEND_RESPONSE] backend resolved without a usable terminal response payload
  • Implemented GenerationResponse-aware usability criteria:
    • usable if non-empty text or output_metrics.total_tokens > 0
    • None terminal response is always unusable
    • non-GenerationResponse fallback remains bool(response) for generic/test compatibility
  • Added regression tests in tests/unit/scheduler/test_worker.py for:
    • no terminal response object -> errored
    • empty GenerationResponse -> errored
    • token-bearing GenerationResponse with empty text -> completed
  • Preserved existing cancellation/error flow behavior outside terminal-response validation.

Test Plan

  • Run targeted worker regression tests:
    • uv run pytest -q tests/unit/scheduler/test_worker.py -k "terminal_response or empty_generation_response or generation_response_with_tokens or invalid_initialization"
  • Verify expected outcomes:
    • missing terminal response is not marked completed
    • empty GenerationResponse is not marked completed
    • non-empty signal (output_tokens > 0) is accepted as completed
  • Confirm no lint issues on touched files.

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Signed-off-by: Uri Shaket <ushaket@redhat.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
@ushaket ushaket marked this pull request as draft February 26, 2026 09:38
@ushaket
Copy link
Contributor Author

ushaket commented Feb 26, 2026

This introduce worker dependency on GenerationResponse, not sure if that's the right way to go,
The reason I added it is that the we need to somehow know what we're looking for, in the future we might have ResponseT that doesn't return text and total_tokens, so if we want to keep this check in the worker, we'll need to be aware of the different ResponseT possible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Worker marks completed/successful when final response has no usable data

1 participant