Skip to content

Fix panic in SGLang proxy handling of concurrent requests#632

Open
yangligt2 wants to merge 2 commits intollm-d:mainfrom
yangligt2:fix-sglang-panic
Open

Fix panic in SGLang proxy handling of concurrent requests#632
yangligt2 wants to merge 2 commits intollm-d:mainfrom
yangligt2:fix-sglang-panic

Conversation

@yangligt2
Copy link

Fix #631

Description

This PR addresses a critical connection termination bug where the pd-sidecar panics with net/http: abort Handler under concurrent load (concurrency > 1) when running with the SGLang connector.

Root Cause:
In the SGLang architecture, the sidecar splits incoming requests and submits them to both the prefill and decode backends concurrently. The decode pipeline runs synchronously, while the prefill pipeline is dispatched asynchronously in a background goroutine.

Both requests were previously inheriting the identical request context using r.Clone(r.Context()). Because the prefill 200 OK response stream frequently arrives after the decode processing has successfully completed and returned, the standard Go HTTP server automatically cancels the parent context r.Context().

When this cancellation occurs before the background prefill proxy has fully read its response stream, httputil.ReverseProxy interprets it as an aborted connection and fatals with http.ErrAbortHandler, bringing down the sidecar pod.

As seen in the sidecar debug logs under load, the prefill request completions (prefill request completed) log significantly later than the target forwarding:

...
{"level":"Level(-4)","ts":"2026-02-18T05:45:31Z","logger":"proxy server on port 8000","msg":"SSRF protection: prefill target allowed","target":"10.120.11.7:8000"}
{"level":"Level(-4)","ts":"2026-02-18T05:45:31Z","logger":"proxy server on port 8000","msg":"running SGLang protocol","url":"10.120.11.7:8000"}
{"level":"Level(-5)","ts":"2026-02-18T05:45:31Z","logger":"proxy server on port 8000","msg":"bootstrap info added","bootstrap_host":"10.120.11.7","bootstrap_port":8998,"bootstrap_room":1771393531315488403}
{"level":"Level(-5)","ts":"2026-02-18T05:45:31Z","logger":"proxy server on port 8000","msg":"prefill request completed","status":200}
{"level":"Level(-5)","ts":"2026-02-18T05:45:31Z","logger":"proxy server on port 8000","msg":"prefill request completed","status":200}
{"level":"Level(-5)","ts":"2026-02-18T05:45:31Z","logger":"proxy server on port 8000","msg":"prefill request completed","status":200}
{"level":"Level(-5)","ts":"2026-02-18T05:45:31Z","logger":"proxy server on port 8000","msg":"prefill request completed","status":200}
{"level":"Level(-5)","ts":"2026-02-18T05:45:31Z","logger":"proxy server on port 8000","msg":"prefill request completed","status":200}

Fix:
This PR properly unties the background prefill context lifecycle from the main synchronous request using context.WithoutCancel. A defer recover() was additionally added to the async call stack for extra safety against uncaught internal proxy panics propagating to the main service loop. Added a TestSGLang Ginkgo test suite to validate end-to-end functionality.

Validation

Validated in a real GKE environment with SGLang backends:

  • All benchmark queries succeeded smoothly at concurrency=512.
  • go test -v -run SGLang tests cleanly pass.

@github-actions
Copy link

🚨 Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@github-actions github-actions bot requested review from kfswain and shmuelk February 18, 2026 06:09
@yangligt2 yangligt2 force-pushed the fix-sglang-panic branch 2 times, most recently from 7c2324a to 0ff5b6e Compare February 18, 2026 06:15
Signed-off-by: YANG LI <yangligt@google.com>
@yangligt2
Copy link
Author

cc: @bongwoobak
Please review the pr to see if there's any concern with SGLang?

Signed-off-by: YANG LI <yangligt@google.com>
@kfswain
Copy link
Collaborator

kfswain commented Feb 18, 2026

/lgtm

@github-actions github-actions bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm "Looks good to me", indicates that a PR is ready to be merged.

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

SGLang connector crash when sending concurrent requests

3 participants

Comments