
@skidder skidder commented Jan 20, 2026

Summary

This PR adds a standalone reproduction tool that demonstrates a bug where Pingora doesn't detect client disconnects while waiting on a cache lock, causing servers to hold connections for the full lock timeout (up to 60 seconds) even after clients have disconnected.

The Bug

When multiple requests hit the same uncached URL with cache lock enabled:

  1. First request (writer) acquires the cache lock and fetches from origin
  2. Subsequent requests (readers) wait on the cache lock
  3. BUG: If a reader's client disconnects while waiting, the server keeps waiting on the cache lock until it times out
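
A minimal std-only model of step 3 follows. It assumes the cache lock can be represented as a channel that the writer signals on release. The names `wait_on_cache_lock` and `simulate_bug` are hypothetical, and none of this is Pingora's code. The reader waits only on the lock, so a client disconnect that happens early in the wait goes unnoticed until the lock timeout expires.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical cache-lock reader: blocks until the writer signals release
/// or `lock_timeout` elapses. Nothing here looks at the client connection,
/// which models the behavior described as the bug.
fn wait_on_cache_lock(lock_released: &mpsc::Receiver<()>, lock_timeout: Duration) -> &'static str {
    match lock_released.recv_timeout(lock_timeout) {
        Ok(()) => "lock released",
        Err(RecvTimeoutError::Timeout) => "lock timed out",
        Err(RecvTimeoutError::Disconnected) => "writer gone",
    }
}

/// Runs one reader while the writer holds the lock for the whole test
/// (a slow origin) and the client disconnects after `client_disconnect_after`.
fn simulate_bug(lock_timeout: Duration, client_disconnect_after: Duration) -> (&'static str, Duration) {
    // Keep the writer's sender alive but never send: the lock stays held.
    let (_writer_tx, lock_rx) = mpsc::channel::<()>();
    // The client's disconnect signal is produced, but the reader never listens to it.
    let (client_tx, _client_rx) = mpsc::channel::<()>();
    thread::spawn(move || {
        thread::sleep(client_disconnect_after);
        let _ = client_tx.send(()); // unobserved disconnect
    });
    let start = Instant::now();
    let outcome = wait_on_cache_lock(&lock_rx, lock_timeout);
    (outcome, start.elapsed())
}
```

Calling `simulate_bug(Duration::from_millis(300), Duration::from_millis(10))` returns `"lock timed out"` after at least 300 ms, even though the simulated client left after 10 ms.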

This causes:

  • Server resources held for up to the full cache lock timeout (up to 60 seconds) after the client disconnects
  • Potential resource exhaustion under high concurrency
  • Real user-facing delays when clients disconnect (page navigation, mobile network issues, etc.)

The Reproduction

The cache-lock-bug-repro/ directory contains:

  • slow_origin.rs: Pure Rust slow origin server (no OpenResty dependency)
  • proxy.rs: Minimal Pingora proxy with caching and cache lock enabled
  • test_client.rs: Test client that measures server disconnect detection time
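
To make "disconnect detection time" concrete, here is a hypothetical loopback measurement. It is not the PR's `test_client.rs`, and the function name is illustrative. A client connects and then closes its socket, and the server side records how long after the close it observes EOF (a `read` returning `Ok(0)`). A server that is actively reading sees this almost immediately. A server blocked elsewhere, such as on a cache lock, does not read at all until that wait ends.

```rust
use std::io::Read;
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical measurement: time between the client closing its socket
/// and the server observing EOF on its side of the connection.
fn measure_disconnect_detection() -> std::io::Result<Duration> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let client = thread::spawn(move || -> std::io::Result<Instant> {
        let stream = TcpStream::connect(addr)?;
        thread::sleep(Duration::from_millis(50)); // pretend a request is in flight
        let closed_at = Instant::now();
        drop(stream); // the client disconnects
        Ok(closed_at)
    });

    let (mut server_side, _) = listener.accept()?;
    let mut buf = [0u8; 1024];
    // Keep reading until EOF; read() returns Ok(0) once the peer has closed.
    while server_side.read(&mut buf)? != 0 {}
    let detected_at = Instant::now();

    let closed_at = client.join().expect("client thread panicked")?;
    Ok(detected_at.saturating_duration_since(closed_at))
}
```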

To run:

cd cache-lock-bug-repro
./run_test.sh

Test Results

| State             | Server close time after client disconnect |
|-------------------|-------------------------------------------|
| Without fix (bug) | 19-24 seconds                             |
| With fix          | 164-182 microseconds                      |

Proposed Fix

A proposed fix is available in a separate branch: skidder/cache-lock-client-disconnect-v1

The fix uses tokio::select! to race the cache lock wait against client disconnect detection via session.downstream_session.read_body_or_idle(true).
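
The race can be illustrated without a tokio dependency. The sketch below is a std-only analog of `tokio::select!`, not the code on the fix branch. Both the lock writer and a disconnect watcher feed one channel, and the reader returns on whichever event arrives first. The watcher thread stands in for `session.downstream_session.read_body_or_idle(true)` reporting that the client went away. `WaitEvent`, `WaitOutcome`, `wait_or_disconnect`, and `simulate_fix` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Events a waiting reader can observe (hypothetical).
enum WaitEvent {
    LockReleased,
    ClientDisconnected,
}

#[derive(Debug, PartialEq)]
enum WaitOutcome {
    LockReleased,
    ClientDisconnected,
    TimedOut,
}

/// Returns on the first event from either source, like a two-branch select.
fn wait_or_disconnect(events: &mpsc::Receiver<WaitEvent>, lock_timeout: Duration) -> WaitOutcome {
    match events.recv_timeout(lock_timeout) {
        Ok(WaitEvent::LockReleased) => WaitOutcome::LockReleased,
        Ok(WaitEvent::ClientDisconnected) => WaitOutcome::ClientDisconnected,
        // Timeout, or every event source was dropped: stop waiting.
        Err(_) => WaitOutcome::TimedOut,
    }
}

/// The writer holds the lock for the whole run; the client leaves early.
fn simulate_fix(lock_timeout: Duration, client_disconnect_after: Duration) -> (WaitOutcome, Duration) {
    let (tx, rx) = mpsc::channel();
    // Writer's handle: kept alive but never sends LockReleased.
    let _writer_tx = tx.clone();
    // Disconnect watcher (stand-in for read_body_or_idle(true)).
    thread::spawn(move || {
        thread::sleep(client_disconnect_after);
        let _ = tx.send(WaitEvent::ClientDisconnected);
    });
    let start = Instant::now();
    let outcome = wait_or_disconnect(&rx, lock_timeout);
    (outcome, start.elapsed())
}
```

With a 5 s lock timeout and a disconnect after 20 ms, `simulate_fix` returns `ClientDisconnected` shortly after the disconnect instead of waiting out the timeout. In async code, `tokio::select!` expresses the same race directly over the two futures, and the branch that loses is dropped.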

Test plan

  • Run the reproduction tool WITHOUT the fix - observe 19-24 s close delays
  • Run the reproduction tool WITH the fix - observe ~180 µs close times
  • Review fix branch for correctness
  • Ensure existing cache lock tests still pass

🤖 Generated with Claude Code

This adds a self-contained reproduction tool that demonstrates a bug where
Pingora doesn't detect client disconnects while waiting on a cache lock,
causing servers to hold connections for the full lock timeout (up to 60s)
even after clients have disconnected.

The reproduction includes:
- slow_origin.rs: A simple slow origin server (pure Rust, no OpenResty)
- proxy.rs: Minimal Pingora proxy with caching and cache lock enabled
- test_client.rs: Test client that measures disconnect detection time

To run: cd cache-lock-bug-repro && ./run_test.sh

Results without fix: Server takes 19-24 seconds to close after disconnect
Results with fix: Server closes in ~180 microseconds after disconnect

A proposed fix is available in branch skidder/cache-lock-client-disconnect-v1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@skidder skidder closed this Jan 20, 2026
@skidder skidder deleted the skidder/cache-lock-bug-repro branch January 20, 2026 19:40
@skidder skidder restored the skidder/cache-lock-bug-repro branch January 20, 2026 19:40