Introduce QUIC connection lifecycle tracing and dead connection pruning #50

Merged
HudsonGraeme merged 3 commits into main from investigate/quic-connection-lifecycle-tracing
Mar 4, 2026

@HudsonGraeme HudsonGraeme commented Mar 4, 2026

Summary

  • Add debug-level tracing to QUIC server dispatch (accept_bi loop, stream handling) and client synapse queries (connection status, bi-stream open/send/receive)
  • Prune dead connections in update_miner_registry_inner by checking close_reason() before retaining
  • Expose connection health in get_connection_stats() by checking close_reason() instead of always reporting "active"
  • Add 4 integration tests covering connection stats, dead connection pruning, registry updates, and stats-after-close behavior
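
The pruning step in the second bullet can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `FakeConnection` and `prune_dead` are hypothetical names, and the real `update_miner_registry_inner` and its connection type are not shown in this conversation. The only detail taken from the summary is that a connection counts as dead when `close_reason()` returns a value.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a QUIC connection handle. The real handle type
/// is not shown in this PR; only the `close_reason()` check is mentioned.
#[derive(Debug, Clone)]
pub struct FakeConnection {
    /// `None` while the connection is open, `Some(reason)` once it has closed.
    closed: Option<String>,
}

impl FakeConnection {
    pub fn open() -> Self {
        Self { closed: None }
    }

    pub fn closed(reason: &str) -> Self {
        Self { closed: Some(reason.to_string()) }
    }

    /// Same shape as a `close_reason()` accessor: `None` means still alive.
    pub fn close_reason(&self) -> Option<&str> {
        self.closed.as_deref()
    }
}

/// Drop every entry whose connection reports a close reason. Returns the
/// pruned addresses (sorted) so a caller could also deregister their hotkeys.
pub fn prune_dead(connections: &mut HashMap<String, FakeConnection>) -> Vec<String> {
    let mut pruned = Vec::new();
    connections.retain(|addr, conn| match conn.close_reason() {
        None => true,
        Some(_) => {
            pruned.push(addr.clone());
            false
        }
    });
    pruned.sort();
    pruned
}
```

Returning the pruned addresses is a design choice of this sketch; it keeps the follow-up cleanup (such as hotkey deregistration) outside the `retain` closure.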

Test plan

  • cargo test — 4 new integration tests pass
  • Deployed to testnet, verified QUIC synapse queries reach miner dispatch handler
  • Confirmed dead connections pruned on metagraph sync, transport cache correctly seeds HttpOnly for unreachable miners

Summary by CodeRabbit

  • Bug Fixes

    • Improved connection state management with automatic dead connection cleanup and pruning.
    • Enhanced error handling for edge cases in message processing and connection synchronization.
  • Tests

    • Added integration tests for query execution after registry updates and delayed propagation scenarios.
    • Added tests for handling unreachable peers and connection timeout behavior.
  • Chores

    • Expanded debug and diagnostic logging throughout connection handling and stream processing for better troubleshooting visibility.

Log close_reason when detecting dead connections on client side
and when server-side stream accept loop terminates. Log when
server replaces a validator connection during re-handshake.

Investigate systematic QUIC synapse query failures by adding
diagnostic tracing to send_synapse_packet and server dispatch
paths. Resolve stale connections persisting in the registry by
pruning dead connections during update_miner_registry, and
report actual connection health status in get_connection_stats.

coderabbitai bot commented Mar 4, 2026

Warning

Rate limit exceeded

@HudsonGraeme has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 18 seconds before requesting another review.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2a8ddecc-663e-4af4-86df-e45ede10d92b

📥 Commits

Reviewing files that changed from the base of the PR and between 4994c7d and 09fabda.

📒 Files selected for processing (4)
  • crates/btlightning/src/client.rs
  • crates/btlightning/src/registry.rs
  • crates/btlightning/src/signing.rs
  • crates/btlightning/tests/integration.rs

Walkthrough

The changes add comprehensive debug and info-level logging across client and server components to improve observability of connection lifecycle events and query operations. Additionally, dead connection pruning logic is introduced in the miner registry update path, and new integration tests verify query execution under various registry update and connectivity scenarios.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Client Connection Management: crates/btlightning/src/client.rs | Added debug logging for connection state checks, synapse handling milestones (stream opening, frame sending, response reception), and latency tracking. Introduced pruning of dead connections (identified by close_reason) during miner registry updates with associated hotkey deregistration. Enhanced query_axon and query_axon_stream to differentiate between active, closed, and missing connections with explicit reconnect triggers. Updated get_connection_stats to report per-address statuses. |
| Server Logging Enhancements: crates/btlightning/src/server/dispatch.rs, crates/btlightning/src/server/handshake.rs | Added debug/info logging for connection setup (stable_id and remote address), stream acceptance events, frame reception, and connection closure with error and close reason details. Added warning log when validator connections are replaced, including old and new addresses before closure. |
| Integration Test Coverage: crates/btlightning/tests/integration.rs | Introduced new integration tests covering query execution via miner registry updates, delayed registry propagation, repeated updates, and unreachable peer handling. Tests verify query success after registry state changes and validate per-connection behavior under connectivity constraints. |
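
The active / closed / missing distinction and the per-address status reporting described for client.rs can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `ConnState`, `classify`, `needs_reconnect`, `connection_stats`, and the `connection_<addr>` key format are hypothetical, and the registry is modelled as a map from address to an optional close reason instead of real connection handles. Only the `total_connections` key appears elsewhere in this conversation.

```rust
use std::collections::HashMap;

/// Three cases a query path might distinguish before sending (illustrative names).
#[derive(Debug, PartialEq, Eq)]
pub enum ConnState {
    Active,
    Closed(String),
    Missing,
}

/// `registry` maps a miner address to a close reason: `None` = open,
/// `Some(reason)` = closed. This stands in for calling `close_reason()`
/// on a real connection handle.
pub fn classify(registry: &HashMap<String, Option<String>>, addr: &str) -> ConnState {
    match registry.get(addr) {
        None => ConnState::Missing,
        Some(None) => ConnState::Active,
        Some(Some(reason)) => ConnState::Closed(reason.clone()),
    }
}

/// Both closed and missing connections would trigger a reconnect attempt.
pub fn needs_reconnect(state: &ConnState) -> bool {
    !matches!(state, ConnState::Active)
}

/// Build a string-keyed stats map with a total and one status per address,
/// instead of reporting every entry as "active".
pub fn connection_stats(registry: &HashMap<String, Option<String>>) -> HashMap<String, String> {
    let mut stats = HashMap::new();
    stats.insert("total_connections".to_string(), registry.len().to_string());
    for (addr, reason) in registry {
        let status = match reason {
            None => "active".to_string(),
            Some(r) => format!("closed: {}", r),
        };
        stats.insert(format!("connection_{}", addr), status);
    }
    stats
}
```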

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Connections logged from dusk to dawn,
Dead links pruned when registry's withdrawn,
Handshakes dance with warnings bright,
Tests ensure the peers stay right!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main changes: adding QUIC connection lifecycle tracing and implementing dead connection pruning, which are the primary focus of the changeset across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
crates/btlightning/tests/integration.rs (1)

1993-2023: Replace fixed delay with condition-based waiting in this test.

Line 2013 uses an unconditional 5s sleep, which adds avoidable test latency and can still be timing-sensitive. Prefer a bounded poll loop that exits as soon as the condition is met.

♻️ Proposed test refactor
-    tokio::time::sleep(Duration::from_secs(5)).await;
+    tokio::time::timeout(Duration::from_secs(5), async {
+        loop {
+            let stats = client.get_connection_stats().await.unwrap();
+            if stats.get("total_connections").is_some_and(|v| v == "1") {
+                break;
+            }
+            tokio::time::sleep(Duration::from_millis(50)).await;
+        }
+    })
+    .await
+    .expect("connection should become active within 5s");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/btlightning/tests/integration.rs` around lines 1993 - 2023, In the
test query_works_via_update_miner_registry_after_delay replace the unconditional
tokio::time::sleep(Duration::from_secs(5)).await with a bounded poll loop that
repeatedly attempts the condition (e.g., call
client.query_axon_with_timeout(axon.clone(), build_request("echo"),
short_timeout) or another lightweight probe) until it succeeds or a total
timeout elapses; use Instant::now() + Duration::from_secs(N) as the deadline,
sleep a short interval (e.g., 50–200ms) between attempts, and fail the test if
the deadline is reached—this keeps the call to update_miner_registry and the
axon/query flow (update_miner_registry, query_axon_with_timeout, build_request,
Duration) but removes the fixed 5s delay.
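
The deadline-based variant described in this prompt can be sketched as a small helper. This is a blocking, standard-library-only illustration; the test in the PR is async and would use `tokio::time::sleep` instead of `std::thread::sleep`. The name `poll_until` and its signature are hypothetical and do not come from the repository.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Call `probe` until it returns `Some(value)` or `timeout` elapses.
/// Returns `None` if the deadline passes without a successful probe.
pub fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // Probe first so a condition that already holds returns immediately.
        if let Some(value) = probe() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        thread::sleep(interval);
    }
}
```

A test would pass a closure that attempts the query and maps success to `Some(response)`, then fail with a clear message if the helper returns `None`.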

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c50849b-c81f-4994-b44e-afb27fcfe107

📥 Commits

Reviewing files that changed from the base of the PR and between 5f879e0 and 4994c7d.

📒 Files selected for processing (4)
  • crates/btlightning/src/client.rs
  • crates/btlightning/src/server/dispatch.rs
  • crates/btlightning/src/server/handshake.rs
  • crates/btlightning/tests/integration.rs

Eliminate field_reassign_with_default in client config tests using struct
update syntax. Replace unnecessary_get_then_check in registry with
contains_key. Reorder signing.rs to place feature-gated items before test
module. Replace unconditional 5s sleep in integration test with bounded
poll loop that retries at 250ms intervals with a 10s deadline.
@HudsonGraeme HudsonGraeme force-pushed the investigate/quic-connection-lifecycle-tracing branch from 760d9cf to 09fabda March 4, 2026 15:18
@HudsonGraeme HudsonGraeme merged commit a7f38de into main Mar 4, 2026
16 checks passed
@HudsonGraeme HudsonGraeme deleted the investigate/quic-connection-lifecycle-tracing branch March 4, 2026 15:21