Introduce QUIC connection lifecycle tracing and dead connection pruning by HudsonGraeme · Pull Request #50 · inference-labs-inc/btlightning

HudsonGraeme · 2026-03-04T14:32:51Z

Summary

- Add debug-level tracing to QUIC server dispatch (accept_bi loop, stream handling) and client synapse queries (connection status, bi-stream open/send/receive)
- Prune dead connections in update_miner_registry_inner by checking close_reason() before retaining
- Expose connection health in get_connection_stats() by checking close_reason() instead of always reporting "active"
- Add 4 integration tests covering connection stats, dead connection pruning, registry updates, and stats-after-close behavior
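The pruning step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `MockConnection` and `prune_dead_connections` are hypothetical names, and the mock stands in for a real QUIC handle (QUIC libraries such as quinn expose a `close_reason()` that returns `Some` once a connection has closed).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a QUIC connection handle. Only the
/// `close_reason()` accessor matters for this sketch.
#[derive(Debug, Clone)]
pub struct MockConnection {
    close_reason: Option<String>,
}

impl MockConnection {
    pub fn open() -> Self {
        Self { close_reason: None }
    }

    pub fn closed(reason: &str) -> Self {
        Self { close_reason: Some(reason.to_string()) }
    }

    /// Returns `Some(reason)` once the connection has closed, `None` while open.
    pub fn close_reason(&self) -> Option<&str> {
        self.close_reason.as_deref()
    }
}

/// Drops every connection that reports a close reason and returns the
/// pruned addresses, sorted so the result is deterministic.
pub fn prune_dead_connections(
    connections: &mut HashMap<String, MockConnection>,
) -> Vec<String> {
    let mut pruned = Vec::new();
    connections.retain(|addr, conn| match conn.close_reason() {
        Some(_) => {
            pruned.push(addr.clone());
            false // dead: remove from the map
        }
        None => true, // still open: keep
    });
    pruned.sort();
    pruned
}
```

Returning the pruned addresses is a design choice of this sketch; it gives the caller a place to do follow-up work such as deregistering the associated hotkeys.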

Test plan

- `cargo test`: 4 new integration tests pass
- Deployed to testnet, verified QUIC synapse queries reach miner dispatch handler
- Confirmed dead connections pruned on metagraph sync, transport cache correctly seeds HttpOnly for unreachable miners

Summary by CodeRabbit

Bug Fixes
- Improved connection state management with automatic dead connection cleanup and pruning.
- Enhanced error handling for edge cases in message processing and connection synchronization.
Tests
- Added integration tests for query execution after registry updates and delayed propagation scenarios.
- Added tests for handling unreachable peers and connection timeout behavior.
Chores
- Expanded debug and diagnostic logging throughout connection handling and stream processing for better troubleshooting visibility.

Log close_reason when detecting dead connections on client side and when server-side stream accept loop terminates. Log when server replaces a validator connection during re-handshake.

Investigate systematic QUIC synapse query failures by adding diagnostic tracing to send_synapse_packet and server dispatch paths. Resolve stale connections persisting in the registry by pruning dead connections during update_miner_registry, and report actual connection health status in get_connection_stats.
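For illustration, reporting actual health instead of a constant "active" might look like the sketch below. Everything here is an assumption made for the example: the function name `connection_stats`, the `connection_<addr>` and `active_connections` keys, and the representation of a connection as an optional close reason. Only the idea of deriving the status from the close reason comes from the description above.

```rust
use std::collections::{BTreeMap, HashMap};

/// Builds a string-keyed stats map from `address -> close reason`, where
/// `None` means the connection is still open. Key names are illustrative.
pub fn connection_stats(
    connections: &HashMap<String, Option<String>>,
) -> BTreeMap<String, String> {
    let mut stats = BTreeMap::new();
    let mut active = 0usize;

    for (addr, close_reason) in connections {
        let status = match close_reason {
            None => {
                active += 1;
                "active".to_string()
            }
            Some(reason) => format!("closed: {}", reason),
        };
        stats.insert(format!("connection_{}", addr), status);
    }

    // Assumption: the total counts every tracked connection, dead or alive.
    stats.insert("total_connections".to_string(), connections.len().to_string());
    stats.insert("active_connections".to_string(), active.to_string());
    stats
}
```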

coderabbitai · 2026-03-04T14:33:10Z

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2a8ddecc-663e-4af4-86df-e45ede10d92b

📥 Commits

Reviewing files that changed from the base of the PR and between 4994c7d and 09fabda.

📒 Files selected for processing (4)

crates/btlightning/src/client.rs
crates/btlightning/src/registry.rs
crates/btlightning/src/signing.rs
crates/btlightning/tests/integration.rs

Walkthrough

The changes add comprehensive debug and info-level logging across client and server components to improve observability of connection lifecycle events and query operations. Additionally, dead connection pruning logic is introduced in the miner registry update path, and new integration tests verify query execution under various registry update and connectivity scenarios.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Client Connection Management `crates/btlightning/src/client.rs` | Added debug logging for connection state checks, synapse handling milestones (stream opening, frame sending, response reception), and latency tracking. Introduced pruning of dead connections (identified by close_reason) during miner registry updates with associated hotkey deregistration. Enhanced query_axon and query_axon_stream to differentiate between active, closed, and missing connections with explicit reconnect triggers. Updated get_connection_stats to report per-address statuses. |
| Server Logging Enhancements `crates/btlightning/src/server/dispatch.rs`, `crates/btlightning/src/server/handshake.rs` | Added debug/info logging for connection setup (stable_id and remote address), stream acceptance events, frame reception, and connection closure with error and close reason details. Added warning log when validator connections are replaced, including old and new addresses before closure. |
| Integration Test Coverage `crates/btlightning/tests/integration.rs` | Introduced new integration tests covering query execution via miner registry updates, delayed registry propagation, repeated updates, and unreachable peer handling. Tests verify query success after registry state changes and validate per-connection behavior under connectivity constraints. |
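
The three-way distinction between active, closed, and missing connections summarized in the table can be modeled as a small enum. This is a hedged sketch only: `ConnectionState`, `classify`, and `needs_reconnect` are hypothetical names, and a connection is again represented by its optional close reason rather than a real QUIC handle.

```rust
use std::collections::HashMap;

/// Illustrative connection states as seen by a query path.
#[derive(Debug, PartialEq, Eq)]
pub enum ConnectionState {
    /// A connection exists and reports no close reason.
    Active,
    /// A connection exists but has closed for the given reason.
    Closed(String),
    /// No connection is tracked for this address.
    Missing,
}

/// Classifies the connection for `addr`, given `address -> close reason`
/// where `None` means the connection is still open.
pub fn classify(
    connections: &HashMap<String, Option<String>>,
    addr: &str,
) -> ConnectionState {
    match connections.get(addr) {
        Some(None) => ConnectionState::Active,
        Some(Some(reason)) => ConnectionState::Closed(reason.clone()),
        None => ConnectionState::Missing,
    }
}

/// Both closed and missing connections require a reconnect before querying.
pub fn needs_reconnect(state: &ConnectionState) -> bool {
    !matches!(state, ConnectionState::Active)
}
```

Keeping `Closed` and `Missing` separate lets the caller log the close reason in one case and a plain "no connection" message in the other, even though both lead to a reconnect.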

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Connections logged from dusk to dawn,
Dead links pruned when registry's withdrawn,
Handshakes dance with warnings bright,
Tests ensure the peers stay right!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main changes: adding QUIC connection lifecycle tracing and implementing dead connection pruning, which are the primary focus of the changeset across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai

🧹 Nitpick comments (1)

crates/btlightning/tests/integration.rs (1)

1993-2023: Replace fixed delay with condition-based waiting in this test.

Line 2013 uses an unconditional 5s sleep, which adds avoidable test latency and can still be timing-sensitive. Prefer a bounded poll loop that exits as soon as the condition is met.

♻️ Proposed test refactor

-    tokio::time::sleep(Duration::from_secs(5)).await;
+    tokio::time::timeout(Duration::from_secs(5), async {
+        loop {
+            let stats = client.get_connection_stats().await.unwrap();
+            if stats.get("total_connections").is_some_and(|v| v == "1") {
+                break;
+            }
+            tokio::time::sleep(Duration::from_millis(50)).await;
+        }
+    })
+    .await
+    .expect("connection should become active within 5s");

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/btlightning/tests/integration.rs` around lines 1993 - 2023, In the
test query_works_via_update_miner_registry_after_delay replace the unconditional
tokio::time::sleep(Duration::from_secs(5)).await with a bounded poll loop that
repeatedly attempts the condition (e.g., call
client.query_axon_with_timeout(axon.clone(), build_request("echo"),
short_timeout) or another lightweight probe) until it succeeds or a total
timeout elapses; use Instant::now() + Duration::from_secs(N) as the deadline,
sleep a short interval (e.g., 50–200ms) between attempts, and fail the test if
the deadline is reached—this keeps the call to update_miner_registry and the
axon/query flow (update_miner_registry, query_axon_with_timeout, build_request,
Duration) but removes the fixed 5s delay.
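
The deadline-based variant described in this prompt can be sketched with the standard library alone. This is a synchronous illustration under stated assumptions: `wait_until` is a hypothetical helper, and it uses `std::thread::sleep`. Inside the async integration test, the equivalent loop would await `tokio::time::sleep` instead.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `probe` every `interval` until it returns true or `timeout`
/// elapses. Returns whether the condition was met before the deadline.
pub fn wait_until<F>(timeout: Duration, interval: Duration, mut probe: F) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if probe() {
            return true; // condition met: exit immediately
        }
        if Instant::now() >= deadline {
            return false; // deadline reached without success
        }
        thread::sleep(interval);
    }
}
```

A test would then assert on the return value, for example failing with a message when `wait_until` returns `false`, so the happy path finishes as soon as the probe succeeds.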

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c50849b-c81f-4994-b44e-afb27fcfe107

📥 Commits

Reviewing files that changed from the base of the PR and between 5f879e0 and 4994c7d.

📒 Files selected for processing (4)

crates/btlightning/src/client.rs
crates/btlightning/src/server/dispatch.rs
crates/btlightning/src/server/handshake.rs
crates/btlightning/tests/integration.rs

Eliminate field_reassign_with_default in client config tests using struct update syntax. Replace unnecessary_get_then_check in registry with contains_key. Reorder signing.rs to place feature-gated items before test module. Replace unconditional 5s sleep in integration test with bounded poll loop that retries at 250ms intervals with a 10s deadline.
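
For readers unfamiliar with the two Clippy lints named in this commit message, the sketch below shows the shape of each fix. The `ClientConfig` struct, its fields, and the registry type are hypothetical and chosen only for illustration; they are not taken from the crate.

```rust
use std::collections::HashMap;

/// Hypothetical config type; the field names are illustrative.
#[derive(Debug, Default, PartialEq)]
pub struct ClientConfig {
    pub connect_timeout_ms: u64,
    pub max_retries: u32,
    pub verbose: bool,
}

/// `clippy::field_reassign_with_default` flags this pattern:
///     let mut cfg = ClientConfig::default();
///     cfg.max_retries = 5;
/// Struct update syntax sets the field at construction instead.
pub fn test_config() -> ClientConfig {
    ClientConfig {
        max_retries: 5,
        ..Default::default()
    }
}

/// `clippy::unnecessary_get_then_check` flags `registry.get(key).is_some()`;
/// `contains_key` states the intent directly.
pub fn is_registered(registry: &HashMap<String, u16>, hotkey: &str) -> bool {
    registry.contains_key(hotkey)
}
```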

HudsonGraeme added 2 commits March 3, 2026 20:30

Introduce verbose QUIC connection lifecycle tracing

f2807c0

Log close_reason when detecting dead connections on client side and when server-side stream accept loop terminates. Log when server replaces a validator connection during re-handshake.

coderabbitai bot reviewed Mar 4, 2026

HudsonGraeme force-pushed the investigate/quic-connection-lifecycle-tracing branch from 760d9cf to 09fabda on March 4, 2026 15:18

HudsonGraeme merged commit a7f38de into main Mar 4, 2026
16 checks passed

HudsonGraeme deleted the investigate/quic-connection-lifecycle-tracing branch March 4, 2026 15:21

HudsonGraeme merged 3 commits into main from investigate/quic-connection-lifecycle-tracing
