
Introduce benchmark CI and Python binding test suite#45

Merged
HudsonGraeme merged 6 commits into main from introduce/benchmark-ci-comparison
Feb 26, 2026
Conversation

@HudsonGraeme (Member) commented Feb 26, 2026

Summary

  • Adds a label-gated (`bench`) benchmark CI workflow that builds `lightning-bench` for both the PR and `main` branches, runs them back-to-back on the same runner, and posts a markdown comparison comment with percentage changes on the PR
  • Introduces a pytest suite covering the full `btlightning` Python API surface (client, server, roundtrip, streaming, type serialization, error paths, multi-handler routing) with a corresponding CI job using `maturin develop --release`
  • Scopes `.gitignore` test-file exclusions to the root and benchmarks directories instead of globally ignoring `test_*.py`

Test plan

  • Verify `bench.yml` parses as valid YAML and the job skips on PRs without the `bench` label
  • Add the `bench` label to a PR and confirm the benchmark comparison comment is posted
  • Verify the python-test CI job passes: `maturin develop --release && pytest tests/ -v --timeout=30`
  • Run tests locally: `cd crates/btlightning-py && maturin develop --release && pytest tests/ -v --timeout=30`

Summary by CodeRabbit

  • Tests

    • Added comprehensive Python tests for client/server lifecycle, echo roundtrips (large payloads, streaming), multi-handler concurrency, error handling, data type roundtrips, and connection stats.
    • Introduced shared pytest fixtures for in-process server/client wiring and ephemeral port allocation.
  • Chores

    • Added a PR benchmark comparison workflow that posts automated benchmark summaries.
    • Extended CI to run Python tests; updated test discovery, timeouts, and ignore patterns.
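The ephemeral-port allocation mentioned in the fixtures above can be sketched roughly as follows. This is a hypothetical reconstruction, not the PR's actual `conftest.py`; in the real suite this logic would sit behind a `@pytest.fixture`, shown here as a plain function so it is runnable standalone.

```python
import socket


def free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free ephemeral port
        return s.getsockname()[1]
```

Note that the port is released as soon as the socket closes, so a race with another process is possible in principle, but rare enough for test use.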


coderabbitai bot commented Feb 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8824bda and f97b0c3.

📒 Files selected for processing (1)
  • crates/btlightning-py/tests/test_errors.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/btlightning-py/tests/test_errors.py

Walkthrough

Adds a GitHub Actions benchmark workflow, a Python test CI job, pytest configuration and fixtures, and a suite of new pytest modules exercising client/server behavior, error handling, streaming, roundtrips, and data-type roundtrips for the btlightning-py crate.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Workflows**<br>`.github/workflows/bench.yml`, `.github/workflows/ci.yml` | Adds a "Benchmarks" workflow that builds PR and main benchmarks, runs both, captures JSON results, and posts a comparison comment (via github-script); adds a python-test CI job and marks subtensor as continue-on-error. |
| **VCS & Pyproject**<br>`.gitignore`, `crates/btlightning-py/pyproject.toml` | Adjusts test ignore patterns to include subdirectories and benchmarks; adds pytest optional-deps and `[tool.pytest.ini_options]` (tests path and 30s timeout). |
| **Test Fixtures**<br>`crates/btlightning-py/tests/conftest.py` | Introduces constants and fixtures: `free_port`, `echo_server` (LightningServer with echo handler running in a daemon thread), and `client_and_axon` (client connected to server), with setup/teardown. |
| **Client Tests**<br>`crates/btlightning-py/tests/test_client.py` | Adds tests for Lightning client construction (defaults/custom), validator keypair, Python signer, and basic connection stats. |
| **Server Tests**<br>`crates/btlightning-py/tests/test_server.py` | Adds tests for LightningServer construction (defaults/custom), start/stop lifecycle, and connection stats reporting. |
| **Error & Validation Tests**<br>`crates/btlightning-py/tests/test_errors.py` | Adds tests validating missing fields (synapse_type, hotkey, ip, port), signer-required behavior, and invalid timeout rejection. |
| **Multi-handler & Streaming**<br>`crates/btlightning-py/tests/test_multi_handler.py`, `crates/btlightning-py/tests/test_streaming.py` | Adds tests that exercise multiple synapse handlers concurrently and a streaming handler that yields chunks, asserting correct delivery. |
| **Roundtrip & Types**<br>`crates/btlightning-py/tests/test_roundtrip.py`, `crates/btlightning-py/tests/test_types.py` | Adds echo roundtrip tests (including large 100KB payloads, timeout, sequential queries) and parametrized type roundtrip tests for None, booleans, numbers, strings, bytes, lists, and dicts. |
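The parametrized type-roundtrip tests can be illustrated with a minimal sketch. The `echo` stand-in below replaces the real client/server roundtrip (whose `query_axon` API is not reproduced here); the value list mirrors the types named above.

```python
import pytest


def echo(value):
    """Stand-in for the real echo roundtrip through the btlightning client."""
    return value


@pytest.mark.parametrize(
    "value",
    [None, True, False, 0, -1, 3.5, "hello", b"\x00\xff", [1, 2, 3], {"k": "v"}],
)
def test_type_roundtrip(value):
    # Each value should survive serialization to the server and back unchanged.
    assert echo(value) == value
```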

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Runner as Action Runner
    participant PR_Build as PR Benchmark Build
    participant Main_Build as Main Benchmark Build
    participant Bench as Benchmark Executor
    participant API as GitHub API (comment)
    GH->>Runner: workflow triggered (label or PR sync)
    Runner->>PR_Build: checkout PR, setup Rust, build benchmark
    PR_Build-->>Runner: PR binary copied
    Runner->>Main_Build: checkout origin/main, setup, build (allow-fail)
    alt main build succeeds
        Main_Build-->>Runner: main binary copied
    else main build fails
        Runner-->>Runner: proceed without main results
    end
    Runner->>Bench: run PR benchmark -> produce JSON
    Runner->>Bench: run main benchmark (if available) -> produce JSON
    Bench-->>Runner: JSON results
    Runner->>API: post or update PR comment with comparison table
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped through tests with nimble feet,

Echoes answered, payloads met and beat,
Servers hummed and streams unfurled,
Benchmarks chimed across the world,
A rabbit cheers for passing suite!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title clearly and concisely summarizes the main changes: introducing a benchmark CI workflow and a Python test suite for the bindings. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@HudsonGraeme force-pushed the introduce/benchmark-ci-comparison branch from dbd1fe0 to 2e0b5d8 on February 26, 2026 02:08
@HudsonGraeme force-pushed the introduce/benchmark-ci-comparison branch 2 times, most recently from 81925c9 to 29d2893 on February 26, 2026 02:11
@HudsonGraeme force-pushed the introduce/benchmark-ci-comparison branch from 29d2893 to 2e4d482 on February 26, 2026 02:12

github-actions bot commented Feb 26, 2026

Benchmark Results

Connection Setup (ms)

| Percentile | PR | main | change |
| --- | --- | --- | --- |
| p50 | 2.00 | 2.03 | -1.4% |
| p95 | 3.32 | 3.68 | -9.9% |
| p99 | 3.62 | 4.44 | -18.5% +++ |

Latency (ms)

| Size | Percentile | PR | main | change |
| --- | --- | --- | --- | --- |
| 256B | p50 | 0.19 | 0.20 | -5.5% |
| 256B | p95 | 0.24 | 0.26 | -7.6% |
| 256B | p99 | 0.26 | 0.30 | -13.2% +++ |
| 1KB | p50 | 0.19 | 0.21 | -5.2% |
| 1KB | p95 | 0.24 | 0.26 | -5.1% |
| 1KB | p99 | 0.26 | 0.28 | -5.3% |
| 10KB | p50 | 0.24 | 0.24 | +0.6% |
| 10KB | p95 | 0.30 | 0.29 | +4.1% |
| 10KB | p99 | 0.33 | 0.31 | +5.6% |
| 100KB | p50 | 0.81 | 0.76 | +5.5% |
| 100KB | p95 | 0.89 | 0.85 | +4.9% |
| 100KB | p99 | 0.93 | 0.90 | +3.4% |
| 1MB | p50 | 5.99 | 5.76 | +3.9% |
| 1MB | p95 | 6.60 | 6.34 | +4.2% |
| 1MB | p99 | 6.79 | 6.65 | +2.1% |

Throughput (req/s)

| Size | PR | main | change |
| --- | --- | --- | --- |
| 256B | 20390.39 | 20414.96 | -0.1% |
| 1KB | 19036.38 | 19350.43 | -1.6% |
| 10KB | 10943.94 | 11193.59 | -2.2% |
| 100KB | 2103.15 | 2141.78 | -1.8% |
| 1MB | 242.79 | 244.25 | -0.6% |

Wire Bytes

| Size | PR | main | change |
| --- | --- | --- | --- |
| 256B | 284 | 284 | 0.0% |
| 1KB | 1052 | 1052 | 0.0% |
| 10KB | 10268 | 10268 | 0.0% |
| 100KB | 102430 | 102430 | 0.0% |
| 1MB | 1048606 | 1048606 | 0.0% |

@HudsonGraeme (Member, Author) commented:

@coderabbitai review


coderabbitai bot commented Feb 26, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/bench.yml:
- Around line 75-76: The ternary expression that sets the benchmark marker uses
a 15% threshold but should use 10% to match the stated tolerance—update both
occurrences of the literal 15 in the expression that compares pct (the ternary
using ? (pct > 15 ? ' !!!' : pct < -15 ? ' +++' : '') : (pct < -15 ? ' !!!' :
pct > 15 ? ' +++' : '')) to 10 so the regression/improvement markers reflect a
±10% tolerance while preserving the existing sign and ordering logic.

In `@crates/btlightning-py/tests/conftest.py`:
- Around line 27-33: The fixture uses a brittle fixed sleep and doesn't join the
serve thread; change it to wait for the server to be ready (poll a ready
predicate or attempt a short connect to free_port with a timeout loop) instead
of time.sleep(0.05) before yielding, and after calling server.stop() join the
thread (t.join(timeout)) to ensure the serve_forever thread exits and avoid
leakage; update references around server.start(), threading.Thread(...
target=server.serve_forever) / t.start(), the yield server, free_port, and
server.stop() to implement the readiness loop and t.join().

In `@crates/btlightning-py/tests/test_errors.py`:
- Around line 35-36: The test currently uses a blind pytest.raises(Exception)
for the query_axon error-path; replace that with the concrete exception classes
raised by the bindings (e.g., the specific Python error type constructed in the
Rust wrapper) for the "missing signer" and "invalid timeout" cases. Locate where
query_axon maps errors to Python exceptions (search for query_axon and the
PyErr/new_err/Py*Error creation in the bindings), then change the two
pytest.raises(Exception) calls in the test to
pytest.raises(<ConcreteExceptionClass>) using the exact exception types you find
and ensure the raised message assertion (if any) matches the expected text.

In `@crates/btlightning-py/tests/test_multi_handler.py`:
- Around line 22-41: The test is flaky because it relies on time.sleep and only
cleans up at the end; make startup/teardown deterministic and always run cleanup
by wrapping the setup and assertions in a try/finally so client.close(),
server.stop() and thread.join() are always called, and replace the fixed
time.sleep with a deterministic sync (e.g., wait on a server-started event or a
new server.wait_until_ready() method) after server.start() before starting
requests; update references in this test to use server.start(), the
serve_forever thread, client.initialize_connections(), client.query_axon(), and
in the finally block call client.close(), server.stop() and t.join() to avoid
leaked resources.

In `@crates/btlightning-py/tests/test_server.py`:
- Around line 35-48: Wrap the test setup and assertions for LightningServer in a
try/finally so the server teardown always runs: after creating server =
LightningServer(...), starting it (server.start()) and launching the
serve_forever thread (t = threading.Thread(...); t.start()), put the assertions
in the try block and call server.stop() in the finally block (and join the
thread t if needed) to ensure the server is stopped even if an assertion fails;
apply the same try/finally pattern to the other test block covering lines 51-63.
- Around line 39-42: The test uses a fixed time.sleep(0.05) after starting the
thread running server.serve_forever which is flaky; replace the sleep with an
explicit readiness check (either wait on a threading.Event the server sets when
ready or poll-connect to the server port) so the test only proceeds once the
server is actually accepting connections. Locate the thread start (t =
threading.Thread(target=server.serve_forever, daemon=True); t.start()) and
remove the time.sleep calls, then add a short loop that attempts to connect (or
waits on a server-ready Event exposed by the server) with a small timeout and
fail the test if readiness isn't achieved within a reasonable total timeout.

In `@crates/btlightning-py/tests/test_streaming.py`:
- Around line 28-38: Wrap the test logic that uses the Lightning instance and
server in a try/finally: after creating client = Lightning(...), calling
set_validator_keypair, initialize_connections and obtaining stream via
query_axon_stream, run the assertions inside the try block and move
client.close() and server.stop() into the finally block so they always run;
ensure any resources created by Lightning (client) and the test server are
cleaned up even if assertions fail.
- Around line 23-27: Replace the brittle fixed sleep after
server.start()/t.start() with a deterministic readiness poll: after calling
server.start() and starting the serve_forever thread, loop with a short sleep
(e.g. 5-50ms) up to a timeout and attempt to connect to the server (or check a
server-provided ready flag) until the connection succeeds, then proceed; replace
the time.sleep(0.05) with this polling loop (use server.serve_forever,
server.start, and the server's listen address/socket to probe) so the test waits
deterministically for the server to be ready instead of relying on a fixed
delay.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c1d17c and bb02ad9.

📒 Files selected for processing (12)
  • .github/workflows/bench.yml
  • .github/workflows/ci.yml
  • .gitignore
  • crates/btlightning-py/pyproject.toml
  • crates/btlightning-py/tests/conftest.py
  • crates/btlightning-py/tests/test_client.py
  • crates/btlightning-py/tests/test_errors.py
  • crates/btlightning-py/tests/test_multi_handler.py
  • crates/btlightning-py/tests/test_roundtrip.py
  • crates/btlightning-py/tests/test_server.py
  • crates/btlightning-py/tests/test_streaming.py
  • crates/btlightning-py/tests/test_types.py

Comment on lines +75 to +76
```js
                ? (pct > 15 ? ' !!!' : pct < -15 ? ' +++' : '')
                : (pct < -15 ? ' !!!' : pct > 15 ? ' +++' : '');
```

⚠️ Potential issue | 🟠 Major

Regression/improvement threshold appears inconsistent with the stated 10% tolerance.

The benchmark marker logic is still using 15%, so comment annotations won’t match the intended tolerance.

🛠️ Proposed fix
```diff
-                ? (pct > 15 ? ' !!!' : pct < -15 ? ' +++' : '')
-                : (pct < -15 ? ' !!!' : pct > 15 ? ' +++' : '');
+                ? (pct > 10 ? ' !!!' : pct < -10 ? ' +++' : '')
+                : (pct < -10 ? ' !!!' : pct > 10 ? ' +++' : '');
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
-? (pct > 15 ? ' !!!' : pct < -15 ? ' +++' : '')
-: (pct < -15 ? ' !!!' : pct > 15 ? ' +++' : '');
+? (pct > 10 ? ' !!!' : pct < -10 ? ' +++' : '')
+: (pct < -10 ? ' !!!' : pct > 10 ? ' +++' : '');
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/bench.yml around lines 75 - 76, The ternary expression
that sets the benchmark marker uses a 15% threshold but should use 10% to match
the stated tolerance—update both occurrences of the literal 15 in the expression
that compares pct (the ternary using ? (pct > 15 ? ' !!!' : pct < -15 ? ' +++' :
'') : (pct < -15 ? ' !!!' : pct > 15 ? ' +++' : '')) to 10 so the
regression/improvement markers reflect a ±10% tolerance while preserving the
existing sign and ordering logic.

Comment on lines +27 to +33
```python
server.start()
t = threading.Thread(target=server.serve_forever, daemon=True)
t.start()
time.sleep(0.05)
yield server, free_port
server.stop()
```


⚠️ Potential issue | 🟠 Major

Harden fixture lifecycle to reduce startup flakiness and thread leakage.

A fixed delay is brittle, and teardown should wait for the serve thread to exit after server.stop().

♻️ Proposed fixture hardening
```diff
 def echo_server(free_port):
@@
     server.register_synapse_handler("echo", lambda data: data)
     server.start()
     t = threading.Thread(target=server.serve_forever, daemon=True)
     t.start()
-    time.sleep(0.05)
+    deadline = time.time() + 2.0
+    while time.time() < deadline:
+        try:
+            with socket.create_connection(("127.0.0.1", free_port), timeout=0.1):
+                break
+        except OSError:
+            time.sleep(0.01)
+    else:
+        raise AssertionError("echo server did not become ready in time")
     yield server, free_port
     server.stop()
+    t.join(timeout=1.0)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/btlightning-py/tests/conftest.py` around lines 27 - 33, The fixture
uses a brittle fixed sleep and doesn't join the serve thread; change it to wait
for the server to be ready (poll a ready predicate or attempt a short connect to
free_port with a timeout loop) instead of time.sleep(0.05) before yielding, and
after calling server.stop() join the thread (t.join(timeout)) to ensure the
serve_forever thread exits and avoid leakage; update references around
server.start(), threading.Thread(... target=server.serve_forever) / t.start(),
the yield server, free_port, and server.stop() to implement the readiness loop
and t.join().

Comment on lines +39 to +42
```python
t = threading.Thread(target=server.serve_forever, daemon=True)
t.start()
time.sleep(0.05)
```


⚠️ Potential issue | 🟠 Major

Avoid fixed sleeps for server readiness in lifecycle tests.

Line 41 and Line 57 rely on time.sleep(0.05), which is timing-sensitive and can intermittently fail in busy CI environments.

Suggested fix pattern
```diff
-    time.sleep(0.05)
+    deadline = time.time() + 2.0
+    while time.time() < deadline:
+        try:
+            with socket.create_connection(("127.0.0.1", port), timeout=0.05):
+                break
+        except OSError:
+            time.sleep(0.01)
+    else:
+        raise AssertionError("Server did not become ready in time")
```

Also applies to: 55-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/btlightning-py/tests/test_server.py` around lines 39 - 42, The test
uses a fixed time.sleep(0.05) after starting the thread running
server.serve_forever which is flaky; replace the sleep with an explicit
readiness check (either wait on a threading.Event the server sets when ready or
poll-connect to the server port) so the test only proceeds once the server is
actually accepting connections. Locate the thread start (t =
threading.Thread(target=server.serve_forever, daemon=True); t.start()) and
remove the time.sleep calls, then add a short loop that attempts to connect (or
waits on a server-ready Event exposed by the server) with a small timeout and
fail the test if readiness isn't achieved within a reasonable total timeout.

Comment on lines +23 to +27
```python
server.start()
t = threading.Thread(target=server.serve_forever, daemon=True)
t.start()
time.sleep(0.05)
```


⚠️ Potential issue | 🟠 Major

Replace fixed startup sleep with deterministic readiness polling.

Line 26 uses a fixed time.sleep(0.05), which is flaky on shared CI runners and can produce intermittent failures.

Suggested fix
```diff
     server.start()
     t = threading.Thread(target=server.serve_forever, daemon=True)
     t.start()
-    time.sleep(0.05)
+    deadline = time.time() + 2.0
+    while time.time() < deadline:
+        try:
+            with socket.create_connection(("127.0.0.1", port), timeout=0.05):
+                break
+        except OSError:
+            time.sleep(0.01)
+    else:
+        raise AssertionError("Server did not become ready in time")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/btlightning-py/tests/test_streaming.py` around lines 23 - 27, Replace
the brittle fixed sleep after server.start()/t.start() with a deterministic
readiness poll: after calling server.start() and starting the serve_forever
thread, loop with a short sleep (e.g. 5-50ms) up to a timeout and attempt to
connect to the server (or check a server-provided ready flag) until the
connection succeeds, then proceed; replace the time.sleep(0.05) with this
polling loop (use server.serve_forever, server.start, and the server's listen
address/socket to probe) so the test waits deterministically for the server to
be ready instead of relying on a fixed delay.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
crates/btlightning-py/tests/test_streaming.py (1)

26-26: ⚠️ Potential issue | 🟠 Major

Use readiness polling instead of a fixed sleep before streaming assertions.

Line 26 uses a fixed delay that can intermittently fail under CI load.

Suggested fix
```diff
-    time.sleep(0.05)
+    deadline = time.time() + 2.0
+    while time.time() < deadline:
+        try:
+            with socket.create_connection(("127.0.0.1", port), timeout=0.05):
+                break
+        except OSError:
+            time.sleep(0.01)
+    else:
+        raise AssertionError("Server did not become ready in time")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/btlightning-py/tests/test_streaming.py` at line 26, Replace the
brittle fixed delay time.sleep(0.05) in tests/test_streaming.py with a
readiness-polling loop: identify the readiness condition used by the test (e.g.,
a stream.ready() / connection.is_ready() method, presence of messages in the
stream/queue, or a helper like wait_for_event), poll that condition with a short
sleep interval until a timeout (e.g., 1–5s) and only then perform the streaming
assertions; update the test to fail with a clear timeout message if readiness is
not observed. Ensure you replace the exact time.sleep(0.05) call so the test
waits deterministically for the stream to be ready rather than relying on a
fixed delay.
crates/btlightning-py/tests/test_server.py (1)

41-41: ⚠️ Potential issue | 🟠 Major

Replace fixed startup sleeps with deterministic readiness polling.

Line 41 and Line 59 still rely on a fixed time.sleep(0.05), which is timing-sensitive and can flake on slower CI runners.

Suggested fix
```diff
+def _wait_until_listening(host: str, port: int, timeout_secs: float = 2.0) -> None:
+    deadline = time.time() + timeout_secs
+    while time.time() < deadline:
+        try:
+            with socket.create_connection((host, port), timeout=0.05):
+                return
+        except OSError:
+            time.sleep(0.01)
+    raise AssertionError(f"Server {host}:{port} did not become ready in time")
+
@@
-    time.sleep(0.05)
+    _wait_until_listening("127.0.0.1", port)
@@
-    time.sleep(0.05)
+    _wait_until_listening("127.0.0.1", port)
```

Also applies to: 59-59

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/btlightning-py/tests/test_server.py` at line 41, The test uses fixed
time.sleep(0.05) in tests/test_server.py which is flaky; replace both
occurrences with deterministic readiness polling: remove time.sleep(0.05) and
implement a short loop that repeatedly attempts to contact the test server
(e.g., TCP connect to the server port or an HTTP/health endpoint) with a small
backoff and overall timeout (e.g., 2–5s), returning as soon as the connection
succeeds, and failing the test if the timeout elapses; update the two places
where time.sleep is called to use this polling helper (e.g.,
wait_for_server_ready or inline loop) so startup is robust on slow CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/btlightning-py/tests/test_errors.py`:
- Around line 32-37: The test test_query_without_signer should guarantee
Lightning cleanup: wrap the client creation and the pytest.raises block in a
try/finally so that client.close() is always called even if client.query_axon
doesn't raise; specifically, create the Lightning instance (client =
Lightning(...)), run the with pytest.raises(ConnectionError, match="endpoint not
initialized"): client.query_axon(...) inside try, and call client.close() in
finally to ensure cleanup rather than relying on __del__.

---

Duplicate comments:
In `@crates/btlightning-py/tests/test_server.py`:
- Line 41: The test uses fixed time.sleep(0.05) in tests/test_server.py which is
flaky; replace both occurrences with deterministic readiness polling: remove
time.sleep(0.05) and implement a short loop that repeatedly attempts to contact
the test server (e.g., TCP connect to the server port or an HTTP/health
endpoint) with a small backoff and overall timeout (e.g., 2–5s), returning as
soon as the connection succeeds, and failing the test if the timeout elapses;
update the two places where time.sleep is called to use this polling helper
(e.g., wait_for_server_ready or inline loop) so startup is robust on slow CI.

In `@crates/btlightning-py/tests/test_streaming.py`:
- Line 26: Replace the brittle fixed delay time.sleep(0.05) in
tests/test_streaming.py with a readiness-polling loop: identify the readiness
condition used by the test (e.g., a stream.ready() / connection.is_ready()
method, presence of messages in the stream/queue, or a helper like
wait_for_event), poll that condition with a short sleep interval until a timeout
(e.g., 1–5s) and only then perform the streaming assertions; update the test to
fail with a clear timeout message if readiness is not observed. Ensure you
replace the exact time.sleep(0.05) call so the test waits deterministically for
the stream to be ready rather than relying on a fixed delay.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb02ad9 and 8824bda.

📒 Files selected for processing (5)
  • crates/btlightning-py/tests/conftest.py
  • crates/btlightning-py/tests/test_errors.py
  • crates/btlightning-py/tests/test_multi_handler.py
  • crates/btlightning-py/tests/test_server.py
  • crates/btlightning-py/tests/test_streaming.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/btlightning-py/tests/conftest.py
  • crates/btlightning-py/tests/test_multi_handler.py

@HudsonGraeme merged commit d8bd9e6 into main on Feb 26, 2026 (16 checks passed)
@HudsonGraeme deleted the introduce/benchmark-ci-comparison branch on February 26, 2026 04:16