Introduce benchmark CI and Python binding test suite#45
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)

Walkthrough
Adds a GitHub Actions benchmark workflow, a Python test CI job, pytest configuration and fixtures, and a suite of new pytest modules exercising client/server behavior, error handling, streaming, roundtrips, and data-type roundtrips for the btlightning-py crate.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Force-pushed dbd1fe0 to 2e0b5d8 (compare)
Force-pushed 81925c9 to 29d2893 (compare)
Force-pushed 29d2893 to 2e4d482 (compare)
Benchmark Results
(results tables not captured; sections: Connection Setup (ms), Latency (ms), Throughput (req/s), Wire Bytes)
@coderabbitai review

✅ Actions performed
Review triggered.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/bench.yml:
- Around line 75-76: The ternary expression that sets the benchmark marker uses
a 15% threshold but should use 10% to match the stated tolerance—update both
occurrences of the literal 15 in the expression that compares pct (the ternary
using ? (pct > 15 ? ' !!!' : pct < -15 ? ' +++' : '') : (pct < -15 ? ' !!!' :
pct > 15 ? ' +++' : '')) to 10 so the regression/improvement markers reflect a
±10% tolerance while preserving the existing sign and ordering logic.
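The marker logic described above can be sketched in Python. This is an illustrative translation, not the workflow's actual code: the original JavaScript ternary's guarding condition is truncated in the review, so the `lower_is_better` flag here is an assumption about what that condition distinguishes.

```python
def marker(pct: float, lower_is_better: bool, tol: float = 10.0) -> str:
    """Annotate a percentage change: ' !!!' = regression, ' +++' = improvement,
    '' = within the +/-tol band. For lower-is-better metrics (latency, wire
    bytes) an increase is a regression; for higher-is-better metrics
    (throughput) a decrease is."""
    if lower_is_better:
        if pct > tol:
            return " !!!"
        if pct < -tol:
            return " +++"
    else:
        if pct < -tol:
            return " !!!"
        if pct > tol:
            return " +++"
    return ""
```

This mirrors the ternary's sign and ordering logic with the literal changed from 15 to 10, as the finding requests.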
In `@crates/btlightning-py/tests/conftest.py`:
- Around line 27-33: The fixture uses a brittle fixed sleep and doesn't join the
serve thread; change it to wait for the server to be ready (poll a ready
predicate or attempt a short connect to free_port with a timeout loop) instead
of time.sleep(0.05) before yielding, and after calling server.stop() join the
thread (t.join(timeout)) to ensure the serve_forever thread exits and avoid
leakage; update references around server.start(), threading.Thread(...
target=server.serve_forever) / t.start(), the yield server, free_port, and
server.stop() to implement the readiness loop and t.join().
In `@crates/btlightning-py/tests/test_errors.py`:
- Around line 35-36: The test currently uses a blind pytest.raises(Exception)
for the query_axon error-path; replace that with the concrete exception classes
raised by the bindings (e.g., the specific Python error type constructed in the
Rust wrapper) for the "missing signer" and "invalid timeout" cases. Locate where
query_axon maps errors to Python exceptions (search for query_axon and the
PyErr/new_err/Py*Error creation in the bindings), then change the two
pytest.raises(Exception) calls in the test to
pytest.raises(<ConcreteExceptionClass>) using the exact exception types you find
and ensure the raised message assertion (if any) matches the expected text.
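A toy stand-in shows the narrowing the comment asks for. The exception classes and messages below are illustrative only; the real types must be read from the Rust wrapper's `PyErr` construction before updating the tests.

```python
import pytest

# Stand-in for the binding's query_axon; error types here are assumptions.
def query_axon(signer=None, timeout=1.0):
    if signer is None:
        raise ConnectionError("endpoint not initialized")
    if timeout <= 0:
        raise ValueError("invalid timeout")
    return b"ok"

def test_missing_signer():
    # Concrete class plus a match pattern, instead of a blind
    # pytest.raises(Exception) that would also swallow unrelated bugs:
    with pytest.raises(ConnectionError, match="endpoint not initialized"):
        query_axon(signer=None)

def test_invalid_timeout():
    with pytest.raises(ValueError, match="invalid timeout"):
        query_axon(signer="validator-key", timeout=0)
```

`pytest.raises(Exception)` passes even when the wrong error is raised, which is exactly the failure mode the finding warns about.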
In `@crates/btlightning-py/tests/test_multi_handler.py`:
- Around line 22-41: The test is flaky because it relies on time.sleep and only
cleans up at the end; make startup/teardown deterministic and always run cleanup
by wrapping the setup and assertions in a try/finally so client.close(),
server.stop() and thread.join() are always called, and replace the fixed
time.sleep with a deterministic sync (e.g., wait on a server-started event or a
new server.wait_until_ready() method) after server.start() before starting
requests; update references in this test to use server.start(), the
serve_forever thread, client.initialize_connections(), client.query_axon(), and
in the finally block call client.close(), server.stop() and t.join() to avoid
leaked resources.
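The lifecycle discipline prescribed above can be sketched with stubs. `StubServer` stands in for the real LightningServer (whose API is only assumed here), and the client calls are elided; what the sketch shows is the shape: event-based readiness instead of a sleep, and teardown in `finally`.

```python
import threading

class StubServer:
    """Stand-in for LightningServer; only the lifecycle shape is modeled."""
    def __init__(self):
        self.ready = threading.Event()   # set once serve_forever is running
        self._stop = threading.Event()
    def start(self):
        pass  # a real server would bind its socket here
    def serve_forever(self):
        self.ready.set()
        self._stop.wait()  # block until stop() is called
    def stop(self):
        self._stop.set()

def run_lifecycle() -> str:
    server = StubServer()
    server.start()
    t = threading.Thread(target=server.serve_forever, daemon=True)
    t.start()
    # Deterministic sync instead of time.sleep: wait for the ready event.
    assert server.ready.wait(timeout=2.0), "server never became ready"
    try:
        # client.initialize_connections(), client.query_axon() and the
        # test assertions would go here.
        outcome = "assertions ran"
    finally:
        # Always runs, even if an assertion above fails:
        server.stop()
        t.join(timeout=1.0)
    assert not t.is_alive(), "serve thread leaked"
    return outcome
```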
In `@crates/btlightning-py/tests/test_server.py`:
- Around line 35-48: Wrap the test setup and assertions for LightningServer in a
try/finally so the server teardown always runs: after creating server =
LightningServer(...), starting it (server.start()) and launching the
serve_forever thread (t = threading.Thread(...); t.start()), put the assertions
in the try block and call server.stop() in the finally block (and join the
thread t if needed) to ensure the server is stopped even if an assertion fails;
apply the same try/finally pattern to the other test block covering lines 51-63.
- Around line 39-42: The test uses a fixed time.sleep(0.05) after starting the
thread running server.serve_forever which is flaky; replace the sleep with an
explicit readiness check (either wait on a threading.Event the server sets when
ready or poll-connect to the server port) so the test only proceeds once the
server is actually accepting connections. Locate the thread start (t =
threading.Thread(target=server.serve_forever, daemon=True); t.start()) and
remove the time.sleep calls, then add a short loop that attempts to connect (or
waits on a server-ready Event exposed by the server) with a small timeout and
fail the test if readiness isn't achieved within a reasonable total timeout.
In `@crates/btlightning-py/tests/test_streaming.py`:
- Around line 28-38: Wrap the test logic that uses the Lightning instance and
server in a try/finally: after creating client = Lightning(...), calling
set_validator_keypair, initialize_connections and obtaining stream via
query_axon_stream, run the assertions inside the try block and move
client.close() and server.stop() into the finally block so they always run;
ensure any resources created by Lightning (client) and the test server are
cleaned up even if assertions fail.
- Around line 23-27: Replace the brittle fixed sleep after
server.start()/t.start() with a deterministic readiness poll: after calling
server.start() and starting the serve_forever thread, loop with a short sleep
(e.g. 5-50ms) up to a timeout and attempt to connect to the server (or check a
server-provided ready flag) until the connection succeeds, then proceed; replace
the time.sleep(0.05) with this polling loop (use server.serve_forever,
server.start, and the server's listen address/socket to probe) so the test waits
deterministically for the server to be ready instead of relying on a fixed
delay.
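Several of the prompts above converge on the same connect-poll pattern. A self-contained sketch of such a helper (the name, signature, and timeouts are illustrative, not part of the codebase):

```python
import socket
import time

def wait_until_listening(host: str, port: int, timeout: float = 2.0) -> None:
    """Poll-connect until host:port accepts TCP connections.

    Replaces a fixed time.sleep(0.05): returns as soon as the server is
    ready, and raises AssertionError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.05):
                return  # connection accepted: server is listening
        except OSError:
            time.sleep(0.01)  # brief backoff before the next attempt
    raise AssertionError(f"{host}:{port} did not become ready within {timeout}s")
```

Tests and fixtures would call `wait_until_listening("127.0.0.1", free_port)` right after starting the `serve_forever` thread.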
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .github/workflows/bench.yml
- .github/workflows/ci.yml
- .gitignore
- crates/btlightning-py/pyproject.toml
- crates/btlightning-py/tests/conftest.py
- crates/btlightning-py/tests/test_client.py
- crates/btlightning-py/tests/test_errors.py
- crates/btlightning-py/tests/test_multi_handler.py
- crates/btlightning-py/tests/test_roundtrip.py
- crates/btlightning-py/tests/test_server.py
- crates/btlightning-py/tests/test_streaming.py
- crates/btlightning-py/tests/test_types.py
? (pct > 15 ? ' !!!' : pct < -15 ? ' +++' : '')
: (pct < -15 ? ' !!!' : pct > 15 ? ' +++' : '');
Regression/improvement threshold appears inconsistent with the stated 10% tolerance.
The benchmark marker logic is still using 15%, so comment annotations won’t match the intended tolerance.
🛠️ Proposed fix
- ? (pct > 15 ? ' !!!' : pct < -15 ? ' +++' : '')
- : (pct < -15 ? ' !!!' : pct > 15 ? ' +++' : '');
+ ? (pct > 10 ? ' !!!' : pct < -10 ? ' +++' : '')
+ : (pct < -10 ? ' !!!' : pct > 10 ? ' +++' : '');

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

? (pct > 10 ? ' !!!' : pct < -10 ? ' +++' : '')
: (pct < -10 ? ' !!!' : pct > 10 ? ' +++' : '');
server.start()
t = threading.Thread(target=server.serve_forever, daemon=True)
t.start()
time.sleep(0.05)
yield server, free_port
server.stop()
Harden fixture lifecycle to reduce startup flakiness and thread leakage.
A fixed delay is brittle, and teardown should wait for the serve thread to exit after server.stop().
♻️ Proposed fixture hardening
def echo_server(free_port):
@@
    server.register_synapse_handler("echo", lambda data: data)
    server.start()
    t = threading.Thread(target=server.serve_forever, daemon=True)
    t.start()
-   time.sleep(0.05)
+   deadline = time.time() + 2.0
+   while time.time() < deadline:
+       try:
+           with socket.create_connection(("127.0.0.1", free_port), timeout=0.1):
+               break
+       except OSError:
+           time.sleep(0.01)
+   else:
+       raise AssertionError("echo server did not become ready in time")
    yield server, free_port
    server.stop()
+   t.join(timeout=1.0)
t = threading.Thread(target=server.serve_forever, daemon=True)
t.start()
time.sleep(0.05)
Avoid fixed sleeps for server readiness in lifecycle tests.
Line 41 and Line 57 rely on time.sleep(0.05), which is timing-sensitive and can intermittently fail in busy CI environments.
Suggested fix pattern
- time.sleep(0.05)
+ deadline = time.time() + 2.0
+ while time.time() < deadline:
+     try:
+         with socket.create_connection(("127.0.0.1", port), timeout=0.05):
+             break
+     except OSError:
+         time.sleep(0.01)
+ else:
+     raise AssertionError("Server did not become ready in time")

Also applies to: 55-58
server.start()
t = threading.Thread(target=server.serve_forever, daemon=True)
t.start()
time.sleep(0.05)
Replace fixed startup sleep with deterministic readiness polling.
Line 26 uses a fixed time.sleep(0.05), which is flaky on shared CI runners and can produce intermittent failures.
Suggested fix
server.start()
t = threading.Thread(target=server.serve_forever, daemon=True)
t.start()
- time.sleep(0.05)
+ deadline = time.time() + 2.0
+ while time.time() < deadline:
+     try:
+         with socket.create_connection(("127.0.0.1", port), timeout=0.05):
+             break
+     except OSError:
+         time.sleep(0.01)
+ else:
+     raise AssertionError("Server did not become ready in time")
Actionable comments posted: 1
♻️ Duplicate comments (2)
crates/btlightning-py/tests/test_streaming.py (1)
26-26: ⚠️ Potential issue | 🟠 Major
Use readiness polling instead of a fixed sleep before streaming assertions.
Line 26 uses a fixed delay that can intermittently fail under CI load.
Suggested fix

- time.sleep(0.05)
+ deadline = time.time() + 2.0
+ while time.time() < deadline:
+     try:
+         with socket.create_connection(("127.0.0.1", port), timeout=0.05):
+             break
+     except OSError:
+         time.sleep(0.01)
+ else:
+     raise AssertionError("Server did not become ready in time")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/btlightning-py/tests/test_streaming.py` at line 26, Replace the brittle fixed delay time.sleep(0.05) in tests/test_streaming.py with a readiness-polling loop: identify the readiness condition used by the test (e.g., a stream.ready() / connection.is_ready() method, presence of messages in the stream/queue, or a helper like wait_for_event), poll that condition with a short sleep interval until a timeout (e.g., 1–5s) and only then perform the streaming assertions; update the test to fail with a clear timeout message if readiness is not observed. Ensure you replace the exact time.sleep(0.05) call so the test waits deterministically for the stream to be ready rather than relying on a fixed delay.

crates/btlightning-py/tests/test_server.py (1)
41-41: ⚠️ Potential issue | 🟠 Major
Replace fixed startup sleeps with deterministic readiness polling.
Line 41 and Line 59 still rely on a fixed time.sleep(0.05), which is timing-sensitive and can flake on slower CI runners.
Suggested fix

+ def _wait_until_listening(host: str, port: int, timeout_secs: float = 2.0) -> None:
+     deadline = time.time() + timeout_secs
+     while time.time() < deadline:
+         try:
+             with socket.create_connection((host, port), timeout=0.05):
+                 return
+         except OSError:
+             time.sleep(0.01)
+     raise AssertionError(f"Server {host}:{port} did not become ready in time")
+
@@
- time.sleep(0.05)
+ _wait_until_listening("127.0.0.1", port)
@@
- time.sleep(0.05)
+ _wait_until_listening("127.0.0.1", port)

Also applies to: 59-59
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/btlightning-py/tests/test_server.py` at line 41, The test uses fixed time.sleep(0.05) in tests/test_server.py which is flaky; replace both occurrences with deterministic readiness polling: remove time.sleep(0.05) and implement a short loop that repeatedly attempts to contact the test server (e.g., TCP connect to the server port or an HTTP/health endpoint) with a small backoff and overall timeout (e.g., 2–5s), returning as soon as the connection succeeds, and failing the test if the timeout elapses; update the two places where time.sleep is called to use this polling helper (e.g., wait_for_server_ready or inline loop) so startup is robust on slow CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/btlightning-py/tests/test_errors.py`:
- Around line 32-37: The test test_query_without_signer should guarantee
Lightning cleanup: wrap the client creation and the pytest.raises block in a
try/finally so that client.close() is always called even if client.query_axon
doesn't raise; specifically, create the Lightning instance (client =
Lightning(...)), run the with pytest.raises(ConnectionError, match="endpoint not
initialized"): client.query_axon(...) inside try, and call client.close() in
finally to ensure cleanup rather than relying on __del__.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- crates/btlightning-py/tests/conftest.py
- crates/btlightning-py/tests/test_errors.py
- crates/btlightning-py/tests/test_multi_handler.py
- crates/btlightning-py/tests/test_server.py
- crates/btlightning-py/tests/test_streaming.py
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/btlightning-py/tests/conftest.py
- crates/btlightning-py/tests/test_multi_handler.py
Summary
- Benchmark CI workflow (`bench`) that builds `lightning-bench` for both the PR and main branches, runs them back-to-back on the same runner, and posts a markdown comparison comment with percentage changes on the PR
- Pytest suite covering the `btlightning` Python API surface (client, server, roundtrip, streaming, type serialization, error paths, multi-handler routing), with a corresponding CI job using `maturin develop --release`
- Scopes `.gitignore` test-file exclusions to root/benchmarks directories instead of globally ignoring `test_*.py`

Test plan

- `bench.yml` YAML parses correctly and the job skips on PRs without the `bench` label
- Add the `bench` label to a PR and confirm the benchmark comparison comment is posted
- `python-test` CI job passes: `maturin develop --release && pytest tests/ -v --timeout=30`
- Run locally: `cd crates/btlightning-py && maturin develop --release && pytest tests/ -v --timeout=30`

Summary by CodeRabbit
Tests
Chores