-
Notifications
You must be signed in to change notification settings - Fork 966
Flaky fixes #8778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
rustyrussell
wants to merge
16
commits into
ElementsProject:master
Choose a base branch
from
rustyrussell:guilt/flaky-fixes
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Flaky fixes #8778
rustyrussell
wants to merge
16
commits into
ElementsProject:master
from
rustyrussell:guilt/flaky-fixes
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is covering up real bugs; we need to fix CI instead. From https://pypi.org/project/pytest-rerunfailures/#re-run-all-failures: ``` To re-run all test failures, use the --reruns command line option with the maximum number of times you’d like the tests to run: $ pytest --reruns 5 ``` Signed-off-by: Rusty Russell <[email protected]>
This shows up as a flake in test_route_by_old_scid:
```
# Now restart l2, make sure it remembers the original!
l2.restart()
> l2.rpc.connect(l1.info['id'], 'localhost', l1.port)
tests/test_splicing.py:554:
...
> raise RpcError(method, payload, resp['error'])
E pyln.client.lightning.RpcError: RPC call failed: method: connect, payload: {'id': '0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518', 'host': 'localhost', 'port': 33837}, error: {'code': 400, 'message': 'Unable to connect, no address known for peer'}
```
This is because it's already (auto)connecting, and fails. This
failure is reported, before we've added the new address (once we add
the new address, we connect fine, but it's too late!):
```
lightningd-2 2025-12-08T02:39:18.241Z DEBUG gossipd: REPLY WIRE_GOSSIPD_NEW_BLOCKHEIGHT_REPLY with 0 fds
lightningd-2 2025-12-08T02:39:18.320Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Initializing important peer with 0 addresses
lightningd-2 2025-12-08T02:39:18.320Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Failed connected out: Unable to connect, no address known for peer
lightningd-2 2025-12-08T02:39:18.344Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Will try reconnect in 1 seconds
lightningd-2 2025-12-08T02:39:18.344Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-connectd: Initializing important peer with 1 addresses
lightningd-2 2025-12-08T02:39:18.344Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-connectd: Connected out, starting crypto
lightningd-2 2025-12-08T02:39:18.344Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Adding 1 addresses to important peer
lightningd-2 2025-12-08T02:39:18.345Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Connected out, starting crypto
{'run_id': 256236335046680576, 'github_repository': 'ElementsProject/lightning', 'github_sha': '555f1ac461d151064aad6fc83b94a0290e2e9d5d', 'github_ref': 'refs/pull/8767/merge', 'github_ref_name': 'HEAD', 'github_run_id': 20013689862, 'github_head_ref': 'fixup-backfill-bug', 'github_run_number': 14774, 'github_base_ref': 'master', 'github_run_attempt': '1', 'testname': 'test_route_by_old_scid', 'start_time': 1765161493, 'end_time': 1765161558, 'outcome': 'fail'}
lightningd-2 2025-12-08T02:39:18.421Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-hsmd: Got WIRE_HSMD_ECDH_REQ
lightningd-2 2025-12-08T02:39:18.421Z DEBUG hsmd: Client: Received message 1 from client
lightningd-2 2025-12-08T02:39:18.453Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-hsmd: Got WIRE_HSMD_ECDH_REQ
lightningd-2 2025-12-08T02:39:18.453Z DEBUG hsmd: Client: Received message 1 from client
--------------------------- Captured stdout teardown ---------------------------
```
We can still get a warning: lightningd-1 2025-12-10T01:11:07.232Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Received WIRE_WARNING: WARNING: channel_announcement: no unspent txout 109x1x1 This has nothing to do with l1 talking about the original channel (which would be 103x1x): it's because l2's gossipd (being the node which does the splice) immediately forgets the pre-splice id. If l1 sends some gossip, it will get a warning message. Signed-off-by: Rusty Russell <[email protected]>
Since this was written, we now test if remote side would get into this situation and stop it from happening, so the test doesn't work any more. Signed-off-by: Rusty Russell <[email protected]>
1. It was flaky, probably because it didn't wait for the remote update_channel. 2. Rusty applied a fix in 5f664da, not clear if it worked. 3. Christian disabled it altogether in 23ce9a9. Signed-off-by: Rusty Russell <[email protected]>
``` 2025-12-10T02:51:06.2435409Z [gw1] [ 77%] ERROR tests/test_plugin.py::test_commando ...lightningd-1: had BROKEN messages ... 2025-12-10T03:00:26.0440311Z lightningd-1 2025-12-10T02:51:01.548Z UNUSUAL jsonrpc#69: That's weird: Request checkrune took 5961 milliseconds ``` Signed-off-by: Rusty Russell <[email protected]>
signpsbt could be the one which takes a long time, so allow any psbt event. Signed-off-by: Rusty Russell <[email protected]>
…essages. Signed-off-by: Rusty Russell <[email protected]>
fe271a7 to
dd23663
Compare
We restart the nodeL if the coin_movements.py plugin hasn't processed the notification yet, it will be incorrect: ``` > assert account_balance(l2, chanid_1) == 100001001 E AssertionError: assert 150_001_001msat == 100_001_001 E + where 150001001msat = account_balance(<fixtures.LightningNode object at 0x7f0634e1eb00>, '39ac52c818c5304cf0664940ff236c4e3f8f4ceb8993cb1491347142d61b62bc') ``` Signed-off-by: Rusty Russell <[email protected]>
We didn't log when anchor transactions had short signatures,
which causes this test to not assert (did_short_sig):
```
total_feerate_perkw = total_fees / total_weight * 1000
> check_feerate([l3, l2], total_feerate_perkw, feerate)
tests/test_closing.py:4063:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
nodes = [<fixtures.LightningNode object at 0x7f0fb322bb20>, <fixtures.LightningNode object at 0x7f0fb1b5ead0>]
actual_feerate = 14006.105538595726, expected_feerate = 14000
def check_feerate(nodes, actual_feerate, expected_feerate):
# Feerate can't be lower.
assert actual_feerate > expected_feerate - 2
if actual_feerate >= expected_feerate + 2:
if any([did_short_sig(n) for n in nodes]):
return
# Use assert as it shows the actual values on failure
> assert actual_feerate < expected_feerate + 2
E AssertionError
```
Signed-off-by: Rusty Russell <[email protected]>
We don't want it to think that it can use both pre-splice and post-splice channels! Signed-off-by: Rusty Russell <[email protected]>
This means that we won't complain to peers which gossip about our channels, but it does mean that our channel graph (like other nodes on the network) will show two channels, not one, for the duration. For this reason, we need askrene to omit local dying channels. Signed-off-by: Rusty Russell <[email protected]>
We removed the functionality, but only disabled the tests. Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
dd23663 to
50a04af
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is based on #8767 (which was causing one of the "flakes"!).
This is a draft because the final commit disables all flakes so I can catch them (I'm also looping it on my laptop), so I can root cause the others. We may still mark some flaky if they are genuinely unreliable, but we need to make that determination!
Changelog-None