Prepare LN for large networks #737


Open · wants to merge 11 commits into main

Conversation

pinheadmz (Contributor)

  1. Move all the try/catch/retry logic from ln.py into ln_init.py. The API calls themselves now have no error handling of their own and propagate failures back up to the caller. This keeps behavior consistent in the ln_init scenario, which is opinionated about what is allowed to fail: ideally the scenario completes successfully, terminates with an error, or runs endlessly because something is broken. In all of those cases, ln_init needs to handle every API call to LN nodes directly.
  2. Refactor ln_init for large networks (specifically test/data/LN_100.json, with more than 500 channels), ensuring that LN nodes always have enough funds to open all their channels, that channel opens are always tx output 0, and that channel-open TXs are ordered as expected in their blocks.
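The error-handling split described in point 1 can be sketched as a retry loop that lives entirely in the scenario. This is a minimal illustration of the pattern, not warnet's actual code: `wait_for`, `timeout`, and `interval` are made-up names.

```python
import time

# Hypothetical sketch: ln.py API calls raise on failure, and the
# ln_init scenario (the caller) decides what is retryable and for
# how long. Names here are illustrative, not the warnet API.
def wait_for(fn, timeout=60, interval=2):
    start = time.time()
    while True:
        try:
            return fn()  # raw API call, no error handling of its own
        except Exception:
            if time.time() - start > timeout:
                raise  # the scenario terminates with an error
            time.sleep(interval)
```

With this shape, a hung node shows up as either a raised timeout or an endlessly retrying scenario, both of which are visible at the ln_init level rather than swallowed inside ln.py.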

@pinheadmz pinheadmz marked this pull request as ready for review August 13, 2025 16:29
pinheadmz (Contributor Author)

@macgyver13 I touched a lot of your LN API code, lemme know if you see anything smelly or frustrating in here...

pinheadmz (Contributor Author)

Tested with 10, 50, 100 node networks + simln: https://github.com/pinheadmz/ln100-warnet

On digital ocean cluster: 100 vCPUs / 200 GB Memory / 1.22 TB Disk

macgyver13 (Contributor)

> @macgyver13 I touched a lot of your LN API code, lemme know if you see anything smelly or frustrating in here...

Concept ACK - I like the separation of concerns, nice work - much needed clean up 👏
I am a little surprised by the frequent use of reset_connection in CLN and LND - was that required for stability?

I will test later this week and even consider testing eclair with these changes to see if an underlying issue was causing instability with eclair nodes in warnet.

There are some nice improvements to ln.py in the eclair PR, around the "use_rpc" concept, that may make sense to follow this PR if you see value.

pinheadmz (Contributor Author)

pinheadmz commented Aug 13, 2025

> reset_connection

Well, the way it was before, we only reset the connection if there was a problem, and my new idea is that ln.py doesn't deal with problems at all. I also had to deal with this already in scenarios based on the functional test framework:

```python
# Ensure that all RPC calls are made with brand new http connections
def auth_proxy_request(self, method, path, postdata):
    self._set_conn()  # creates new http client connection
    return self.oldrequest(method, path, postdata)

AuthServiceProxy.oldrequest = AuthServiceProxy._request
AuthServiceProxy._request = auth_proxy_request
```

Unlike the test framework, we can't expect an HTTP connection to survive the entire lifespan of a warnet, so we refresh the connection on every call.
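The same idea applied to an LN client wrapper can be sketched as a decorator that rebuilds the connection before every call, instead of only after a failure. This is a hypothetical illustration: `with_fresh_connection`, `LNClient`, and `reset_connection` are made-up names, not warnet's actual API.

```python
import functools

# Hypothetical sketch: reconnect before every RPC call so a long-lived
# warnet never reuses a stale HTTP connection.
def with_fresh_connection(method):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self.reset_connection()  # assumed to build a new HTTP client
        return method(self, *args, **kwargs)
    return wrapper

class LNClient:
    def __init__(self):
        self.resets = 0  # counts reconnects, for illustration only

    def reset_connection(self):
        self.resets += 1

    @with_fresh_connection
    def get_info(self):
        return {"alias": "node0"}  # stand-in for a real RPC response
```

The cost is one extra connection setup per call; the benefit is that a scenario running for days never trips over a connection the server closed long ago.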

```
@@ -204,6 +204,8 @@ def check_logging_required(directory: Path):
        return True
    if default_file.get("metricsExport", False):
        return True
    if default_file.get("lnd", False).get("metricsExport"):
```
Contributor

I think you need to use `{}` for the default value:

```python
if default_file.get("lnd", {}).get("metricsExport"):
```
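To illustrate the reviewer's point, a minimal reproduction (the example config dict here is made up):

```python
# Why a falsy non-dict default breaks the chained .get(): as soon as
# the "lnd" key is missing, .get() returns False, and bool has no .get.
default_file = {"metricsExport": True}  # example config with no "lnd" key

try:
    default_file.get("lnd", False).get("metricsExport")
except AttributeError as err:
    print(err)  # 'bool' object has no attribute 'get'

# With {} as the default, the chain degrades to a falsy None instead:
result = default_file.get("lnd", {}).get("metricsExport")
print(result)  # None
```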

```
@@ -216,6 +218,8 @@ def check_logging_required(directory: Path):
        return True
    if node.get("metricsExport", False):
        return True
    if node.get("lnd", False).get("metricsExport"):
```
Contributor

^^ same problem
