Conversation

@starius (Contributor) commented Nov 10, 2025

LND commit lightningnetwork/lnd@c6f458e (v0.20.0-rc3) moved ChainNotifier startup later in the lifecycle, so RegisterBlockEpochNtfn callers now see "chain notifier RPC is still in the process of starting" coming from Recv().

The new BlockUntilChainNotifier config option repeatedly calls RegisterBlockEpochNtfn during startup and only proceeds once the stream yields its first block height, retrying solely when we detect the ErrChainNotifierServerNotActive condition introduced by the LND commit above.

The wait is guarded by a "chainrpc" build-tag check, so deployments without the ChainNotifier service fail fast when the option is requested.
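
For illustration, wiring the option up from an application could look like the sketch below (the address and paths are placeholders; the other fields follow lndclient's existing LndServicesConfig):

	cfg := &lndclient.LndServicesConfig{
		LndAddress:  "localhost:10009",
		Network:     lndclient.NetworkRegtest,
		MacaroonDir: "/path/to/macaroons",
		TLSPath:     "/path/to/tls.cert",

		// Block in NewLndServices until the ChainNotifier sub-server
		// has yielded its first block height, retrying while lnd
		// reports that it is still starting.
		BlockUntilChainNotifier: true,
	}

	services, err := lndclient.NewLndServices(cfg)
	if err != nil {
		return err
	}
	defer services.Close()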

Pull Request Checklist

  • PR is opened against the correct version branch.
  • Version compatibility matrix in the README and minimal required version
    in lnd_services.go are updated.
  • Update macaroon_recipes.go if your PR adds a new method that is called
    differently than the RPC method it invokes.

@starius force-pushed the block-until-chain-notifier-is-ready branch from e581dc3 to be4fd2c on November 10, 2025 20:43
starius added 4 commits to starius/loop that referenced this pull request Nov 10, 2025

@mohamedawnallah left a comment


Thanks, LGTM starius/loop@26ee295!

LND commit lightningnetwork/lnd@c6f458e (v0.20.0-rc3) moved ChainNotifier startup later in the lifecycle

By the way, this commit (lightningnetwork/lnd@c6f458e) in LND v0.20.0-rc3 moved ChainNotifier startup back to its original location from LND v0.18.5. That location was changed in the first release candidate for LND v0.19.0, and this commit reverts it to the proper v0.18.5 behavior.


// If requested, wait until the chain notifier RPC is ready before we
// return. This ensures sub-servers relying on the notifier don't fail
// during startup.


👍

starius added 2 commits to starius/loop that referenced this pull request Nov 10, 2025
starius added a commit to starius/lightning-terminal that referenced this pull request Nov 10, 2025

@ziggie1984 left a comment


Interesting approach, I had some questions.

lnd_services.go Outdated

	// return. This ensures sub-servers relying on the notifier don't fail
	// during startup.
	if cfg.BlockUntilChainNotifier {
		if !hasBuildTag(services.Version, "chainrpc") {


Is this referring to the build tags of lndclient, or is it related to the build tags of the LND node the client is connected to?


I wonder if we really need to rely on build tags; won't the waitForChainNotifier func just error when there is no notifier?

@starius (author) replied


I removed this check; chainrpc is a requirement according to lndclient/README.md.

lnd_services.go Outdated

	timeout, pollInterval time.Duration) error {

	mainCtx := ctx
	if mainCtx == nil {


I haven't seen this pattern so far; why don't we trust the caller to provide a non-nil context?

@starius (author) replied


Removed it. AI glitch.

lnd_services.go Outdated

	for {
		// Make new RegisterBlockEpochNtfn call.
		subCtx, cancel := context.WithTimeout(mainCtx, timeout)


Why not use defer cancel()? It is way cleaner.

@ViktorT-11 (Contributor) replied


I think the aim here is to ensure the subCtx is cancelled before we enter a new for-loop iteration in the polling case (after case err := <-errChan: is triggered). But since none of the variables returned from the register call are referenced any longer, I think garbage collection should mean this isn't really an issue we need to worry about, right?

@starius (author) replied


I reworked it to use a closure, attempt, which makes a single attempt. It creates subCtx and cancels it in a defer statement.
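
Roughly this shape, a minimal sketch assuming notifier is the ChainNotifierClient and that RegisterBlockEpochNtfn returns a block-height channel plus an error channel:

	// attempt makes a single RegisterBlockEpochNtfn call with its own
	// timeout context, which the deferred cancel releases on every
	// return path.
	attempt := func() (bool, error) {
		subCtx, cancel := context.WithTimeout(mainCtx, timeout)
		defer cancel()

		heightChan, errChan, err := notifier.RegisterBlockEpochNtfn(subCtx)
		if err != nil {
			return false, err
		}

		select {
		case <-heightChan:
			// First block height received: the notifier is ready.
			return true, nil

		case err := <-errChan:
			return false, err

		case <-subCtx.Done():
			return false, subCtx.Err()
		}
	}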

lnd_services.go Outdated

	// Wait for block height notification, which indicates success.
	select {
	case <-mainCtx.Done():


I think we need to wait for the subCtx here?

@ViktorT-11 (Contributor) replied


Yeah, I think since a timeout of the subCtx won't throw an error that returns true for isChainNotifierStartingErr, we might as well return when the subCtx is done here.

@starius (author) replied


Good idea! Changed to waiting for subCtx here.

If subCtx expires, it means we haven't received the first block notification within 30s (the default setting of cfg.RPCTimeout). RegisterBlockEpochNtfn must return the current block height immediately, so this is a real error.

lnd_services.go Outdated

	// Wait for block height notification, which indicates success.
	select {
	case <-mainCtx.Done():
		cancel()


Can be removed when we use defer cancel() above.

@starius (author) replied


Removed the call; used a defer statement in the attempt lambda.

}

// chainNotifierStartupMessage matches the error string returned by lnd
// v0.20.0-rc3+ when a ChainNotifier RPC is invoked before the sub-server


I wonder why this was not a problem before, because pre-0.19 it was the same as it is now?

@starius (author) replied


I don't know. This issue was only detected by the LiT itest, not in Loop or anywhere else. My guess is that when pre-0.19 was around, this LiT itest didn't exist, or something was different and it didn't fire.
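
For reference, the detection can only go by the message text, since no typed error crosses the RPC boundary. A hedged sketch (chainNotifierStartupMessage is the constant from the diff above; the unwrapping via google.golang.org/grpc/status is an assumption about how the error arrives):

	// isChainNotifierStartingErr reports whether err carries lnd's
	// "still in the process of starting" message, in which case the
	// registration attempt should be retried.
	func isChainNotifierStartingErr(err error) bool {
		if err == nil {
			return false
		}

		// gRPC wraps server-side errors in a status; prefer its
		// message, falling back to the plain error text.
		if s, ok := status.FromError(err); ok {
			return strings.Contains(s.Message(), chainNotifierStartupMessage)
		}

		return strings.Contains(err.Error(), chainNotifierStartupMessage)
	}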

@ziggie1984

/gemini review

@ViktorT-11 (Contributor) left a comment


Generally looks good 🚀! Thanks for this 🙏! As @ziggie1984 has already commented, I think you can potentially clean up the context handling quite a bit :)


lnd_services.go Outdated

	if isChainNotifierStartingErr(err) {
		select {
		case <-time.After(pollInterval):
			continue


Without having tested this myself on the first iteration, I just want to make sure that something is logged here or during the register call, just so that the user sees that there's still some progress and that the application isn't frozen (similar to how the waitForChainSync function triggers logging of every GetInfo poll attempt).

@starius (author) replied


I added logging. Each attempt and its outcome is logged with level INFO.
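
Continuing the attempt sketch from the earlier thread, the surrounding loop might log each attempt and its outcome at INFO level along these lines (attempt, pollInterval and mainCtx are the assumed names from that sketch; log is the package-level logger):

	for attemptNo := 1; ; attemptNo++ {
		log.Infof("Checking that lnd's ChainNotifier is ready, "+
			"attempt %d", attemptNo)

		ready, err := attempt()
		switch {
		case ready:
			log.Infof("ChainNotifier is ready")
			return nil

		case isChainNotifierStartingErr(err):
			log.Infof("ChainNotifier still starting: %v; "+
				"retrying in %v", err, pollInterval)

		default:
			return err
		}

		// Sleep before the next attempt, aborting early if the
		// caller's context is cancelled.
		select {
		case <-time.After(pollInterval):
		case <-mainCtx.Done():
			return mainCtx.Err()
		}
	}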


@starius force-pushed the block-until-chain-notifier-is-ready branch from be4fd2c to 60c2e6e on November 14, 2025 15:37
starius added a commit to starius/lightning-terminal that referenced this pull request Nov 14, 2025
@starius (Contributor, Author) commented Nov 14, 2025

I updated lightninglabs/lightning-terminal#1170 to use the most recent version of this PR (60c2e6e) to check if it still fixes the itests.

@starius force-pushed the block-until-chain-notifier-is-ready branch from 60c2e6e to efe0aa0 on November 14, 2025 16:15
@starius (Contributor, Author) commented Nov 14, 2025

Added logging

@ViktorT-11 (Contributor) left a comment


Thanks a lot 🔥, uTACK LGTM 🙏!

@ziggie1984 left a comment


LGTM

LND commit c6f458e478f9 (v0.20.0-rc3) moved ChainNotifier startup later in
the lifecycle, so RegisterBlockEpochNtfn callers now see "chain notifier RPC
is still in the process of starting" coming from Recv().

The new BlockUntilChainNotifier config option repeatedly calls
RegisterBlockEpochNtfn during startup and only proceeds once the stream yields
its first block height, retrying solely when we detect the
ErrChainNotifierServerNotActive condition introduced by the LND commit above.
@starius force-pushed the block-until-chain-notifier-is-ready branch from 04a63fd to d4b697c on November 14, 2025 17:00
starius added a commit to starius/lightning-terminal that referenced this pull request Nov 14, 2025
@starius merged commit eaf0b75 into lightninglabs:lnd-20-0 Nov 14, 2025
1 check passed
@starius deleted the block-until-chain-notifier-is-ready branch November 14, 2025 17:40
@starius (Contributor, Author) commented Nov 14, 2025

Tagged as v0.20.0-6

starius added a commit to starius/lightning-terminal that referenced this pull request Nov 14, 2025
Include lightninglabs/lndclient#255
lndclient: block until chain notifier is ready