Skip to content

Conversation

@jvsena42
Copy link
Member

@jvsena42 jvsena42 commented Sep 9, 2025

Description

The app was crashing when started without internet connection due to unhandled network exceptions. Additionally, when users started the app offline and later connected to the internet, the Lightning node would get stuck in "Starting" state and never recover, requiring a manual app restart.

  • Crash Source: Network requests to api1.blocktank.to and other services were failing with UnknownHostException and propagating as fatal exceptions to the main thread
  • Recovery Issue: Lightning node setup failures left the node stuck in "Starting" state without a proper error status update, preventing automatic recovery when connectivity was restored

Key changes:

  • Network failures no longer propagate as fatal exceptions to the main thread
  • Safe defaults ensure app continues functioning with cached data
  • Improved exception handling across all network operations
  • Intelligent retry logic that distinguishes between retryable and permanent failures
  • Currency rates: Falls back to cached data
  • Geo-blocking: Defaults to "not blocked" if check fails

Preview

Screen_recording_20250911_084013.mp4

QA Notes

  • Phone offline -> open app -> should not crash -> phone online -> should sync and update status successfully
  • Phone online -> open app -> should display successfull status -> phone offline -> should update failure status -> online again -> resync and update status again -> send bitcoin with success
  • Phone offline > swype to refresh > display a device offline error

@jvsena42 jvsena42 self-assigned this Sep 9, 2025
@jvsena42 jvsena42 changed the title fix: App crash if phone is ofline on startup Fix App Crashes and Lightning Node Recovery When Starting Offline Sep 9, 2025
@jvsena42 jvsena42 marked this pull request as ready for review September 10, 2025 11:08
@jvsena42 jvsena42 requested a review from ovitrif September 10, 2025 11:29
@jvsena42 jvsena42 marked this pull request as draft September 10, 2025 12:39
@jvsena42
Copy link
Member Author

changed to draft to fix @settings_10 - Can enter wrong Electrum server and get an error message E2E test

@jvsena42 jvsena42 marked this pull request as ready for review September 10, 2025 13:28
Copy link
Collaborator

@ovitrif ovitrif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran both test cases successfully with a caveat on test case 2

Phone online -> open app -> should display successfull status -> phone offline -> should update failure status -> online again -> resync and update status again -> send bitcoin with success

ended up with logs spam after restoring internet connection then trying swipe-to-refresh (all coming from lightingRepo.sync()):

ERROR❌️: ServiceQueue.LDK error [AppError='TxSyncTimeout='Syncing transactions timed out.'']

The error also shows up as toast after each pull-to-refresh:

But otherwise send and receive works both for onchain and LN, with one problem for LN send encountered only 1 time: estimateRoutingFeesForAmount takes a lot of waiting time right after tapping Continue on send amount screen, freezing the send flow for a while.


Remarks

  • Suggesting to keep retrying in a loop, instead of stopping after 5 attempts.

If I wait long enough during test case 1, nothing works anymore even if I get back online.

@jvsena42 jvsena42 marked this pull request as draft September 10, 2025 17:17
@jvsena42
Copy link
Member Author

Set as draft to apply changes and retest

@ovitrif
Copy link
Collaborator

ovitrif commented Sep 10, 2025

After a long time now I'm getting different error on each pull-to-refresh, in the same session where I used to get the TxSyncTimeout iirc (left it open until I came back after a break):

This also comes from lightingRepo.sync()

@jvsena42 jvsena42 marked this pull request as ready for review September 10, 2025 18:00
@jvsena42 jvsena42 marked this pull request as draft September 10, 2025 18:01
@jvsena42
Copy link
Member Author

After a long time now I'm getting different error on each pull-to-refresh, in the same session where I used to get the TxSyncTimeout iirc (left it open until I came back after a break):

This also comes from `lightingRepo.sync()`

After reach the max attempts, the app stops trying to start the node and refresh wouldn't work.
I'll improve this part resetting the attempts number on refresh, and try to set up again if not set up yet

@jvsena42
Copy link
Member Author

jvsena42 commented Sep 11, 2025

Ran both test cases successfully with a caveat on test case 2

Phone online -> open app -> should display successfull status -> phone offline -> should update failure status -> online again -> resync and update status again -> send bitcoin with success

ended up with logs spam after restoring internet connection then trying swipe-to-refresh (all coming from lightingRepo.sync()):

ERROR❌️: ServiceQueue.LDK error [AppError='TxSyncTimeout='Syncing transactions timed out.'']

The error also shows up as toast after each pull-to-refresh:

But otherwise send and receive works both for onchain and LN, with one problem for LN send encountered only 1 time: estimateRoutingFeesForAmount takes a lot of waiting time right after tapping Continue on send amount screen, freezing the send flow for a while.

Remarks

  • Suggesting to keep retrying in a loop, instead of stopping after 5 attempts.

If I wait long enough during test case 1, nothing works anymore even if I get back online.

Improvements:

  • Now the onRefesh checks the connection before calling the sync methods 4b8d0bc
  • Removed max attempts 2ff0513

@jvsena42
Copy link
Member Author

The send and receive flows checks while offline can be addressed to another branch to keep this PR as specific as possible

@jvsena42
Copy link
Member Author

Tests updated

@jvsena42 jvsena42 marked this pull request as ready for review September 11, 2025 11:43
@jvsena42 jvsena42 requested a review from ovitrif September 11, 2025 12:11
Copy link
Collaborator

@ovitrif ovitrif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few remarks about the code, will test next…

Comment on lines +299 to +301
/**
* Determines if an error is retryable based on its type and characteristics
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: private methods should not have doc comments

also applies on 315-317 for calculateRetryDelayWithJitter

_lightningState.update { it.copy(nodeLifecycleState = NodeLifecycleState.Initializing) }
}

@Suppress("TooGenericExceptionCaught")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should've disabled the rule from lint config in detekt.yml but we can do it in another PR.

But… even better would've been to use runCatching instead.
Tbh this lint rule is like a nice guard to push for preferring runCatching over try/cach.

try {
lightningService.connectToTrustedPeers()
Result.success(Unit)
} catch (e: NetworkException) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't make much sense checking here if it's a network exception. because connecting to peer is in itself a network op :)…

Comment on lines +152 to +154
/**
* Enhanced network error detection for LDK-specific errors
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit too much AI-like, but nvm, at least cleanup the comments pls 🙏🏻

lowerMessage.contains("unreachable") ||
lowerMessage.contains("refused") ||
// VSS-specific network errors
lowerMessage.contains("vss") ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any ldk-node error has any VSS mention

Comment on lines 319 to 342
ServiceQueue.LDK.background {
var networkFailures = 0
val maxNetworkFailures = trustedLnPeers.size // Allow all to fail due to network issues

for (peer in trustedLnPeers) {
try {
node.connect(peer.nodeId, peer.address, persist = true)
Logger.info("Connected to trusted peer: $peer")
} catch (e: NodeException) {
Logger.error("Peer connect error: $peer", LdkError(e))
val ldkError = LdkError(e)
val isNetworkError = isNetworkRelatedError(e.message)

if (isNetworkError) {
networkFailures++
Logger.warn("Network error connecting to trusted peer: $peer", ldkError)

// If all connections failed due to network, throw network exception
if (networkFailures >= maxNetworkFailures) {
throw NetworkException("Failed to connect to any trusted peers due to network issues")
}
} else {
Logger.error("Peer connect error: $peer", ldkError)
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current networkFailures tracking adds complexity without clear benefit. What's the intended behavior when some (but not all) connections fail?

@jvsena42 jvsena42 marked this pull request as draft September 11, 2025 13:31
@jvsena42
Copy link
Member Author

Converted to draft to check comments

Copy link
Collaborator

@ovitrif ovitrif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests

1️⃣ Start app when offline 🟢

Phone offline -> open app -> should not crash -> phone online -> should sync and update status successfully

2️⃣ Go offline after app started when online 🔴

Phone online -> open app -> should display successfull status -> phone offline -> should update failure status -> online again -> resync and update status again -> send bitcoin with success

App works very bad after restoring internet for me:

  • from second 19 till 1m:26s the loading spinner keeps going
  • then every pull-to-refresh still take a lot of time
    • every pull-to-refresh ultimately fails with an error: TxSyncTimeout
  • each time I go to receive sheet, it takes a lot of time for the QR to show up
Android.Studio.2025-09-11.000337.mp4
actually the app never recovers, not even after restarting it
errAfterAppRestart.mp4

The only way to fix the issue is to close the app via the node notification, and then to wait a bit.

if I pull-to-refresh before node is ready, I get another non-fixable toast error on each refresh
errAfterNodeRestart.mp4

2️⃣ pull-to-refresh when offline 🟢

  1. Phone offline > swype to refresh > display a device offline error

overall:
Honestly I'm still as concerned as yesterday about merging this PR, especially given we will have a testing session tomorrow.

Not sure if the test case 2️⃣ is better on master, but if yes, then I would suggest to retry this fix from scratch. I'm not very confident in the amount of code changes, makes too many core code paths too difficult to reason about without AI.

@jvsena42
Copy link
Member Author

Found a simpler solution for the crash. I'll close this PR and open another one

@jvsena42 jvsena42 closed this Sep 11, 2025
@jvsena42 jvsena42 deleted the fix/crash-offline-startup branch September 12, 2025 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants