Fix App Crashes and Lightning Node Recovery When Starting Offline #363

jvsena42 · 2025-09-09T16:53:48Z

Description

The app crashed when started without an internet connection because of unhandled network exceptions. Additionally, when users started the app offline and later connected to the internet, the Lightning node would get stuck in the "Starting" state and never recover, requiring a manual app restart.

Crash Source: Network requests to api1.blocktank.to and other services were failing with UnknownHostException and propagating as fatal exceptions to the main thread
Recovery Issue: Lightning node setup failures left the node stuck in "Starting" state without a proper error status update, preventing automatic recovery when connectivity was restored

Key changes:

Network failures no longer propagate as fatal exceptions to the main thread
Safe defaults ensure app continues functioning with cached data
Improved exception handling across all network operations
Intelligent retry logic that distinguishes between retryable and permanent failures
Currency rates: Falls back to cached data
Geo-blocking: Defaults to "not blocked" if check fails
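
The fallback behaviour in the last two items can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the PR's code: `fetchRates`, `checkGeoBlocked`, `ratesOrCached` and `isGeoBlockedOrDefault` are hypothetical names, and both requests are hard-wired to fail with `UnknownHostException` to simulate being offline.

```kotlin
import java.net.UnknownHostException

// Hypothetical stand-ins for the real network calls; both always fail to simulate an offline device.
fun fetchRates(): Map<String, Double> = throw UnknownHostException("api1.blocktank.to")
fun checkGeoBlocked(): Boolean = throw UnknownHostException("api1.blocktank.to")

// Returns fresh rates when the request succeeds, otherwise the last cached value.
fun ratesOrCached(cached: Map<String, Double>): Map<String, Double> =
    runCatching { fetchRates() }.getOrElse { cached }

// Defaults to "not blocked" when the check itself cannot be completed.
fun isGeoBlockedOrDefault(): Boolean =
    runCatching { checkGeoBlocked() }.getOrDefault(false)
```

With the simulated failure, `ratesOrCached(cached)` returns the `cached` map unchanged and `isGeoBlockedOrDefault()` returns `false`, so neither exception reaches the caller.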

Preview

Screen_recording_20250911_084013.mp4

QA Notes

Phone offline -> open app -> should not crash -> phone online -> should sync and update status successfully
Phone online -> open app -> should display successful status -> phone offline -> should update failure status -> online again -> resync and update status again -> send bitcoin with success
Phone offline -> swipe to refresh -> display a device offline error

app/src/main/java/to/bitkit/async/ServiceQueue.kt

app/src/main/java/to/bitkit/repositories/LightningRepo.kt

app/src/main/java/to/bitkit/services/CoreService.kt

app/src/main/java/to/bitkit/services/LightningService.kt

jvsena42 · 2025-09-10T12:40:12Z

Changed to draft to fix the E2E test "@settings_10 - Can enter wrong Electrum server and get an error message".

ovitrif

Ran both test cases successfully with a caveat on test case 2

Phone online -> open app -> should display successful status -> phone offline -> should update failure status -> online again -> resync and update status again -> send bitcoin with success

Ended up with log spam after restoring the internet connection and then trying swipe-to-refresh (all coming from `lightningRepo.sync()`):

ERROR❌️: ServiceQueue.LDK error [AppError='TxSyncTimeout='Syncing transactions timed out.'']

The error also shows up as a toast after each pull-to-refresh:

But otherwise send and receive work for both onchain and LN, with one problem for LN send that I encountered only once: estimateRoutingFeesForAmount takes a long time right after tapping Continue on the send amount screen, freezing the send flow for a while.

Remarks

Suggesting to keep retrying in a loop, instead of stopping after 5 attempts.

If I wait long enough during test case 1, nothing works anymore even if I get back online.

app/src/main/java/to/bitkit/services/LightningService.kt

jvsena42 · 2025-09-10T17:17:18Z

Set as draft to apply changes and retest

ovitrif · 2025-09-10T17:59:34Z

After a long time I'm now getting a different error on each pull-to-refresh, in the same session where I used to get the TxSyncTimeout iirc (left it open until I came back after a break):

This also comes from `lightningRepo.sync()`

jvsena42 · 2025-09-10T18:10:17Z

> After a long time I'm now getting a different error on each pull-to-refresh, in the same session where I used to get the TxSyncTimeout iirc (left it open until I came back after a break):
> This also comes from `lightningRepo.sync()`

After reaching the max attempts, the app stops trying to start the node, so refresh doesn't work.
I'll improve this part by resetting the attempt count on refresh and trying to set up the node again if it is not set up yet.

jvsena42 · 2025-09-11T11:16:22Z

> Ran both test cases successfully with a caveat on test case 2
>
> Phone online -> open app -> should display successful status -> phone offline -> should update failure status -> online again -> resync and update status again -> send bitcoin with success
>
> Ended up with log spam after restoring the internet connection and then trying swipe-to-refresh (all coming from `lightningRepo.sync()`):
> ERROR❌️: ServiceQueue.LDK error [AppError='TxSyncTimeout='Syncing transactions timed out.'']
> The error also shows up as a toast after each pull-to-refresh:
>
> But otherwise send and receive work for both onchain and LN, with one problem for LN send that I encountered only once: estimateRoutingFeesForAmount takes a long time right after tapping Continue on the send amount screen, freezing the send flow for a while.
>
> Remarks
>
> Suggesting to keep retrying in a loop, instead of stopping after 5 attempts.
>
> If I wait long enough during test case 1, nothing works anymore even if I get back online.

Improvements:

onRefresh now checks the connection before calling the sync methods 4b8d0bc
Removed max attempts 2ff0513
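
A connectivity check in front of the sync call could look roughly like the sketch below. It is an illustration only: `RefreshHandler`, its constructor parameters and the error strings are hypothetical, since the app's actual connectivity and sync APIs are not shown in this thread.

```kotlin
// Hypothetical handler; the three lambdas stand in for the app's connectivity check,
// sync call and toast display.
class RefreshHandler(
    private val isOnline: () -> Boolean,
    private val sync: () -> Result<Unit>,
    private val showError: (String) -> Unit,
) {
    fun onRefresh() {
        // Bail out early when offline so no sync request is issued at all.
        if (!isOnline()) {
            showError("Device is offline")
            return
        }
        // Online: run the sync and surface a failure once, without retry bookkeeping here.
        sync().onFailure { showError(it.message ?: "Sync failed") }
    }
}
```

Checking connectivity first means that an offline pull-to-refresh produces a single "device offline" message instead of a timeout error from the sync itself.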

jvsena42 · 2025-09-11T11:36:39Z

The offline checks for the send and receive flows can be addressed in another branch to keep this PR as focused as possible.

jvsena42 · 2025-09-11T11:43:48Z

Tests updated

ovitrif

Added a few remarks about the code, will test next…

ovitrif · 2025-09-11T12:57:49Z

app/src/main/java/to/bitkit/repositories/LightningRepo.kt

+    /**
+     * Determines if an error is retryable based on its type and characteristics
+     */


nit: private methods should not have doc comments

Also applies to lines 315-317 for calculateRetryDelayWithJitter.
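
For readers without the diff, the two helpers under discussion could plausibly look like the sketch below. Only the name `calculateRetryDelayWithJitter` comes from this thread; `isRetryable`, the constants and the classification rules are assumptions made for illustration, not the PR's implementation.

```kotlin
import java.io.IOException
import kotlin.random.Random

// Hypothetical constants; the PR's actual values are not shown in this thread.
const val BASE_DELAY_MS = 1_000L
const val MAX_DELAY_MS = 30_000L

// Assumed rule: I/O failures (UnknownHostException, SocketTimeoutException, ...) are
// transient and worth retrying; everything else is treated as permanent.
fun isRetryable(error: Throwable): Boolean = error is IOException

// Exponential backoff capped at MAX_DELAY_MS, plus up to 25% random jitter
// so that many clients do not retry in lockstep.
fun calculateRetryDelayWithJitter(attempt: Int, random: Random = Random.Default): Long {
    val exponential = BASE_DELAY_MS shl attempt.coerceIn(0, 20)
    val capped = minOf(exponential, MAX_DELAY_MS)
    val jitter = random.nextLong(capped / 4 + 1)
    return capped + jitter
}
```

In this sketch, attempt 2 yields a delay between 4000 and 5000 ms, and any attempt large enough to hit the cap yields between 30000 and 37500 ms.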

ovitrif · 2025-09-11T13:01:53Z

app/src/main/java/to/bitkit/repositories/LightningRepo.kt

        _lightningState.update { it.copy(nodeLifecycleState = NodeLifecycleState.Initializing) }
    }

+    @Suppress("TooGenericExceptionCaught")


Should've disabled the rule in the detekt.yml lint config, but we can do that in another PR.

But… even better would've been to use runCatching instead.
Tbh this lint rule is a nice guard that pushes towards preferring runCatching over try/catch.
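
The difference between the two styles can be sketched as follows. `setupNode` is a hypothetical stand-in for a node setup step, not a function from the codebase.

```kotlin
// Hypothetical operation standing in for a node setup step.
fun setupNode(online: Boolean): String =
    if (online) "running" else throw IllegalStateException("offline")

// Catching a generic Exception is what detekt's TooGenericExceptionCaught rule flags.
fun setupWithTryCatch(online: Boolean): Result<String> =
    try {
        Result.success(setupNode(online))
    } catch (e: Exception) {
        Result.failure(e)
    }

// runCatching wraps the outcome in a Result without an explicit catch block.
fun setupWithRunCatching(online: Boolean): Result<String> =
    runCatching { setupNode(online) }
```

One caveat: `runCatching` catches every `Throwable`, so inside coroutines it also captures `CancellationException`, which usually needs to be rethrown.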

ovitrif · 2025-09-11T13:04:20Z

app/src/main/java/to/bitkit/repositories/LightningRepo.kt

+        try {
+            lightningService.connectToTrustedPeers()
+            Result.success(Unit)
+        } catch (e: NetworkException) {


It doesn't make much sense to check here whether it's a network exception, because connecting to a peer is itself a network op :)…

ovitrif · 2025-09-11T13:05:56Z

app/src/main/java/to/bitkit/services/LightningService.kt

+    /**
+     * Enhanced network error detection for LDK-specific errors
+     */


This is a bit too AI-like, but nvm, at least clean up the comments pls 🙏🏻

ovitrif · 2025-09-11T13:07:40Z

app/src/main/java/to/bitkit/services/LightningService.kt

+            lowerMessage.contains("unreachable") ||
+            lowerMessage.contains("refused") ||
+            // VSS-specific network errors
+            lowerMessage.contains("vss") ||


I don't think any ldk-node error mentions VSS.

ovitrif · 2025-09-11T13:12:43Z

app/src/main/java/to/bitkit/services/LightningService.kt

        ServiceQueue.LDK.background {
+            var networkFailures = 0
+            val maxNetworkFailures = trustedLnPeers.size // Allow all to fail due to network issues
+
            for (peer in trustedLnPeers) {
                try {
                    node.connect(peer.nodeId, peer.address, persist = true)
                    Logger.info("Connected to trusted peer: $peer")
                } catch (e: NodeException) {
-                    Logger.error("Peer connect error: $peer", LdkError(e))
+                    val ldkError = LdkError(e)
+                    val isNetworkError = isNetworkRelatedError(e.message)
+
+                    if (isNetworkError) {
+                        networkFailures++
+                        Logger.warn("Network error connecting to trusted peer: $peer", ldkError)
+
+                        // If all connections failed due to network, throw network exception
+                        if (networkFailures >= maxNetworkFailures) {
+                            throw NetworkException("Failed to connect to any trusted peers due to network issues")
+                        }
+                    } else {
+                        Logger.error("Peer connect error: $peer", ldkError)
+                    }
                }


The current networkFailures tracking adds complexity without clear benefit. What's the intended behavior when some (but not all) connections fail?

jvsena42 · 2025-09-11T13:33:26Z

Converted to draft to check comments

ovitrif

Tests

1️⃣ Start app when offline 🟢

Phone offline -> open app -> should not crash -> phone online -> should sync and update status successfully

2️⃣ Go offline after app started when online 🔴

Phone online -> open app -> should display successful status -> phone offline -> should update failure status -> online again -> resync and update status again -> send bitcoin with success

The app works very badly for me after restoring internet:

from second 19 until 1m:26s the loading spinner keeps going
then every pull-to-refresh still takes a lot of time
every pull-to-refresh ultimately fails with an error: TxSyncTimeout
each time I go to the receive sheet, it takes a long time for the QR to show up

Android.Studio.2025-09-11.000337.mp4

actually the app never recovers, not even after restarting it

errAfterAppRestart.mp4

The only way to fix the issue is to close the app via the node notification, and then to wait a bit.

if I pull-to-refresh before node is ready, I get another non-fixable toast error on each refresh

errAfterNodeRestart.mp4

3️⃣ Pull-to-refresh when offline 🟢

Phone offline -> swipe to refresh -> display a device offline error

overall:
Honestly I'm still as concerned as yesterday about merging this PR, especially given we will have a testing session tomorrow.

Not sure if test case 2️⃣ works better on master, but if it does, I would suggest retrying this fix from scratch. I'm not very confident given the amount of code changes; they make too many core code paths too difficult to reason about without AI.

jvsena42 · 2025-09-11T13:49:20Z

Found a simpler solution for the crash. I'll close this PR and open another one

jvsena42 added 13 commits September 9, 2025 09:06

refactor: improve exception handling and return a NetworkException to…

b6042f9

… network related failures

refactor: improve retry process and toast display

066ab22

refactor: improve exception handling

2c0ec5b

refactor: improve network exception handling

8538f44

refactor: improve network exception handling

731acba

fix: improve retry logic

bf488f2

fix: improve retry logic

b6edfdf

fix: improve retry logic

d7665f3

fix: improve exception handling

d796226

chore: remove log

39a0910

refactor: centralize network error related handling

6b62c7e

refactor: centralize network error related handling

41654cb

fix: update node state on failures

1a8f0d2

jvsena42 self-assigned this Sep 9, 2025

github-advanced-security bot found potential problems Sep 9, 2025

Merge branch 'master' into fix/crash-offline-startup

3ed54fb

jvsena42 changed the title ~~fix: App crash if phone is ofline on startup~~ Fix App Crashes and Lightning Node Recovery When Starting Offline Sep 9, 2025

jvsena42 added 6 commits September 9, 2025 14:10

chore: change log to verbose

35c0594

chore: lint

63f5a47

Merge branch 'master' into fix/crash-offline-startup

cf0f01d

chore: lint

d82769a

chore: remove magic numbers

0d00b78

chore: remove unused argument

35f12d4

jvsena42 marked this pull request as ready for review September 10, 2025 11:08

jvsena42 requested a review from ovitrif September 10, 2025 11:29

jvsena42 marked this pull request as draft September 10, 2025 12:39

fix: disable retry for server changes

a820cdd

jvsena42 marked this pull request as ready for review September 10, 2025 13:28

chore: remove annotation

92ea097

ovitrif requested changes Sep 10, 2025

fix: remove NetworkException handling from configureGossipSource

1d7ed86

jvsena42 marked this pull request as draft September 10, 2025 17:17

fix: remove NetworkException handling from configureChainSource

03508d5

jvsena42 marked this pull request as ready for review September 10, 2025 18:00

jvsena42 marked this pull request as draft September 10, 2025 18:01

fix: add connectivity check to onRefresh

4b8d0bc

fix: remove max attempts

2ff0513

jvsena42 marked this pull request as ready for review September 11, 2025 11:43

chore: remove suppress

1205350

jvsena42 requested a review from ovitrif September 11, 2025 12:11

Merge branch 'master' into fix/crash-offline-startup

2a49148

ovitrif reviewed Sep 11, 2025

jvsena42 marked this pull request as draft September 11, 2025 13:31

ovitrif requested changes Sep 11, 2025

jvsena42 closed this Sep 11, 2025

jvsena42 mentioned this pull request Sep 11, 2025

Improve geoblock validation for devices connected to external nodes #371

Merged

7 tasks

jvsena42 deleted the fix/crash-offline-startup branch September 12, 2025 23:16
