Commit 438c8ab

doc: describe default recovery scenario
1 parent eac127d commit 438c8ab

4 files changed

+192
-6
lines changed

README.md

Lines changed: 153 additions & 5 deletions
@@ -3,7 +3,7 @@
 ## Index
 
 * [Installation](#installation)
-* [Overview](#overview)
+* [Command overview](#command-overview)
 * [Commands](#commands)
   + [chanbackup](#chanbackup)
   + [compactdb](#compactdb)
@@ -21,27 +21,175 @@
2121
+ [walletinfo](#walletinfo)
2222

2323
This tool provides helper functions that can be used to rescue funds locked in
24-
`lnd` channels in case `lnd` itself cannot run properly any more.
24+
`lnd` channels in case `lnd` itself cannot run properly anymore.
2525

2626
**WARNING**: This tool was specifically built for a certain rescue operation and
2727
might not be well-suited for your use case. Or not all edge cases for your needs
2828
are coded properly. Please look at the code to understand what it does before
2929
you use it for anything serious.
3030

31-
**WARNING 2**: This tool will query public block explorer APIs for some of the
31+
**WARNING 2**: This tool will query public block explorer APIs for some
3232
commands, so your privacy might not be preserved. Use at your own risk or supply
3333
a private API URL with `--apiurl`.
3434

3535
## Installation
3636

3737
To install this tool, make sure you have `go 1.13.x` (or later) and `make`
38-
installed and run the following command:
38+
installed and run the following commands:
3939

4040
```bash
41+
git clone https://github.com/guggero/chantools.git
42+
cd chantools
4143
make install
4244
```
4345

44-
## Overview
46+
## Channel recovery scenario
47+
48+
The following flow chart shows the main recovery scenario this tool was built
49+
for. This scenario assumes that you do have access to the crashed node's seed,
50+
`channel.backup` file and some state of a `channel.db` file (perhaps from a
51+
file based backup or the recovered file from the crashed node).
52+
53+
![rescue flow](doc/rescue-flow.png)
54+
55+
**Explanation:**
56+
57+
1. **Node crashed**: For some reason your `lnd` node crashed and isn't starting
58+
anymore. If you get errors similar to
59+
[this](https://github.com/lightningnetwork/lnd/issues/4449),
60+
[this](https://github.com/lightningnetwork/lnd/issues/3473) or
61+
[this](https://github.com/lightningnetwork/lnd/issues/4102), it is possible
62+
that a simple compaction (a full copy in safe mode) can solve your problem.
63+
See [`chantools compactdb`](#compactdb).
64+
<br/><br/>
65+
If that doesn't work and you need to continue the recovery, make sure you can
66+
at least extract the `channel.backup` file and, if at all possible, any version
67+
of the `channel.db` from the node.
68+
<br/><br/>
69+
Whatever you do, **never, ever** replace your `channel.db` file with an old
70+
version (from a file based backup) and start your node that way. [Read this
71+
explanation of why that can lead to loss of funds.](https://github.com/lightningnetwork/lnd/blob/master/docs/safety.md#file-based-backups)
72+
73+
2. **Rescue on-chain balance**: To start the recovery process, we are going to
74+
re-create the node from scratch. To make sure we don't overwrite any old data
75+
in the process, make sure the old data directory of your node (usually `.lnd`
76+
in the user's home directory) is safely moved away (or the whole folder
77+
renamed) before continuing.
78+
<br/>
79+
To start the on-chain recovery, [follow the sub step "Starting On-Chain
80+
Recovery" of this guide](https://github.com/lightningnetwork/lnd/blob/master/docs/recovery.md#starting-on-chain-recovery).
81+
Don't follow the whole guide, only this single chapter!
82+
<br/><br/>
83+
This step is completed once the `lncli getinfo` command shows both
84+
`"synced_to_chain": true` and `"synced_to_graph": true` which can take several
85+
hours depending on the speed of your hardware. **Do not be alarmed** that the
86+
`lncli getinfo` command shows 0 channels. This is normal as we haven't started
87+
the off-chain recovery yet.
88+
89+
3. **Recover channels using SCB**: Now that the node is fully synced, we can try
90+
to recover the channels using the [Static Channel Backups (SCB)](https://github.com/lightningnetwork/lnd/blob/master/docs/safety.md#static-channel-backups-scbs).
91+
For this, you need a file called `channel.backup`. Simply run the command
92+
`lncli restorechanbackup --multi_file <path-to-your-channel.backup>`. **This
93+
will take a while!** The command itself can take several minutes to complete,
94+
depending on the number of channels. The recovery can easily take a day or
95+
two as a lot of chain rescanning needs to happen. It is recommended to wait at
96+
least one full day. You can watch the progress with the `lncli pendingchannels`
97+
command. If the list is empty, congratulations, you've recovered all channels!
98+
If the list stays unchanged for several hours, it means not all channels
99+
could be restored using this method.
100+
[One explanation can be found here.](https://github.com/lightningnetwork/lnd/blob/master/docs/safety.md#zombie-channels)
101+
102+
4. **Install chantools**: To try to recover the remaining channels, we are going
103+
to use `chantools`. Simply [follow the installation instructions.](#installation)
104+
The recovery can only be continued if you have access to some version of the
105+
crashed node's `channel.db`. This could be the latest state as recovered from
106+
the crashed file system, or a version from a regular file based backup. If you
107+
do not have any version of a channel DB, `chantools` won't be able to help
108+
with the recovery. See step 11 for some possible manual steps.
109+
110+
5. **Create copy of channel DB**: To make sure we can read the channel DB, we
111+
are going to create a copy in safe mode (called compaction). Simply run
112+
<br/><br/>
113+
`chantools compactdb --sourcedb <recovered-channel.db> --destdb ./results/compacted.db`
114+
<br/><br/>
115+
We are going to assume that the compacted copy of the channel DB is located in
116+
`./results/compacted.db` in the following commands.
117+
118+
6. **chantools summary**: First, `chantools` needs to find out the state of each
119+
channel on chain. For this, a blockchain API (by default [blockstream.info](https://blockstream.info))
120+
is queried. The result will be written to a file called
121+
`./results/summary-yyyy-mm-dd.json`. This result file will be needed for the
122+
next command.
123+
<br/><br/>
124+
`chantools --fromchanneldb ./results/compacted.db summary`
125+
126+
7. **chantools rescueclosed**: It is possible that by now the remote peers have
127+
force-closed some of the remaining channels. What we now do is try to find the
128+
private keys to sweep our balance of those channels. For this we need a shared
129+
secret which is called the `commit_point` and is changed whenever a channel is
130+
updated. We do have the latest known version of this point in the channel DB.
131+
The following command tries to find all private keys for channels that have
132+
been closed by the other party. The command needs to know what channels it is
133+
operating on, so we have to supply the `summary-yyyy-mm-dd.json` created by the
134+
previous command:
135+
<br/><br/>
136+
`chantools --fromsummary ./results/<summary-file-created-in-last-step>.json rescueclosed --channeldb ./results/compacted.db`
137+
<br/><br/>
138+
This will create a new file called `./results/rescueclosed-yyyy-mm-dd.json`
139+
which will contain any found private keys and will also be needed for the next
140+
command. Use `bitcoind` or Electrum Wallet to sweep the funds held by those private keys.
141+
142+
8. **chantools forceclose**: This command will now close all channels that
143+
`chantools` thinks are still open. This is achieved by publishing the latest
144+
known channel state of the `channel.db` file.
145+
<br/>**Please read the full warning text of the
146+
[`forceclose` command below](#forceclose) as this command can put
147+
your funds at risk** if the state in the channel DB is not the most recent
148+
one. This command should only be executed for channels where the remote peer
149+
is not online anymore.
150+
<br/><br/>
151+
`chantools --fromsummary ./results/<rescueclosed-file-created-in-last-step>.json forceclose --channeldb ./results/compacted.db --publish`
152+
<br/><br/>
153+
This will create a new file called `./results/forceclose-yyyy-mm-dd.json`
154+
which will be needed for the next command.
155+
156+
9. **Wait for timelocks**: The previous command closed the remaining open
157+
channels by publishing your node's state of the channel. By design of the
158+
Lightning Network, you now have to wait until the channel funds belonging to
159+
you are not time locked any longer. Depending on the size of the channel, you
160+
have to wait for somewhere between 144 and 2000 confirmations of the
161+
force-close transactions. Only continue with the next step after the channel
162+
with the highest `csv_timeout` has reached that many confirmations of its
163+
closing transaction.
164+
165+
10. **chantools sweeptimelock**: Once all force-close transactions have reached
166+
the number of confirmations that the `csv_timeout` in the JSON demands, these
167+
time locked funds can now be swept. Use the following command to sweep all the
168+
channel funds to an address of your wallet:
169+
<br/><br/>
170+
`chantools --fromsummary ./results/<forceclose-file-created-in-last-step>.json sweeptimelock --publish --sweepaddr <bech32-address-from-your-wallet>`
171+
172+
11. **Manual intervention necessary**: You got to this step because you either
173+
don't have a `channel.db` file or because `chantools` couldn't rescue all your
174+
node's channels. There are a few things you can try manually that have some
175+
chance of working:
176+
- Make sure you can connect to all nodes when restoring from SCB: It happens
177+
all the time that nodes change their IP addresses. When restoring from a
178+
static channel backup, your node tries to connect to the node using the IP
179+
address encoded in the backup file. If the address changed, the SCB restore
180+
process doesn't work. You can use block explorers like [1ml.com](https://1ml.com)
181+
to try to find an IP address that is up-to-date. Just run
182+
`lncli connect <node-pubkey>@<updated-ip-address>:<port>` in the recovered
183+
`lnd` node from step 3 and wait a few hours to see if the channel is now
184+
being force closed by the remote node.
185+
- Find out who the node belongs to: Maybe you opened the channel with someone
186+
you know. Or maybe their node alias contains some information about who the
187+
node belongs to. If you can find out who operates the remote node, you can
188+
ask them to force-close the channel from their end. If the channel was opened
189+
with `option_static_remote_key` (`lnd v0.8.0` and later), the funds can
190+
be swept by your node.
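
The two waiting conditions in the steps above (step 2, and steps 9 and 10) can be sketched in Go. This is only an illustration under stated assumptions: `nodeSynced`, `closedChannel` and `readyToSweep` are hypothetical names, and the `closedChannel` struct is an assumed, simplified shape, not the actual layout of the `forceclose` JSON file. The only names taken from the text are the `synced_to_chain` and `synced_to_graph` fields shown by `lncli getinfo` and the `csv_timeout` value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// syncStatus holds the two `lncli getinfo` fields that step 2 asks you to watch.
type syncStatus struct {
	SyncedToChain bool `json:"synced_to_chain"`
	SyncedToGraph bool `json:"synced_to_graph"`
}

// nodeSynced reports whether the getinfo output shows both flags as true.
// All other fields in the JSON are ignored.
func nodeSynced(getInfoJSON []byte) (bool, error) {
	var s syncStatus
	if err := json.Unmarshal(getInfoJSON, &s); err != nil {
		return false, err
	}
	return s.SyncedToChain && s.SyncedToGraph, nil
}

// closedChannel is an ASSUMED, simplified view of one force-closed channel.
type closedChannel struct {
	CSVTimeout    uint32 // relative timelock in blocks
	Confirmations uint32 // confirmations of the force-close transaction
}

// readyToSweep reports whether every force-close transaction has at least as
// many confirmations as its own CSV timeout, which is the condition the text
// gives for moving on to sweeptimelock.
func readyToSweep(channels []closedChannel) bool {
	for _, c := range channels {
		if c.Confirmations < c.CSVTimeout {
			return false
		}
	}
	return true
}

func main() {
	synced, err := nodeSynced([]byte(`{"synced_to_chain": true, "synced_to_graph": false}`))
	fmt.Println(synced, err) // prints: false <nil>

	chans := []closedChannel{
		{CSVTimeout: 144, Confirmations: 150},
		{CSVTimeout: 2000, Confirmations: 150},
	}
	fmt.Println(readyToSweep(chans)) // prints: false
}
```

In the example, the second channel still needs to reach 2000 confirmations, so the sweep has to wait even though the first channel's timelock has already expired.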
191+
192+
## Command overview
45193

46194
```text
47195
Usage:

cmd/chantools/rescueclosed.go

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ func rescueClosedChannels(extendedKey *hdkeychain.ExtendedKey,
 	if err != nil {
 		return err
 	}
-	fileName := fmt.Sprintf("results/bruteforce-%s.json",
+	fileName := fmt.Sprintf("results/rescueclosed-%s.json",
 		time.Now().Format("2006-01-02-15-04-05"))
 	log.Infof("Writing result to %s", fileName)
 	return ioutil.WriteFile(fileName, summaryBytes, 0644)

doc/rescue-flow.plantuml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+@startuml
+
+(*) --> "<b>1:</b> Node crashed"
+--> "<b>2:</b> Rescue on-chain balance"
+--> "<b>3:</b> Recover channels using SCB"
+if "Pending/Open\nchannels left?" then
+  -->[yes] "<b>4:</b> Install chantools"
+  if "<b>5:</b> Is channel DB \navailable?" then
+    -->[yes] "<b>5:</b> Create copy of channel DB"
+    --> "<b>6:</b> chantools summary"
+    --> "<b>7:</b> chantools rescueclosed"
+    if "Pending/Open\nchannels left?" then
+      -->[yes] "<b>8:</b> chantools forceclose"
+      --> "<b>9:</b> Wait for timelocks"
+      --> "<b>10:</b> chantools sweeptimelock"
+      if "Pending/Open\nchannels left?" then
+        -->[yes] ===MANUAL===
+      else
+        -->[no] ===DONE===
+      endif
+    else
+      -->[no] ===DONE===
+    endif
+  else
+    -->[no] ===MANUAL===
+    --> "<b>11:</b> Manual intervention necessary"
+    --> (*)
+  endif
+else
+  -->[no] ===DONE===
+  note right
+    Recovery complete
+  end note
+endif
+
+--> (*)
+
+@enduml

doc/rescue-flow.png

68.1 KB