|  | 
|  | 1 | +# Increasing LND reliability by clustering | 
|  | 2 | + | 
|  | 3 | +Normally LND nodes use the embedded bbolt database to store all important state. | 
|  | 4 | +This method of running has been proven to work well in a variety of environments, | 
|  | 5 | +from mobile clients to large nodes serving hundreds of channels. At scale, however, | 
|  | 6 | +it is desirable to replicate LND's state so that nodes can be moved quickly and reliably, | 
|  | 7 | +updated, and made more resilient to datacenter failures. | 
|  | 8 | + | 
|  | 9 | +It is now possible to store all essential state in a replicated etcd DB and to | 
|  | 10 | +run multiple LND nodes on different machines where only one of them (the leader)  | 
|  | 11 | +is able to read and mutate the database. In such a setup, if the leader node fails | 
|  | 12 | +or is decommissioned, a follower node will be elected as the new leader and will | 
|  | 13 | +quickly come online to minimize downtime. | 
|  | 14 | + | 
|  | 15 | +The leader election feature currently relies on etcd both for the election | 
|  | 16 | +itself and for the replicated data store. | 
|  | 17 | + | 
|  | 18 | +## Building LND with leader election support | 
|  | 19 | + | 
|  | 20 | +To create a dev build of LND with leader election support use the following command: | 
|  | 21 | + | 
|  | 22 | +```shell | 
|  | 23 | +$  make tags="kvdb_etcd" | 
|  | 24 | +``` | 
|  | 25 | + | 
|  | 26 | +## Running a local etcd instance for testing | 
|  | 27 | + | 
|  | 28 | +To start your local etcd instance for testing run: | 
|  | 29 | + | 
|  | 30 | +```shell | 
|  | 31 | +$  ./etcd \ | 
|  | 32 | +    --auto-tls \ | 
|  | 33 | +    --advertise-client-urls=https://127.0.0.1:2379 \ | 
|  | 34 | +    --listen-client-urls=https://0.0.0.0:2379 \ | 
|  | 35 | +    --max-txn-ops=16384 \ | 
|  | 36 | +    --max-request-bytes=104857600 | 
|  | 37 | +``` | 
|  | 38 | + | 
|  | 39 | +The large `max-txn-ops` and `max-request-bytes` values are currently recommended | 
|  | 40 | +but may not be required in the future. | 
|  | 41 | + | 
|  | 42 | +## Configuring LND to run on etcd and participate in leader election | 
|  | 43 | + | 
|  | 44 | +To run LND with etcd, additional configuration is needed, specified either | 
|  | 45 | +through command line flags or in `lnd.conf`. | 
|  | 46 | + | 
|  | 47 | +Sample command line: | 
|  | 48 | + | 
|  | 49 | +```shell | 
|  | 50 | +$  ./lnd-debug \ | 
|  | 51 | +    --db.backend=etcd \ | 
|  | 52 | +    --db.etcd.host=127.0.0.1:2379 \ | 
|  | 53 | +    --db.etcd.certfile=/home/user/etcd/bin/default.etcd/fixtures/client/cert.pem \ | 
|  | 54 | +    --db.etcd.keyfile=/home/user/etcd/bin/default.etcd/fixtures/client/key.pem \ | 
|  | 55 | +    --db.etcd.insecure_skip_verify \ | 
|  | 56 | +    --cluster.enable-leader-election \ | 
|  | 57 | +    --cluster.leader-elector=etcd \ | 
|  | 58 | +    --cluster.etcd-election-prefix=cluster-leader \ | 
|  | 59 | +    --cluster.id=lnd-1 | 
|  | 60 | +``` | 
|  | 61 | +The `cluster.etcd-election-prefix` option sets the election's etcd key prefix.  | 
|  | 62 | +The `cluster.id` is used to identify the individual nodes in the cluster | 
|  | 63 | +and should be set to a different value for each node. | 
|  | 64 | + | 
|  | 65 | +Optionally, users can specify `db.etcd.user` and `db.etcd.pass` for db user | 
|  | 66 | +authentication. If the database is shared, it is possible to separate our data | 
|  | 67 | +from other users by setting `db.etcd.namespace` to an (already existing) etcd | 
|  | 68 | +namespace. In order to test without TLS, we can set the `db.etcd.disabletls` | 
|  | 69 | +flag to `true`. | 
|  | 70 | + | 
|  | 71 | +Once the node is up and running, we can start more nodes with the same command line, changing only `cluster.id`. | 
|  | 72 | + | 
|  | 73 | +## Identifying the leader node | 
|  | 74 | + | 
|  | 75 | +The above setup is useful for testing but is not viable when running in a production | 
|  | 76 | +environment. For users relying on containers and orchestration services, it is | 
|  | 77 | +essential to know which node is the leader to be able to automatically route | 
|  | 78 | +network traffic to the right instance. For example, in Kubernetes the load balancer | 
|  | 79 | +will route traffic to all "ready" nodes. This readiness may be monitored by a | 
|  | 80 | +readiness probe. | 
|  | 81 | + | 
|  | 82 | +For readiness probing we can use LND's state RPC service, where the special state | 
|  | 83 | +`WAITING_TO_START` indicates that the node is waiting to become the leader and has | 
|  | 84 | +not started yet. The probe below curls the REST endpoint of the state RPC and succeeds only for the other states: | 
|  | 85 | + | 
|  | 86 | +```yaml | 
|  | 87 | +readinessProbe: | 
|  | 88 | +    exec: | 
|  | 89 | +      command: [ | 
|  | 90 | +        "/bin/sh", | 
|  | 91 | +        "-c", | 
|  | 92 | +        "set -e; set -o pipefail; curl -s -k -o - https://localhost:8080/v1/state | jq .'State' | grep -E 'NON_EXISTING|LOCKED|UNLOCKED|RPC_ACTIVE|SERVER_ACTIVE'", | 
|  | 93 | +      ] | 
|  | 94 | +    periodSeconds: 1 | 
|  | 95 | +``` | 
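
The state check used by the probe can also be written as a small shell function, which may be easier to reuse and test outside Kubernetes. This is a minimal sketch: the function name `is_leader_state` is hypothetical, and it expects the bare state string (for example as printed by `jq -r`), not the quoted JSON value.

```shell
# Hypothetical helper, not part of LND: succeeds when the given state means
# the node has become the leader and started, and fails otherwise
# (including for WAITING_TO_START and for an empty or unknown value).
is_leader_state() {
    case "$1" in
        NON_EXISTING|LOCKED|UNLOCKED|RPC_ACTIVE|SERVER_ACTIVE) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage, assuming the same REST endpoint and jq path as the probe above:
#   is_leader_state "$(curl -s -k https://localhost:8080/v1/state | jq -r .State)"
```

Unlike the `grep -E` in the probe, the `case` statement matches the whole string, so a value that merely contains one of the state names is rejected.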
|  | 96 | + | 
|  | 97 | +## What data is written to the replicated remote database?  | 
|  | 98 | + | 
|  | 99 | +Beginning with LND 0.14.0, when using a remote database (etcd or PostgreSQL), all | 
|  | 100 | +LND data will be written to the replicated database, including the wallet data | 
|  | 101 | +which contains the key material and node identity, the graph, the channel state, | 
|  | 102 | +the macaroon and the watchtower client databases. This means that when using | 
|  | 103 | +leader election there's no need to copy anything between instances of the LND | 
|  | 104 | +cluster. | 
|  | 105 | + | 
|  | 106 | +## Is leader election supported for Postgres? | 
|  | 107 | + | 
|  | 108 | +No, leader election is not supported by Postgres itself since it doesn't have a | 
|  | 109 | +mechanism to reliably **determine a leading node**. It is, however, possible to | 
|  | 110 | +use Postgres **as the LND database backend** while using an etcd cluster purely | 
|  | 111 | +for the leader election functionality.  | 