Commit e7ecb9e

chore: update README.md (#131)
1 parent a468395 commit e7ecb9e

File tree: 1 file changed, +138 −81 lines changed


README.md

Lines changed: 138 additions & 81 deletions
@@ -168,47 +168,151 @@ Internally the `Scaler` API uses the `internal/scaler` gRPC client to manage the
 
 Here is a `SCALE UP` example of how to scale the cluster via the HTTP API.
 
-The below example will depict the scenario of scaling from 1 node to 3 nodes.
-
-1. Use the `/hashRangeMovements` to preview the number of hash ranges that will be moved when scaling
-   * see [Previewing a Scaling Operation](#previewing-a-scaling-operation) for more details
-2. Merge a `rudder-devops` PR what will:
-   * Increase the CPU and memory of the nodes to give more power for the creation of the snapshots
-     * Consider increasing IOPS and disk throughput as well if necessary
-   * Increase the number of replicas (e.g. from 1 to 3) i.e. `replicaCount: 3`
-   * Set the `degradedNodes`
-     * i.e. `false,true,true` which means that the 1st node will continue to receive traffic as usual whilst the 2
-       new nodes will not receive traffic but will be available to receive new snapshots
-3. If necessary call `/hashRangeMovements` again with `upload=true,full_sync=true` to start uploading the snapshots to
-   the cloud storage
-   * Pre-uploads and pre-download can be useful for several reasons:
-     * To measure how long snapshots creation and loading might take without actually having to scale the cluster
-     * To do a full sync before scaling (full syncs delete old data that might be expired from S3 so that then nodes
-       won't have to download expired data, making the scaling process faster later)
-     * You can still create snapshots during `/autoScale` but then they won't have to be full sync snapshots, meaning
-       they can just be small files that contain only the most recent data (see `since` in
-       [node/node.go](./node/node.go))
-4. Call `/autoScale`
-   * You can do this if you haven't already moved the data via `/hashRangeMovements`
-   * You can call it with `skip_create_snapshots=true` if you already created the snapshots in the previous operation
-5. Merge another `rudder-devops` PR to set `degradedNodes` to either empty or `false,false,false`
-   * If necessary you should reduce the CPU and memory if you ended up increasing them in point 2
+The below example depicts the scenario of scaling from 1 node to 3 nodes.
+
+**Step 1: Enable Scaler and Add New Nodes in Degraded Mode**
+
+Merge a `rudder-devops` PR that will:
+* Enable the scaler component if not already enabled
+* Increase the CPU and memory of the nodes to give more power for the creation of the snapshots
+  * Consider increasing IOPS and disk throughput as well if necessary
+* Increase the number of replicas (e.g. from 1 to 3, i.e. `replicaCount: 3`)
+* Set the `degradedNodes` configuration
+  * e.g. `false,true,true` which means that the 1st node will continue to receive traffic as usual whilst the 2
+    new nodes will not receive traffic but will be available to receive new snapshots
+
+**Step 2: Move Hash Ranges**
+
+Call `/hashRangeMovements` to move the hash ranges. You can use one of two methods:
+
+**Step 2.1: Using Node-to-Node Streaming (Recommended)**
+
+This method transfers data directly between nodes without using cloud storage as an intermediary, which is faster and more efficient.
+
+```bash
+curl --location 'localhost:8080/hashRangeMovements' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "old_cluster_size": 1,
+    "new_cluster_size": 3,
+    "total_hash_ranges": 271,
+    "streaming": true
+  }'
+```
+
+**Step 2.2: Using Cloud Storage (Upload and Download)**
+
+This method uses cloud storage as an intermediary. It's useful when nodes cannot communicate directly or when you want to create backups during the scaling process.
+
+First, preview the operation (optional but recommended):
+```bash
+curl --location 'localhost:8080/hashRangeMovements' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "old_cluster_size": 1,
+    "new_cluster_size": 3,
+    "total_hash_ranges": 271
+  }'
+```
+
+Then, upload the snapshots to cloud storage:
+```bash
+curl --location 'localhost:8080/hashRangeMovements' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "old_cluster_size": 1,
+    "new_cluster_size": 3,
+    "total_hash_ranges": 271,
+    "upload": true,
+    "full_sync": true
+  }'
+```
+
+Finally, download and load the snapshots:
+```bash
+curl --location 'localhost:8080/hashRangeMovements' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "old_cluster_size": 1,
+    "new_cluster_size": 3,
+    "total_hash_ranges": 271,
+    "download": true
+  }'
+```
+
+**Notes:**
+* Pre-uploads and pre-downloads can be useful for several reasons:
+  * To measure how long snapshot creation and loading might take without actually having to scale the cluster
+  * To do a full sync before scaling (full syncs delete old data that might be expired from S3 so that nodes
+    won't have to download expired data, making the scaling process faster later)
+* Full sync snapshots contain all the data, while incremental snapshots only contain data since the last snapshot
+  (see `since` in [node/node.go](./node/node.go))
+
+**Step 3: Remove Degraded Mode**
+
+Merge another `rudder-devops` PR to set `degradedNodes` to either empty or `false,false,false`
+* If necessary you should reduce the CPU and memory if you ended up increasing them in Step 1
+
+---
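As an aside, the effect of a scale-up on hash-range ownership can be sketched with a toy model. This is illustrative only: it assumes ranges are assigned to nodes by `range_id % cluster_size`, which may not match the scaler's actual assignment scheme; only `total_hash_ranges = 271` comes from the payloads above.

```python
# Toy model: count how many of the 271 hash ranges change owner when scaling
# from 1 node to 3 nodes. ASSUMPTION: ownership is `range_id % cluster_size`;
# the real mapping lives in the scaler, this is only for intuition.
def moved_hash_ranges(old_size: int, new_size: int, total_hash_ranges: int = 271) -> list[int]:
    """Hash ranges whose owning node differs between the two cluster sizes."""
    return [r for r in range(total_hash_ranges) if r % old_size != r % new_size]

print(len(moved_hash_ranges(1, 3)))  # 180 ranges would leave node 0 under this toy model
```

Under this assumption roughly two thirds of the ranges move, which is why previewing the movement count before scaling is worthwhile.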
 
 Here is a `SCALE DOWN` example of how to scale the cluster via the HTTP API.
 
-The below example will depict the scenario of scaling from 3 to 1 node.
+The below example depicts the scenario of scaling from 3 nodes to 1 node.
+
+**Step 1: Enable Scaler**
+
+Merge a `rudder-devops` PR to enable the scaler component if not already enabled.
+
+**Step 2: Move Hash Ranges**
+
+Call `/hashRangeMovements` to move the data from the nodes being removed to the remaining node(s).
+
+**Using Streaming Mode (Recommended):**
+
+```bash
+curl --location 'localhost:8080/hashRangeMovements' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "old_cluster_size": 3,
+    "new_cluster_size": 1,
+    "total_hash_ranges": 271,
+    "streaming": true
+  }'
+```
+
+**Alternatively, using Cloud Storage:**
 
-1. Use the `/hashRangeMovements` to preview the number of hash ranges that will be moved when scaling
-   * see [Previewing a Scaling Operation](#previewing-a-scaling-operation) for more details
-2. Call `/hashRangeMovements` again with `upload=true,full_sync=true` to move the data
-3. Merge a `rudder-devops` PR to decrease the `replicaCount`
+```bash
+curl --location 'localhost:8080/hashRangeMovements' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "old_cluster_size": 3,
+    "new_cluster_size": 1,
+    "total_hash_ranges": 271,
+    "upload": true,
+    "download": true,
+    "full_sync": true
+  }'
+```
+
+**Step 3: Mark Nodes as Degraded**
+
+Merge a `rudder-devops` PR to mark the nodes that will be removed as degraded by setting the `degradedNodes` configuration:
+* e.g. `false,true,true` which means that the 1st node will continue to receive traffic whilst nodes 2 and 3
+  are marked as degraded and will not receive traffic
+
+**Why this is important:** Marking nodes as degraded ensures the client is properly informed about the cluster size change. When a client hits a degraded node (e.g., node 1 or node 2, zero-indexed), that node will tell the client to use the new cluster size and talk to node 0 instead. If you skip this step and just remove the nodes, the client might try to hit node 1, but since node 1 won't be there anymore, it can't redirect the client to node 0, potentially causing the client to get stuck.
+
+**Step 4: Reduce Replica Count**
+
+Merge a separate `rudder-devops` PR to decrease the `replicaCount` and remove the `degradedNodes` configuration
+* This will remove the degraded nodes from the cluster
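The redirect behaviour described in Step 3 can be sketched with a toy client. This is illustrative only: the routing rule (`hash % cluster_size`), the `Node` shape, and the single-redirect flow are assumptions for intuition, not the actual client or node code.

```python
from dataclasses import dataclass

@dataclass
class Node:
    degraded: bool
    new_cluster_size: int  # desired cluster size this node advertises

def route(key_hash: int, nodes: list[Node]) -> int:
    """Pick a node index for a key; follow a degraded node's redirect once."""
    target = key_hash % len(nodes)  # client still assumes the old cluster size
    if nodes[target].degraded:
        # A degraded node serves no traffic but tells the client the new size.
        return key_hash % nodes[target].new_cluster_size
    return target

# Scale-down 3 -> 1: nodes 1 and 2 are degraded and advertise a size of 1.
cluster = [Node(False, 1), Node(True, 1), Node(True, 1)]
print(route(5, cluster))  # lands on degraded node 2, gets redirected to node 0
```

If the degraded nodes were simply deleted instead, the client's request to node 1 or 2 would fail outright and there would be nothing left to issue the redirect.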
 
 ### Previewing a Scaling Operation
 
 Why preview a scaling operation?
 * It helps you understand the impact of the scaling operation on the cluster
-* It will help decrease the length of the `/autoScale` operation in case you called `/hashRangeMovements`
-  with `upload=true`
+* It will help you plan the scaling operation by showing you how many hash ranges need to be moved
 * It will help you consider the impact of snapshot creation before actually scaling the cluster
 
 To preview a scaling operation you can call `/hashRangeMovements` like in the example below:
@@ -386,48 +490,6 @@ curl --location 'localhost:8080/backup' \
 
 Where `total` is the number of hash ranges that were processed (either uploaded or downloaded).
 
-### AutoScale
-
-The `AutoScale` scaler procedure can be used by sending an HTTP POST request to `/autoScale` with a JSON payload that
-follows this schema:
-
-```json
-{
-  "old_nodes_addresses": [
-    "keydb-0.keydb-headless.loveholidays.svc.cluster.local:50051",
-    "keydb-1.keydb-headless.loveholidays.svc.cluster.local:50051"
-  ],
-  "new_nodes_addresses": [
-    "keydb-0.keydb-headless.loveholidays.svc.cluster.local:50051",
-    "keydb-1.keydb-headless.loveholidays.svc.cluster.local:50051",
-    "keydb-2.keydb-headless.loveholidays.svc.cluster.local:50051",
-    "keydb-3.keydb-headless.loveholidays.svc.cluster.local:50051"
-  ],
-  "full_sync": false,
-  "skip_create_snapshots": false,
-  "create_snapshots_max_concurrency": 10,
-  "load_snapshots_max_concurrency": 3,
-  "disable_create_snapshots_sequentially": false,
-  "streaming": false
-}
-```
-
-Usage:
-* if `old_nodes_addresses` length < `new_nodes_addresses` length → triggers **SCALE UP**
-* if `old_nodes_addresses` length > `new_nodes_addresses` length → triggers **SCALE DOWN**
-* if `old_nodes_addresses` length == `new_nodes_addresses` length → triggers **AUTO-HEALING**
-  * it updates the scaler internal addresses of all the nodes and tells the nodes what is the desired cluster size
-    without creating nor loading any snapshot
-* if `full_sync` == `true` → triggers a full sync, deleting all the old files on S3 for the selected hash ranges
-  * useful to avoid having nodes download data from S3 that might be too old (thus containing expired data)
-* if `skip_create_snapshots` == `true` → it does not ask nodes to create snapshots
-  * useful if you did a pre-upload via `/hashRangeMovements`
-* if `create_snapshots_max_concurrency` > 0 → it limits how many snapshots can be created concurrently (default: 10)
-* if `load_snapshots_max_concurrency` > 0 → it limits how many snapshots can be loaded concurrently from S3 (default: 10)
-* if `disable_create_snapshots_sequentially` == `true` → snapshots are created in parallel instead of sequentially
-  * By default, snapshots are created sequentially to reduce resource pressure on nodes
-* if the operation does not succeed after all the retries, then a rollback to the "last operation" is triggered
-  * to see what was the "last recorded operation" you can call `/lastOperation`
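The `/autoScale` mode-selection bullets above reduce to a comparison of address-list lengths. A minimal sketch of that rule, for intuition only (this is not the scaler's actual code, just a restatement of the documented behaviour):

```python
def autoscale_mode(old_nodes_addresses: list[str], new_nodes_addresses: list[str]) -> str:
    """Which /autoScale mode a payload triggers, per the documented rules above."""
    if len(old_nodes_addresses) < len(new_nodes_addresses):
        return "SCALE UP"
    if len(old_nodes_addresses) > len(new_nodes_addresses):
        return "SCALE DOWN"
    # Equal lengths: update addresses and desired cluster size, no snapshots.
    return "AUTO-HEALING"

print(autoscale_mode(["keydb-0:50051", "keydb-1:50051"],
                     ["keydb-0:50051", "keydb-1:50051", "keydb-2:50051"]))  # SCALE UP
```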
 
 ### Degraded Mode
 
@@ -449,8 +511,7 @@ scaling operations. This helps prevent data inconsistencies when nodes restart d
 # New nodes (node 2, 3) - start in degraded mode
 export KEYDB_DEGRADED_NODES="false,false,true,true"
 ```
-2. **Run `/autoScale`**: The scaler moves the data between the old and new nodes and makes sure all nodes have all the
-   nodes' addresses updated.
+2. **Move data**: Use `/hashRangeMovements` to move the data between the old and new nodes.
 3. **Second devops PR**: Update persistent config to remove degraded mode permanently
 
 This approach ensures that if any old node restarts during scaling, it won't use incorrect cluster size configurations.
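The `KEYDB_DEGRADED_NODES` value above is a comma-separated list of per-node booleans. A hedged sketch of how such a flag could be parsed (the env var name and `false,false,true,true` format come from the snippet above; the parsing code itself is illustrative, not taken from the repo):

```python
def parse_degraded_nodes(value: str) -> list[bool]:
    """Parse a flag list like "false,false,true,true" into per-node booleans."""
    if not value.strip():
        return []  # empty value: no node is degraded
    return [token.strip().lower() == "true" for token in value.split(",")]

flags = parse_degraded_nodes("false,false,true,true")
print([i for i, degraded in enumerate(flags) if degraded])  # → [2, 3]
```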
@@ -464,11 +525,9 @@ This approach ensures that if any old node restarts during scaling, it won't use
 ### Alternative Scaling Methods
 * You can simply merge a devops PR with the desired cluster size and restart the nodes
   * In this case data won't be moved between nodes so it will lead to data loss
-  * If you don't want to restart the nodes you can use `/autoScale` in auto-healing mode (i.e. with `old_cluster_size`
-    equal to `new_cluster_size`)
 * You can do everything manually by calling the single endpoints yourself, although it might be burdensome with a lot
   of hash ranges to move, so this would mean calling `/createSnapshots`, `/loadSnapshots`, and `/updateClusterData`
-  manually (which is what the `/autoScale` endpoint does under the hood)
+  manually
 
 ### Postman collection
 
@@ -483,10 +542,8 @@ This approach ensures that if any old node restarts during scaling, it won't use
 - `POST /loadSnapshots` - Downloads snapshots from cloud storage and loads them into BadgerDB
 - `POST /updateClusterData` - Update the scaler internal information with the addresses of all the nodes that comprise
   the cluster
-- `POST /autoScale` - Automatic scaling with retry and rollback logic
 - `POST /hashRangeMovements` - Get hash range movement information (supports snapshot creation and loading)
 - `POST /backup` - Create and/or load snapshots for all (or specific) nodes without scaling
-- `GET /lastOperation` - Get last scaling operation status
 
 ## Performance
 