Here is a `SCALE UP` example of how to scale the cluster via the HTTP API.

The example below depicts the scenario of scaling from 1 node to 3 nodes.

**Step 1: Enable Scaler and Add New Nodes in Degraded Mode**

Merge a `rudder-devops` PR that will:
* Enable the scaler component if not already enabled
* Increase the CPU and memory of the nodes to give them more power for the creation of the snapshots
  * Consider increasing IOPS and disk throughput as well if necessary
* Increase the number of replicas (e.g. from 1 to 3), i.e. `replicaCount: 3`
* Set the `degradedNodes` configuration
  * e.g. `false,true,true`, which means that the 1st node will continue to receive traffic as usual whilst the 2
    new nodes will not receive traffic but will be available to receive new snapshots

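As a sketch, the devops change for this step might look like the following Helm-style values fragment. The key names `replicaCount` and `degradedNodes` come from the text above; the surrounding structure is illustrative and must be matched to the actual `rudder-devops` chart:

```yaml
# Illustrative values fragment -- align key paths with the real chart
replicaCount: 3                   # scale from 1 to 3 nodes
degradedNodes: "false,true,true"  # node 0 keeps serving traffic; nodes 1-2 start degraded
```
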
**Step 2: Move Hash Ranges**

Call `/hashRangeMovements` to move the hash ranges. You can use one of two methods:

**Step 2.1: Using Node-to-Node Streaming (Recommended)**

This method transfers data directly between nodes without using cloud storage as an intermediary, which is faster and more efficient.

```bash
curl --location 'localhost:8080/hashRangeMovements' \
--header 'Content-Type: application/json' \
--data '{
    "old_cluster_size": 1,
    "new_cluster_size": 3,
    "total_hash_ranges": 271,
    "streaming": true
}'
```

**Step 2.2: Using Cloud Storage (Upload and Download)**

This method uses cloud storage as an intermediary. It's useful when nodes cannot communicate directly or when you want to create backups during the scaling process.

First, preview the operation (optional but recommended):
```bash
curl --location 'localhost:8080/hashRangeMovements' \
--header 'Content-Type: application/json' \
--data '{
    "old_cluster_size": 1,
    "new_cluster_size": 3,
    "total_hash_ranges": 271
}'
```

Then, upload the snapshots to cloud storage:
```bash
curl --location 'localhost:8080/hashRangeMovements' \
--header 'Content-Type: application/json' \
--data '{
    "old_cluster_size": 1,
    "new_cluster_size": 3,
    "total_hash_ranges": 271,
    "upload": true,
    "full_sync": true
}'
```

Finally, download and load the snapshots:
```bash
curl --location 'localhost:8080/hashRangeMovements' \
--header 'Content-Type: application/json' \
--data '{
    "old_cluster_size": 1,
    "new_cluster_size": 3,
    "total_hash_ranges": 271,
    "download": true
}'
```

**Notes:**
* Pre-uploads and pre-downloads can be useful for several reasons:
  * To measure how long snapshot creation and loading might take without actually having to scale the cluster
  * To do a full sync before scaling (full syncs delete old data that might have expired from S3, so that nodes
    won't have to download expired data, making the scaling process faster later)
* Full sync snapshots contain all the data, while incremental snapshots only contain data since the last snapshot
  (see `since` in [node/node.go](./node/node.go))

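The full-vs-incremental distinction can be sketched in Go as follows. `Entry`, its `Version` field, and `snapshotSince` are illustrative stand-ins, not the project's actual types; the real versioned logic lives in [node/node.go](./node/node.go):

```go
package main

import "fmt"

// Entry is a simplified stand-in for a stored record; the real node keeps
// versioned data in BadgerDB.
type Entry struct {
	Key     string
	Version uint64 // monotonically increasing write version
}

// snapshotSince returns the entries that belong in an incremental snapshot:
// only records written after the `since` watermark. A full sync passes
// since == 0 and therefore includes everything.
func snapshotSince(entries []Entry, since uint64) []Entry {
	var out []Entry
	for _, e := range entries {
		if e.Version > since {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	data := []Entry{{"a", 1}, {"b", 5}, {"c", 9}}
	fmt.Println(len(snapshotSince(data, 0))) // full sync: all 3 entries
	fmt.Println(len(snapshotSince(data, 5))) // incremental: only "c"
}
```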
**Step 3: Remove Degraded Mode**

Merge another `rudder-devops` PR to set `degradedNodes` to either empty or `false,false,false`
* If necessary, reduce the CPU and memory if you ended up increasing them in Step 1

---

Here is a `SCALE DOWN` example of how to scale the cluster via the HTTP API.

The example below depicts the scenario of scaling from 3 nodes to 1 node.

**Step 1: Enable Scaler**

Merge a `rudder-devops` PR to enable the scaler component if not already enabled.

**Step 2: Move Hash Ranges**

Call `/hashRangeMovements` to move the data from the nodes being removed to the remaining node(s).

**Using Streaming Mode (Recommended):**

```bash
curl --location 'localhost:8080/hashRangeMovements' \
--header 'Content-Type: application/json' \
--data '{
    "old_cluster_size": 3,
    "new_cluster_size": 1,
    "total_hash_ranges": 271,
    "streaming": true
}'
```

**Alternatively, using Cloud Storage:**

```bash
curl --location 'localhost:8080/hashRangeMovements' \
--header 'Content-Type: application/json' \
--data '{
    "old_cluster_size": 3,
    "new_cluster_size": 1,
    "total_hash_ranges": 271,
    "upload": true,
    "download": true,
    "full_sync": true
}'
```

**Step 3: Mark Nodes as Degraded**

Merge a `rudder-devops` PR to mark the nodes that will be removed as degraded by setting the `degradedNodes` configuration:
* e.g. `false,true,true`, which means that the 1st node will continue to receive traffic whilst nodes 2 and 3
  are marked as degraded and will not receive traffic

**Why this is important:** Marking nodes as degraded ensures clients are properly informed about the cluster size change. When a client hits a degraded node (e.g. node 1 or node 2), that node tells the client to use the new cluster size and to talk to node 0 instead. If you skip this step and simply remove the nodes, a client might still try to hit node 1; since node 1 is no longer there, nothing can redirect the client to node 0, and the client may get stuck.

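The redirect reasoning can be pictured with a small Go sketch. It assumes a simple hash-mod placement, which is purely illustrative; the real client follows the scaler's hash-range assignment:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// nodeFor picks the owning node for a key given a cluster size, assuming a
// simple hash-mod placement (illustrative only).
func nodeFor(key string, clusterSize int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % clusterSize
}

func main() {
	// A client that still believes the cluster has 3 nodes may hit node 1 or 2.
	// Those nodes are degraded, not gone, so they can answer with the new
	// cluster size and the client retries against the right node.
	key := "user:42"
	fmt.Println("old target:", nodeFor(key, 3))
	// After the degraded node reports the new cluster size of 1:
	fmt.Println("new target:", nodeFor(key, 1)) // always node 0
}
```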
**Step 4: Reduce Replica Count**

Merge a separate `rudder-devops` PR to decrease the `replicaCount` and remove the `degradedNodes` configuration
* This will remove the degraded nodes from the cluster

### Previewing a Scaling Operation

Why preview a scaling operation?
* It helps you understand the impact of the scaling operation on the cluster
* It helps you plan the scaling operation by showing you how many hash ranges need to be moved
* It helps you consider the impact of snapshot creation before actually scaling the cluster

To preview a scaling operation you can call `/hashRangeMovements` like in the example below:
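Independently of the API, the expected number of movements can be estimated offline. The Go sketch below assumes a simple modulo-based assignment of hash ranges to nodes; this is an illustrative model, and `/hashRangeMovements` remains the authoritative source:

```go
package main

import "fmt"

// movedRanges counts hash ranges whose owner changes when the cluster is
// resized, under an assumed modulo placement (range r lives on node r % size).
func movedRanges(oldSize, newSize, totalRanges int) int {
	moved := 0
	for r := 0; r < totalRanges; r++ {
		if r%oldSize != r%newSize {
			moved++
		}
	}
	return moved
}

func main() {
	// Ranges that leave node 0 when scaling 1 -> 3 with 271 total ranges.
	fmt.Println(movedRanges(1, 3, 271)) // 180
}
```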

Where `total` is the number of hash ranges that were processed (either uploaded or downloaded).


### Degraded Mode

   # New nodes (node 2, 3) - start in degraded mode
   export KEYDB_DEGRADED_NODES="false,false,true,true"
   ```
2. **Move data**: Use `/hashRangeMovements` to move the data between the old and new nodes.
3. **Second devops PR**: Update persistent config to remove degraded mode permanently

This approach ensures that if any old node restarts during scaling, it won't use incorrect cluster size configurations.
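For illustration, the degraded-flags string could be parsed like this. `parseDegraded` is a hypothetical helper; the actual handling of `KEYDB_DEGRADED_NODES` lives in the node's configuration code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseDegraded turns a KEYDB_DEGRADED_NODES-style value such as
// "false,false,true,true" into a per-node degraded flag slice.
func parseDegraded(v string) []bool {
	if v == "" {
		return nil // no nodes degraded
	}
	parts := strings.Split(v, ",")
	out := make([]bool, len(parts))
	for i, p := range parts {
		out[i] = strings.TrimSpace(p) == "true"
	}
	return out
}

func main() {
	flags := parseDegraded("false,false,true,true")
	fmt.Println(flags) // [false false true true]
}
```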
### Alternative Scaling Methods
* You can simply merge a devops PR with the desired cluster size and restart the nodes
  * In this case data won't be moved between nodes, so it will lead to data loss
* You can do everything manually by calling the single endpoints yourself, although it might be burdensome with a lot
  of hash ranges to move, so this would mean calling `/createSnapshots`, `/loadSnapshots`, and `/updateClusterData`
  manually

### Postman collection

- `POST /loadSnapshots` - Downloads snapshots from cloud storage and loads them into BadgerDB
- `POST /updateClusterData` - Update the scaler internal information with the addresses of all the nodes that comprise
  the cluster
- `POST /hashRangeMovements` - Get hash range movement information (supports snapshot creation and loading)
- `POST /backup` - Create and/or load snapshots for all (or specific) nodes without scaling

## Performance
