fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/procedure/ListRebalanceProcessProcedure.java
Fluss provides procedures to rebalance buckets across the cluster based on workload.
Rebalancing is typically needed in the following scenarios: taking existing TabletServers offline,
adding new TabletServers to the cluster, and routine adjustments for load imbalance.

### add_server_tag

Add a server tag to TabletServers in the cluster. For example, tagging `tabletServer-0` with `PERMANENT_OFFLINE`
indicates that `tabletServer-0` is about to be permanently decommissioned, and that during the next rebalance
all buckets on this node must be migrated away.

**Syntax:**

```sql
CALL [catalog_name.]sys.add_server_tag(
  tabletServers => 'STRING',
  serverTag => 'STRING'
)
```

**Parameters:**

- `tabletServers` (required): The IDs of the TabletServers to tag. Can be a single server ID (e.g., `'0'`) or multiple IDs separated by commas (e.g., `'0,1,2'`).
- `serverTag` (required): The tag to add to the TabletServers. Valid values are:
  - `'PERMANENT_OFFLINE'`: Indicates the TabletServer is permanently offline and will be decommissioned. All buckets on this server will be migrated during the next rebalance.
  - `'TEMPORARY_OFFLINE'`: Indicates the TabletServer is temporarily offline (e.g., for upgrading). Buckets may be temporarily migrated but can return after the server comes back online.

**Returns:** An array with a single element `'success'` if the operation completes successfully.
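
As a minimal sketch of the multi-server form described above, the call below tags two TabletServers ahead of a planned upgrade. The server IDs `1` and `2` and the catalog name `fluss_catalog` are placeholders, and the named-argument form assumes a Flink version that accepts named arguments in `CALL` statements.

```sql title="Flink SQL"
USE fluss_catalog;

-- Tag TabletServers 1 and 2 as temporarily offline (IDs are illustrative)
CALL sys.add_server_tag(
  tabletServers => '1,2',
  serverTag => 'TEMPORARY_OFFLINE'
);
```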

**Example:**

```sql title="Flink SQL"
-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if different)
USE fluss_catalog;

-- Add PERMANENT_OFFLINE tag to a single TabletServer
```

Remove a server tag from TabletServers in the cluster. This operation is typically used when a previously tagged TabletServer is ready to return to normal service, or to cancel a planned offline operation.

**Syntax:**

```sql
CALL [catalog_name.]sys.remove_server_tag(
  tabletServers => 'STRING',
  serverTag => 'STRING'
)
```

**Parameters:**

- `tabletServers` (required): The IDs of the TabletServers to remove the tag from. Can be a single server ID (e.g., `'0'`) or multiple IDs separated by commas (e.g., `'0,1,2'`).
- `serverTag` (required): The tag to remove from the TabletServers. Valid values are:
  - `'PERMANENT_OFFLINE'`: Remove the permanent offline tag from the TabletServer.
  - `'TEMPORARY_OFFLINE'`: Remove the temporary offline tag from the TabletServer.

**Returns:** An array with a single element `'success'` if the operation completes successfully.
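
A minimal sketch of the "return to normal service" case described above: once maintenance is finished, the `TEMPORARY_OFFLINE` tag is cleared from the affected servers. The server IDs and catalog name are placeholders, and named arguments are assumed to be supported by the Flink version in use.

```sql title="Flink SQL"
USE fluss_catalog;

-- Clear the temporary offline tag from TabletServers 1 and 2 (IDs are illustrative)
CALL sys.remove_server_tag(
  tabletServers => '1,2',
  serverTag => 'TEMPORARY_OFFLINE'
);
```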

**Example:**

```sql title="Flink SQL"
-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if different)
USE fluss_catalog;

-- Remove PERMANENT_OFFLINE tag from a single TabletServer
```

Trigger a rebalance operation to redistribute buckets across TabletServers in the cluster. This procedure helps balance workload based on specified goals, such as distributing replicas or leaders evenly across the cluster.

**Syntax:**

```sql
CALL [catalog_name.]sys.rebalance(
  priorityGoals => 'STRING'
)
```

**Parameters:**

- `priorityGoals` (required): The rebalance goals to achieve, specified as goal types. Can be a single goal (e.g., `'REPLICA_DISTRIBUTION'`) or multiple goals separated by commas (e.g., `'REPLICA_DISTRIBUTION,LEADER_DISTRIBUTION'`). Valid goal types are:
  - `'REPLICA_DISTRIBUTION'`: Generates replica movement tasks to ensure the number of replicas on each TabletServer is near balanced.
  - `'LEADER_DISTRIBUTION'`: Generates leadership movement and leader replica movement tasks to ensure the number of leader replicas on each TabletServer is near balanced.

**Returns:** An array with a single element containing the rebalance ID (e.g., `'rebalance-12345'`), which can be used to track or cancel the rebalance operation.

**Important Notes:**

- Multiple goals can be specified in priority order. The system will attempt to achieve goals in the order specified.
- Rebalance operations run asynchronously in the background. Use the returned rebalance ID to monitor progress.
- The rebalance operation respects server tags set by `add_server_tag`. For example, servers marked with `PERMANENT_OFFLINE` will have their buckets migrated away.
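
The notes above can be combined into a decommissioning workflow. The following is a sketch under stated assumptions rather than a prescribed procedure: the server ID `0` and catalog name are placeholders, and named arguments are assumed to be accepted by the Flink version in use.

```sql title="Flink SQL"
USE fluss_catalog;

-- 1. Mark TabletServer 0 for permanent decommissioning (ID is illustrative)
CALL sys.add_server_tag(
  tabletServers => '0',
  serverTag => 'PERMANENT_OFFLINE'
);

-- 2. Trigger a rebalance; goals are listed in priority order
CALL sys.rebalance(
  priorityGoals => 'REPLICA_DISTRIBUTION,LEADER_DISTRIBUTION'
);

-- 3. The rebalance runs asynchronously; check the most recent one
CALL sys.list_rebalance();
```

Because the rebalance runs in the background, step 3 may need to be repeated until the reported status is `COMPLETED` before the tagged server is actually shut down.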

**Example:**

```sql title="Flink SQL"
-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if different)
USE fluss_catalog;

-- Trigger rebalance with replica distribution goal
CALL sys.rebalance('REPLICA_DISTRIBUTION');

-- Trigger rebalance with multiple goals in priority order
```

Query the progress and status of a rebalance operation. This procedure lets you monitor ongoing or completed rebalance operations and view detailed information about bucket movements.

**Syntax:**

```sql
-- List the most recent rebalance progress
CALL [catalog_name.]sys.list_rebalance()

-- List a specific rebalance progress by ID
CALL [catalog_name.]sys.list_rebalance(
  rebalanceId => 'STRING'
)
```

**Parameters:**

- `rebalanceId` (optional): The rebalance ID to query. If omitted, returns the progress of the most recent rebalance operation. The rebalance ID is returned when calling the `rebalance` procedure.

**Returns:** An array of strings containing:
- Rebalance ID: The unique identifier of the rebalance operation
- Rebalance total status: The overall status of the rebalance. Possible values are:
  - `NOT_STARTED`: The rebalance has been created but not yet started
  - `REBALANCING`: The rebalance is currently in progress
  - `COMPLETED`: The rebalance has successfully completed
  - `FAILED`: The rebalance has failed
  - `CANCELED`: The rebalance has been canceled
- Rebalance progress: The completion percentage (e.g., `75.5%`)
- Rebalance detail progress for bucket: Detailed progress information for each bucket being moved

If no rebalance is found, the procedure returns an empty line.

**Example:**

```sql title="Flink SQL"
-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if different)
USE fluss_catalog;

-- List the most recent rebalance progress
CALL sys.list_rebalance();

-- List a specific rebalance progress by ID
CALL sys.list_rebalance('rebalance-12345');
```

### cancel_rebalance

Cancel an ongoing rebalance operation. This procedure allows you to stop a rebalance that is in progress, which is useful when you need to halt bucket redistribution due to operational requirements or unexpected issues.

**Syntax:**

```sql
-- Cancel the most recent rebalance operation
CALL [catalog_name.]sys.cancel_rebalance()

-- Cancel a specific rebalance operation by ID
CALL [catalog_name.]sys.cancel_rebalance(
  rebalanceId => 'STRING'
)
```

**Parameters:**

- `rebalanceId` (optional): The rebalance ID to cancel. If omitted, cancels the most recent rebalance operation. The rebalance ID is returned when calling the `rebalance` procedure.

**Returns:** An array with a single element `'success'` if the operation completes successfully.

**Important Notes:**

- Only rebalance operations in `NOT_STARTED` or `REBALANCING` status can be canceled.
- Canceling a rebalance will stop bucket movements, but already completed bucket migrations will not be rolled back.
- After cancellation, the rebalance status will change to `CANCELED`.
- You can verify the cancellation by calling `list_rebalance` to check the status.
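
The cancel-then-verify sequence from the notes above might look like the sketch below. The ID `'rebalance-12345'` is a placeholder for the value returned by `rebalance`, the catalog name is assumed, and named arguments are assumed to be accepted by the Flink version in use.

```sql title="Flink SQL"
USE fluss_catalog;

-- Cancel a specific rebalance ('rebalance-12345' is a placeholder ID)
CALL sys.cancel_rebalance(
  rebalanceId => 'rebalance-12345'
);

-- Check that the reported status is now CANCELED
CALL sys.list_rebalance(
  rebalanceId => 'rebalance-12345'
);
```

Bucket migrations that finished before the cancellation remain in place, so the progress reported by `list_rebalance` may be above zero even for a canceled rebalance.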

**Example:**

```sql title="Flink SQL"
-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if different)
```