Commit 49bf899

[Doc-17728][Master] Update load-balance doc (#17735)
1 parent 8b8c5fe commit 49bf899

2 files changed: +126 -50 lines changed

Lines changed: 62 additions & 25 deletions
@@ -1,52 +1,89 @@
1-
# Load Balance
1+
# Load Balancing
22

3-
Load balancing refers to the reasonable allocation of server pressure through routing algorithms (usually in cluster environments) to achieve the maximum optimization of server performance.
3+
Load balancing uses routing algorithms (typically in cluster environments) to spread load across servers so that overall server performance is as good as possible.
44

55
## DolphinScheduler-Worker Load Balancing Algorithms
66

7-
DolphinScheduler-Master allocates tasks to workers, and by default provides three algorithms:
7+
DolphinScheduler-Master provides four load balancing algorithms for distributing tasks to workers:
88

9-
- Weighted random (random)
9+
- **Random** (RANDOM)
10+
- **Round Robin** (ROUND_ROBIN)
11+
- **Smooth Round Robin** (FIXED_WEIGHTED_ROUND_ROBIN)
12+
- **Dynamic Smooth Round Robin** (DYNAMIC_WEIGHTED_ROUND_ROBIN) - Default algorithm
1013

11-
- Smoothing polling (round-robin)
14+
## Load Balancing Configuration
1215

13-
- Linear load (lower weight)
16+
Configure the load balancing algorithm in the configuration file:
1417

15-
The default configuration is the linear load.
18+
Location: `master-server/conf/application.yaml`
1619

17-
Because routing is done on the client side, that is, in the master service, you can change master.host.selector in master.properties to configure the algorithm.
20+
```yaml
21+
worker-load-balancer-configuration-properties:
22+
  # types: RANDOM, ROUND_ROBIN, FIXED_WEIGHTED_ROUND_ROBIN, DYNAMIC_WEIGHTED_ROUND_ROBIN
23+
  type: DYNAMIC_WEIGHTED_ROUND_ROBIN
24+
```
1825
19-
e.g. master.host.selector=random (case-insensitive)
26+
## Worker Weight Configuration
2027
21-
## Worker Load Balancing Configuration
28+
### Smooth Round Robin Configuration (FIXED_WEIGHTED_ROUND_ROBIN)
2229
23-
The configuration file is worker.properties
30+
For the `FIXED_WEIGHTED_ROUND_ROBIN` algorithm, you can modify the fixed weight in each worker's configuration file:
2431

25-
### Weight
32+
Location: `worker-server/conf/application.yaml`
2633

27-
All the load algorithms above are weighted based on weights, which affect the routing outcome. You can set different weights for different machines by modifying the `worker.weight` value.
34+
```yaml
35+
worker:
36+
  host-weight: 100  # default value is 100
37+
```
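As a rough illustration of what a fixed weight means in practice, the sketch below computes each worker's expected long-run share of tasks as its `host-weight` divided by the sum of all weights. The worker names and the helper `expected_share` are hypothetical, and proportional sharing is an assumption drawn from the weighted round robin description in this document, not a statement about DolphinScheduler's code.

```python
def expected_share(host_weights):
    """Return each worker's expected fraction of tasks, proportional to its weight."""
    total = sum(host_weights.values())
    if total <= 0:
        raise ValueError("at least one worker must have a positive weight")
    return {name: weight / total for name, weight in host_weights.items()}


# Hypothetical cluster: one worker keeps the default 100, the other is set to 200.
shares = expected_share({"worker-1": 100, "worker-2": 200})
print(shares)
```

Under this assumption, doubling one worker's `host-weight` relative to another doubles its share of tasks.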
2838

29-
### Preheating
39+
### Dynamic Smooth Round Robin Configuration (DYNAMIC_WEIGHTED_ROUND_ROBIN)
3040

31-
To account for JIT optimization, a worker runs at reduced capacity for a period after startup so that it can gradually reach its optimal state, a process we call preheating. If you are interested, you can read some articles about JIT.
41+
When using the `DYNAMIC_WEIGHTED_ROUND_ROBIN` algorithm, you can configure the weights for various metrics:
3242

33-
So after startup, the worker gradually reaches its maximum weight over time (ten minutes by default; there is no configuration option for the preheating duration, and submitting a PR is recommended if you need to change it).
43+
```yaml
44+
master:
45+
  worker-load-balancer-configuration-properties:
46+
    type: DYNAMIC_WEIGHTED_ROUND_ROBIN
47+
    # Dynamic weight configuration, only used for DYNAMIC_WEIGHTED_ROUND_ROBIN algorithm
48+
    # The sum of memory-usage, cpu-usage, task-thread-pool-usage weights must be 100
49+
    dynamic-weight-config-properties:
50+
      memory-usage-weight: 30 # Memory usage weight
51+
      cpu-usage-weight: 30 # CPU usage weight
52+
      task-thread-pool-usage-weight: 40 # Task thread pool usage weight
53+
```
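The constraint that the three weights sum to 100 can be checked before a configuration is rolled out. The following is a minimal sketch, assuming the settings have already been loaded into a plain dictionary with the same keys as above. `validate_dynamic_weights` is a hypothetical helper, not part of DolphinScheduler, and the non-negativity check is an extra assumption of this sketch.

```python
REQUIRED_KEYS = (
    "memory-usage-weight",
    "cpu-usage-weight",
    "task-thread-pool-usage-weight",
)


def validate_dynamic_weights(config):
    """Raise ValueError unless all three weights are present, non-negative, and sum to 100."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing weights: {missing}")
    weights = [config[key] for key in REQUIRED_KEYS]
    if any(weight < 0 for weight in weights):
        # Assumption of this sketch: negative weights are never meaningful.
        raise ValueError("weights must be non-negative")
    if sum(weights) != 100:
        raise ValueError(f"weights sum to {sum(weights)}, expected 100")
    return True


# The default values shown in the configuration above.
defaults = {
    "memory-usage-weight": 30,
    "cpu-usage-weight": 30,
    "task-thread-pool-usage-weight": 40,
}
validate_dynamic_weights(defaults)
```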
3454

35-
## Load Balancing Algorithm in Details
55+
## Load Balancing Algorithm Details
3656

37-
### Random (Weighted)
57+
### Random (RANDOM)
3858

39-
This algorithm is relatively simple, select a worker by random (the weight affects its weighting).
59+
Randomly selects one available worker node to execute tasks.
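Random selection can be sketched in a few lines. This is an illustrative version under stated assumptions, not the project's implementation: `pick_random` and the worker names are hypothetical, and the worker list is assumed to contain only currently available nodes.

```python
import random


def pick_random(workers, rng=random):
    """Pick one worker uniformly at random from the available workers."""
    if not workers:
        raise ValueError("no available workers")
    return rng.choice(workers)


# Hypothetical worker addresses; a seeded generator keeps the demo repeatable.
available = ["worker-1:1234", "worker-2:1234", "worker-3:1234"]
chosen = pick_random(available, random.Random(42))
print(chosen)
```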
4060

41-
### Smoothed Polling (Weighted)
61+
### Round Robin (ROUND_ROBIN)
4262

43-
The weighted polling algorithm has an obvious drawback: under certain weight settings, it generates an imbalanced sequence of instances, and this unsmooth load may cause some instances to experience transient high load, putting the system at risk of crashing. To address this scheduling flaw, we provide a smooth weighted polling algorithm.
63+
Selects worker nodes in a fixed order to ensure each worker receives tasks evenly.
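A minimal sketch of plain round robin, assuming the list of workers keeps a stable order between calls. The class `RoundRobinSelector` is hypothetical, and the sketch makes no attempt to handle concurrent callers.

```python
import itertools


class RoundRobinSelector:
    """Cycle through the workers in list order, one per call."""

    def __init__(self):
        self._counter = itertools.count()

    def select(self, workers):
        if not workers:
            raise ValueError("no available workers")
        # The modulo keeps the index valid even if the worker count changes.
        return workers[next(self._counter) % len(workers)]


selector = RoundRobinSelector()
picks = [selector.select(["A", "B", "C"]) for _ in range(6)]
print(picks)
```

Each worker receives the same number of tasks over any whole number of cycles, regardless of how loaded it is.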
4464

45-
Each worker has two weight parameters: weight (which remains constant after warm-up is complete) and current_weight (which changes dynamically). On every route, all workers are iterated over and each worker's current_weight is increased by its weight, while the weights of all workers are summed as total_weight. The worker with the largest current_weight is then selected for the task, and its current_weight is reduced by total_weight.
65+
### Smooth Round Robin (FIXED_WEIGHTED_ROUND_ROBIN)
4666

47-
### Linear Weighting (Default Algorithm)
67+
Each worker has two weights: `weight`, which stays constant once warm-up completes, and `current_weight`, which changes dynamically. On each routing decision, every worker's `current_weight` is increased by its `weight`, and the weights of all workers are summed into `total_weight`. The worker with the highest `current_weight` is selected to execute the task, and that worker's `current_weight` is then decreased by `total_weight`.
4868

49-
This algorithm reports its own load information to the registry at regular intervals. We mainly judge by CPU usage, memory usage and worker slot usage.
69+
- Example: 3 workers (A, B, C) with weights of 1, 2, and 3 respectively
70+
- Worker selection order will be: C B C A B C C B C A B C C B C A B C C B C A B C C B C A B C ... (In this 30-round scheduling example, the number of tasks allocated to each worker is: C:15, B:10, A:5, exactly matching the weight ratio)
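The selection procedure described above can be sketched as follows. The function name is hypothetical, and one detail is an assumption: when two workers tie on `current_weight`, this sketch picks the one that comes later in iteration order, which is the choice that reproduces the sequence in the example.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, rounds):
    """Return the workers picked over `rounds` routing decisions."""
    current = {name: 0 for name in weights}
    total_weight = sum(weights.values())
    picks = []
    for _ in range(rounds):
        best = None
        for name, weight in weights.items():
            current[name] += weight
            # Assumption: ties go to the later worker (>= rather than >).
            if best is None or current[name] >= current[best]:
                best = name
        current[best] -= total_weight
        picks.append(best)
    return picks


order = smooth_weighted_round_robin({"A": 1, "B": 2, "C": 3}, 30)
print(" ".join(order[:6]))   # C B C A B C
print(Counter(order))        # C: 15, B: 10, A: 5
```

Breaking ties in favor of the earlier worker instead would give a different but equally balanced order (C B A C B C), with the same per-worker totals.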
5071

51-
If any of these is lower than the configured value, the worker will not participate in load balancing (no traffic will be allocated to it).
72+
### Dynamic Smooth Round Robin (DYNAMIC_WEIGHTED_ROUND_ROBIN) - Default Algorithm
5273

74+
With this algorithm, each worker reports its own load information to the registry at regular intervals. Workers are evaluated primarily on CPU usage, memory usage, and task thread pool usage, with the following weights:
75+
- **Memory Usage** (Default weight: 30%)
76+
- **CPU Usage** (Default weight: 30%)
77+
- **Task Thread Pool Usage** (Default weight: 40%)
78+
79+
**Weight Calculation Principle:**
80+
The dynamic weight of each worker is calculated using the following formula:
81+
82+
```
83+
Weight = 100 - (CPU Weight × CPU Usage + Memory Weight × Memory Usage + Thread Pool Weight × Thread Pool Usage) ÷ 3
84+
```
85+
86+
Therefore, when a worker's load is lower, its weight will be higher, and the system will prioritize selecting worker nodes with lower loads to execute tasks.
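To make the formula concrete, here is a small sketch that evaluates it with the default metric weights. It assumes each usage value is a fraction between 0 and 1, which the document does not state explicitly, and `dynamic_weight` is a hypothetical name rather than a DolphinScheduler function.

```python
def dynamic_weight(cpu_usage, memory_usage, thread_pool_usage,
                   cpu_weight=30, memory_weight=30, thread_pool_weight=40):
    """Evaluate the formula above; usages are assumed to be fractions in [0, 1]."""
    for usage in (cpu_usage, memory_usage, thread_pool_usage):
        if not 0.0 <= usage <= 1.0:
            raise ValueError("usage values must be between 0 and 1")
    weighted_load = (cpu_weight * cpu_usage
                     + memory_weight * memory_usage
                     + thread_pool_weight * thread_pool_usage)
    return 100 - weighted_load / 3


idle = dynamic_weight(0.0, 0.0, 0.0)
half = dynamic_weight(0.5, 0.5, 0.5)
busy = dynamic_weight(1.0, 1.0, 1.0)
print(idle, half, busy)
```

Under that assumption an idle worker gets a weight of 100 and a fully loaded worker about 66.7, so lower load always yields a higher weight.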
87+
88+
In the final worker node selection process, the workflow is consistent with smooth round robin, with the only difference being that in this algorithm, worker weights change dynamically.
89+
With this dynamic smooth round robin algorithm, DolphinScheduler steers tasks toward the workers with the lowest load, so the distribution adapts as worker load changes.
Lines changed: 64 additions & 25 deletions
@@ -1,52 +1,91 @@
1-
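### Load Balancing
1+
# Load Balancing
22

33
Load balancing uses routing algorithms (typically in cluster environments) to spread load sensibly across servers and get the best performance out of them.
44

5-
### DolphinScheduler-Worker Load Balancing Algorithms
5+
## DolphinScheduler-Worker Load Balancing Algorithms
66

7-
DolphinScheduler-Master allocates tasks to workers and provides three algorithms by default:
7+
DolphinScheduler-Master allocates tasks to workers and provides four load balancing algorithms:
88

9-
Weighted random (random)
9+
- **Random** (RANDOM)
10+
- **Round Robin** (ROUND_ROBIN)
11+
- **Smooth Round Robin** (FIXED_WEIGHTED_ROUND_ROBIN)
12+
- **Dynamic Smooth Round Robin** (DYNAMIC_WEIGHTED_ROUND_ROBIN) - Default algorithm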
1013

11-
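Smooth round robin (roundrobin)
14+
## Load Balancing Algorithm Configuration
1215

13-
Linear load (lowerweight)
16+
Configure the load balancing algorithm in the configuration file:
1417

15-
The default configuration is linear weighted load.
18+
Location: `master-server/conf/application.yaml`
1619

17-
Because routing is done on the client side, that is, in the master service, you can change master.host.selector in master.properties to configure the algorithm you want.
20+
```yaml
21+
worker-load-balancer-configuration-properties:
22+
  # Load balancing algorithm types: RANDOM, ROUND_ROBIN, FIXED_WEIGHTED_ROUND_ROBIN, DYNAMIC_WEIGHTED_ROUND_ROBIN
23+
  type: DYNAMIC_WEIGHTED_ROUND_ROBIN
24+
```
1825
19-
e.g. master.host.selector=random (case-insensitive)
26+
## Worker Weight Configuration
2027
21-
### Worker Load Balancing Configuration
28+
### Smooth Round Robin Weight Configuration (FIXED_WEIGHTED_ROUND_ROBIN)
2229
23-
The configuration file is worker.properties
30+
For the `FIXED_WEIGHTED_ROUND_ROBIN` algorithm, you can change the fixed weight in each worker's configuration file:
2431

25-
#### Weight
32+
Location: `worker-server/conf/application.yaml`
2633

27-
All of the load algorithms above distribute tasks according to weights, and the weights affect how traffic is split. You can give different machines different weights by changing the value of worker.weight.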
34+
```yaml
35+
worker:
36+
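  host-weight: 100  # default value is 100
37+
```
2838

29-
#### Preheating
39+
### Dynamic Smooth Round Robin Weight Configuration (DYNAMIC_WEIGHTED_ROUND_ROBIN)
3040

31-
To account for JIT optimization, we let a worker run at reduced capacity for a while after startup so that it gradually reaches its optimal state. We call this process preheating. If you are interested, you can read articles about JIT.
41+
When using the `DYNAMIC_WEIGHTED_ROUND_ROBIN` algorithm, you can configure the weight of each metric:
3242

33-
So after a worker starts, its weight gradually rises to the maximum over time (ten minutes by default; we do not provide a configuration option for this, and if you need one you can make the change and submit a PR).
43+
Location: `master-server/conf/application.yaml`
3444

35-
### Load Balancing Algorithms in Detail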
45+
```yaml
46+
master:
47+
  worker-load-balancer-configuration-properties:
48+
    type: DYNAMIC_WEIGHTED_ROUND_ROBIN
49+
    # Dynamic weight configuration, only used for the DYNAMIC_WEIGHTED_ROUND_ROBIN algorithm
50+
    # The weights of memory-usage, cpu-usage, and task-thread-pool-usage must sum to 100
51+
    dynamic-weight-config-properties:
52+
      memory-usage-weight: 30 # Memory usage weight
53+
      cpu-usage-weight: 30 # CPU usage weight
54+
      task-thread-pool-usage-weight: 40 # Task thread pool usage weight
55+
```
3656

37-
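#### Random (Weighted)
57+
## Load Balancing Algorithm Details
3858

39-
This algorithm is fairly simple: it picks one worker at random from the eligible workers (the weight affects each worker's share).
59+
### Random (RANDOM)
4060

41-
#### Smooth Round Robin (Weighted)
61+
Randomly selects one of the available worker nodes to execute the task.
4262

43-
The weighted round robin algorithm has an obvious flaw: under certain weight settings, weighted round robin scheduling produces an uneven sequence of instances. This unsmooth load can cause momentary high load on some instances and puts the system at risk of going down. To address this scheduling flaw, we provide a smooth weighted round robin algorithm.
63+
### Round Robin (ROUND_ROBIN)
4464

45-
Each worker has two weights: weight (which stays constant once preheating completes) and current_weight (which changes dynamically). On every routing decision, all workers are traversed and each worker's current_weight is increased by its weight, while the weights of all workers are summed as total_weight. The worker with the largest current_weight is chosen to execute the task, and at the same time that worker's current_weight is reduced by total_weight.
65+
Selects worker nodes one after another in a fixed order, so that tasks are distributed evenly across all workers.
4666

47-
#### Linear Weighting (Default Algorithm)
67+
### Smooth Round Robin (FIXED_WEIGHTED_ROUND_ROBIN)
4868

49-
This algorithm reports its own load information to the registry at regular intervals. We mainly judge by CPU usage, memory usage, and worker slot usage.
69+
Each worker has two weights: weight (which stays constant once preheating completes) and
70+
current_weight (which changes dynamically). On every routing decision, all workers are traversed and each worker's current_weight is increased by its weight, while the weights of all workers are summed as total_weight. The worker with the largest current_weight is chosen to execute the task, and at the same time that worker's current_weight is reduced by total_weight.
71+
- Example: 3 workers (A, B, C) with weights of 1, 2, and 3 respectively
72+
- The worker selection order will be: C B C A B C C B C A B C C B C A B C C B C A B C C B C A B C ... (over the 30 rounds in this example, the number of tasks assigned to each worker is C:15, B:10, A:5, exactly matching the weight ratio)
5073

51-
If any of these falls below the configured value, the worker does not take part in load balancing (that is, no traffic is allocated to it).
74+
### Dynamic Smooth Round Robin (DYNAMIC_WEIGHTED_ROUND_ROBIN) - Default Algorithm
5275

76+
With this algorithm, each worker reports its own load information to the registry at regular intervals. We mainly judge by CPU usage,
77+
memory usage, and worker thread pool usage, with the weights configured as follows:
78+
- **Memory Usage** (default weight 30%)
79+
- **CPU Usage** (default weight 30%)
80+
- **Task Thread Pool Usage** (default weight 40%)
81+
82+
The dynamic weight of each worker is calculated with the following formula:
83+
84+
```
85+
Weight = 100 - (CPU Weight × CPU Usage + Memory Weight × Memory Usage + Thread Pool Weight × Thread Pool Usage) ÷ 3
86+
```
87+
88+
So the lower a worker's load, the higher its weight, and the system prefers worker nodes with lower load when assigning tasks.
89+
90+
The final worker selection follows the same procedure as smooth round robin; the only difference is that under this algorithm the worker weights change dynamically.
91+
With this dynamic smooth round robin algorithm, DolphinScheduler can steer tasks toward the workers with the lowest load, so the distribution adapts as worker load changes.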
