Commit 6131e7e

Support configurable maximum runtime for workflow/task instance
1 parent a48474a commit 6131e7e

13 files changed: +228 -48 lines changed

deploy/kubernetes/dolphinscheduler/README.md

Lines changed: 2 additions & 0 deletions
@@ -215,6 +215,8 @@ Please refer to the [Quick Start in Kubernetes](../../../docs/docs/en/guide/inst
 | master.env.MASTER_SERVER_LOAD_PROTECTION_MAX_JVM_CPU_USAGE_PERCENTAGE_THRESHOLDS | float | `0.7` | Master max jvm cpu usage, when the master's jvm cpu usage is smaller than this value, master server can execute workflow. |
 | master.env.MASTER_SERVER_LOAD_PROTECTION_MAX_SYSTEM_CPU_USAGE_PERCENTAGE_THRESHOLDS | float | `0.7` | Master max system cpu usage, when the master's system cpu usage is smaller than this value, master server can execute workflow. |
 | master.env.MASTER_SERVER_LOAD_PROTECTION_MAX_SYSTEM_MEMORY_USAGE_PERCENTAGE_THRESHOLDS | float | `0.7` | Master max system memory usage, when the master's system memory usage is smaller than this value, master server can execute workflow. |
+| master.env.MASTER_SERVER_LOAD_PROTECTION_MAX_TASK_INSTANCE_RUNTIME | string | `"0d"` | Maximum allowed running time for a task instance. If the running duration exceeds this value, the instance will be killed. The default value of 0d indicates no limit. |
+| master.env.MASTER_SERVER_LOAD_PROTECTION_MAX_WORKFLOW_INSTANCE_RUNTIME | string | `"0d"` | Maximum allowed running time for a workflow instance. If the running duration exceeds this value, the instance will be killed. The default value of 0d indicates no limit. |
 | master.env.MASTER_STATE_WHEEL_INTERVAL | string | `"5s"` | master state wheel interval, the unit is second |
 | master.env.MASTER_TASK_COMMIT_INTERVAL | string | `"1s"` | master commit task interval, the unit is second |
 | master.env.MASTER_TASK_COMMIT_RETRYTIMES | string | `"5"` | Master commit task retry times |

deploy/kubernetes/dolphinscheduler/values.yaml

Lines changed: 4 additions & 0 deletions
@@ -565,6 +565,10 @@ master:
 MASTER_SERVER_LOAD_PROTECTION_MAX_SYSTEM_MEMORY_USAGE_PERCENTAGE_THRESHOLDS: 0.7
 # -- Master max disk usage, when the master's disk usage is smaller than this value, master server can execute workflow.
 MASTER_SERVER_LOAD_PROTECTION_MAX_DISK_USAGE_PERCENTAGE_THRESHOLDS: 0.7
+# -- Maximum allowed running time for a workflow instance. If the running duration exceeds this value, the instance will be killed. The default value of 0d indicates no limit.
+MASTER_SERVER_LOAD_PROTECTION_MAX_WORKFLOW_INSTANCE_RUNTIME: 0d
+# -- Maximum allowed running time for a task instance. If the running duration exceeds this value, the instance will be killed. The default value of 0d indicates no limit.
+MASTER_SERVER_LOAD_PROTECTION_MAX_TASK_INSTANCE_RUNTIME: 0d
 # -- Master failover interval, the unit is minute
 MASTER_FAILOVER_INTERVAL: "10m"
 # -- Master kill application when handle failover
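
As a rough illustration of the two new chart values, the fragment below sketches a values override that turns both limits on. It assumes the keys sit under `master.env`, as the README parameter names suggest. The file name `custom-values.yaml` and the limits `1d` and `2h` are illustrative and do not come from the commit.

```yaml
# custom-values.yaml (hypothetical file name, not part of the commit)
master:
  env:
    # Illustrative limit: workflow instances running longer than one day are killed.
    MASTER_SERVER_LOAD_PROTECTION_MAX_WORKFLOW_INSTANCE_RUNTIME: "1d"
    # Illustrative limit: task instances running longer than two hours are killed.
    MASTER_SERVER_LOAD_PROTECTION_MAX_TASK_INSTANCE_RUNTIME: "2h"
```

Leaving either key at its default of `0d` keeps that limit disabled.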

docs/docs/en/architecture/configuration.md

Lines changed: 20 additions & 18 deletions
@@ -275,24 +275,26 @@ Location: `api-server/conf/application.yaml`

 Location: `master-server/conf/application.yaml`

-| Parameters | Default value | Description |
-|-----------------------------------------------------------------------------|------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
-| master.listen-port | 5678 | master listen port |
-| master.logic-task-config.task-executor-thread-count | 2 * CPU +1 | The thread size used to execute logic task |
-| master.worker-load-balancer-configuration-properties.type | DYNAMIC_WEIGHTED_ROUND_ROBIN | Master will use the worker's cpu/memory/threadPool usage to calculate the worker load, the lower load will have more chance to be dispatched task |
-| master.max-heartbeat-interval | 10s | master max heartbeat interval |
-| master.server-load-protection.enabled | true | If set true, will open master overload protection |
-| master.server-load-protection.max-system-cpu-usage-percentage-thresholds | 0.8 | Master max system cpu usage, when the master's system cpu usage is smaller than this value, master server can execute workflow. |
-| master.server-load-protection.max-jvm-cpu-usage-percentage-thresholds | 0.8 | Master max JVM cpu usage, when the master's jvm cpu usage is smaller than this value, master server can execute workflow. |
-| master.server-load-protection.max-system-memory-usage-percentage-thresholds | 0.8 | Master max system memory usage, when the master's system memory usage is smaller than this value, master server can execute workflow. |
-| master.server-load-protection.max-disk-usage-percentage-thresholds | 0.8 | Master max disk usage, when the master's disk usage is smaller than this value, master server can execute workflow. |
-| master.server-load-protection.max-concurrent-workflow-instances | 2147483647 | Master max concurrent workflow instances, when the master's workflow instance count reaches or exceeds this value, master server will be marked as busy. |
-| master.worker-group-refresh-interval | 10s | The interval to refresh worker group from db to memory |
-| master.command-fetch-strategy.type | ID_SLOT_BASED | The command fetch strategy, only support `ID_SLOT_BASED` |
-| master.command-fetch-strategy.config.id-step | 1 | The id auto incremental step of t_ds_command in db |
-| master.command-fetch-strategy.config.fetch-size | 10 | The number of commands fetched by master |
-| master.task-dispatch-policy.dispatch-timeout-enabled | false | Indicates whether the dispatch timeout checking mechanism is enabled |
-| master.task-dispatch-policy.max-task-dispatch-duration | 1h | The maximum allowed duration a task may wait in the dispatch queue before being assigned to a worker |
+| Parameters | Default value | Description |
+|-----------------------------------------------------------------------------|------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| master.listen-port | 5678 | master listen port |
+| master.logic-task-config.task-executor-thread-count | 2 * CPU +1 | The thread size used to execute logic task |
+| master.worker-load-balancer-configuration-properties.type | DYNAMIC_WEIGHTED_ROUND_ROBIN | Master will use the worker's cpu/memory/threadPool usage to calculate the worker load, the lower load will have more chance to be dispatched task |
+| master.max-heartbeat-interval | 10s | master max heartbeat interval |
+| master.server-load-protection.enabled | true | If set true, will open master overload protection |
+| master.server-load-protection.max-system-cpu-usage-percentage-thresholds | 0.8 | Master max system cpu usage, when the master's system cpu usage is smaller than this value, master server can execute workflow. |
+| master.server-load-protection.max-jvm-cpu-usage-percentage-thresholds | 0.8 | Master max JVM cpu usage, when the master's jvm cpu usage is smaller than this value, master server can execute workflow. |
+| master.server-load-protection.max-system-memory-usage-percentage-thresholds | 0.8 | Master max system memory usage, when the master's system memory usage is smaller than this value, master server can execute workflow. |
+| master.server-load-protection.max-disk-usage-percentage-thresholds | 0.8 | Master max disk usage, when the master's disk usage is smaller than this value, master server can execute workflow. |
+| master.server-load-protection.max-concurrent-workflow-instances | 2147483647 | Master max concurrent workflow instances, when the master's workflow instance count reaches or exceeds this value, master server will be marked as busy. |
+| master.server-load-protection.max-workflow-instance-runtime | 0m | Maximum allowed running time for a workflow instance. If the running duration exceeds this value, the instance will be killed. The default value of 0m indicates no limit; the minimum value is 1m. |
+| master.server-load-protection.max-task-instance-runtime | 0m | Maximum allowed running time for a task instance. If the running duration exceeds this value, the instance will be killed. The default value of 0m indicates no limit; the minimum value is 1m. |
+| master.worker-group-refresh-interval | 10s | The interval to refresh worker group from db to memory |
+| master.command-fetch-strategy.type | ID_SLOT_BASED | The command fetch strategy, only support `ID_SLOT_BASED` |
+| master.command-fetch-strategy.config.id-step | 1 | The id auto incremental step of t_ds_command in db |
+| master.command-fetch-strategy.config.fetch-size | 10 | The number of commands fetched by master |
+| master.task-dispatch-policy.dispatch-timeout-enabled | false | Indicates whether the dispatch timeout checking mechanism is enabled |
+| master.task-dispatch-policy.max-task-dispatch-duration | 1h | The maximum allowed duration a task may wait in the dispatch queue before being assigned to a worker |

 ### Worker Server related configuration
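
For orientation, the two new properties documented in the table above might look like this in `master-server/conf/application.yaml`. This is a sketch only: the nesting is inferred from the dotted property names, and the limits `1d` and `2h` are illustrative values, not defaults. The diff does not say whether the limits still apply when `master.server-load-protection.enabled` is `false`, so that is left open here.

```yaml
# Sketch of master-server/conf/application.yaml; nesting inferred from the dotted property names.
master:
  server-load-protection:
    enabled: true
    # Illustrative limit: workflow instances running longer than one day are killed.
    max-workflow-instance-runtime: 1d
    # Illustrative limit: task instances running longer than two hours are killed.
    max-task-instance-runtime: 2h
```

Per the table, a zero duration disables a limit, and the smallest accepted non-zero value is `1m`.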
