[Fix-17758][Master] Mark task as failed if TaskExecutionContext initialization fails #17821
SbloodyS merged 7 commits into apache:dev
…l when try to dispatch task
The second version of the code has been verified to work in our actual test environment. The specific logs are as follows:
Pull request overview
This PR fixes issue #17758 by implementing proper failure handling when TaskExecutionContext initialization fails during task dispatch. Previously, when initialization failed, tasks would remain in an incomplete state rather than being marked as failed.
Key Changes:
- Introduced TaskFatalLifecycleEvent to handle catastrophic task failures
- Added exception handling in TaskSubmittedStateAction to catch initialization failures
- Implemented automatic task failure marking when initialization fails, with support for retries and condition task workflows
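The flow in the key changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's actual code: `FatalFlowSketch`, `Task`, `initContext`, `submit`, `fireEvent`, and the `brokenConfig` flag are hypothetical stand-ins. Only the exception name `TaskExecutionContextCreateException` and the idea of publishing a fatal lifecycle event come from the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class FatalFlowSketch {

    // Named after the exception in the PR; this stand-in is NOT the real class.
    static class TaskExecutionContextCreateException extends RuntimeException {
        TaskExecutionContextCreateException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    enum TaskState { SUBMITTED, DISPATCHED, FAILURE }

    // Hypothetical task model, used only for this illustration.
    static class Task {
        final String name;
        final boolean brokenConfig; // assumption: forces context initialization to fail
        TaskState state = TaskState.SUBMITTED;

        Task(String name, boolean brokenConfig) {
            this.name = name;
            this.brokenConfig = brokenConfig;
        }
    }

    // Records the events this sketch "publishes", so the flow can be observed.
    static final List<String> publishedEvents = new ArrayList<>();

    // Stand-in for context initialization; fails for a broken task.
    static void initContext(Task task) {
        if (task.brokenConfig) {
            throw new IllegalStateException("cannot resolve task parameters");
        }
    }

    // Step 1: the submit action wraps initialization failures in a dedicated exception.
    static void submit(Task task) {
        try {
            initContext(task);
        } catch (Exception ex) {
            throw new TaskExecutionContextCreateException(
                    "Failed to initialize task execution context, taskName: " + task.name, ex);
        }
        task.state = TaskState.DISPATCHED;
    }

    // Step 2: the event worker turns that exception into a fatal event.
    static void fireEvent(Task task) {
        try {
            submit(task);
        } catch (TaskExecutionContextCreateException ex) {
            publishedEvents.add("FATAL:" + task.name);
            onFatalEvent(task);
        }
    }

    // Step 3: the fatal handler marks the task as failed instead of leaving it pending.
    static void onFatalEvent(Task task) {
        task.state = TaskState.FAILURE;
    }

    public static void main(String[] args) {
        Task ok = new Task("ok", false);
        Task bad = new Task("bad", true);
        fireEvent(ok);
        fireEvent(bad);
        System.out.println(ok.name + " -> " + ok.state);
        System.out.println(bad.name + " -> " + bad.state);
    }
}
```

In this sketch the healthy task ends in `DISPATCHED` and the broken one ends in `FAILURE`, which is the behavior the PR aims for: an initialization error produces a terminal state rather than a task stuck in an incomplete one.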
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| TaskExecutionContextCreateException.java | Changed exception hierarchy from MasterException to RuntimeException |
| ExceptionUtils.java | Added utility method to check for TaskExecutionContextCreateException |
| TaskSubmittedStateAction.java | Added try-catch block around task context initialization to throw TaskExecutionContextCreateException |
| TaskLifecycleEventType.java | Added new FATAL event type for catastrophic failures |
| TaskFatalLifecycleEvent.java | New event class representing fatal task failures with end time tracking |
| TaskFatalLifecycleEventHandler.java | New handler to process fatal lifecycle events |
| ITaskStateAction.java | Added onFatalEvent method interface for handling fatal events |
| AbstractTaskStateAction.java | Implemented onFatalEvent with retry logic, condition task handling, and failure chain marking |
| WorkflowEventBusFireWorker.java | Added logic to publish TaskFatalLifecycleEvent when TaskExecutionContextCreateException occurs |
| WorkflowStartTestCase.java | Added three test methods to verify fatal task handling scenarios |
| workflow_with_one_fake_task_fatal.yaml | Test configuration for single fatal task scenario |
| workflow_with_one_condition_task_with_one_fake_predecessor_fatal.yaml | Test configuration for condition task with fatal predecessor |
| workflow_with_one_forbidden_condition_task_with_one_fake_predecessor_fatal.yaml | Test configuration for forbidden condition task with fatal predecessor |
| workflow_with_one_forbidden_condition_task_with_one_fake_predecessor_failed.yaml | Updated name and description for clarity |
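The `AbstractTaskStateAction.java` row mentions retry logic, condition task handling, and failure chain marking. One way to picture that decision is the sketch below. It is an assumption-laden illustration: the `FatalDecisionSketch` class, the `Action` enum, the `decide` method, its parameters, and the order of the checks are all hypothetical and are not taken from the merged code.

```java
public class FatalDecisionSketch {

    // Hypothetical outcomes of handling a fatal task event.
    enum Action { RETRY, CONTINUE_TO_CONDITION_TASK, MARK_CHAIN_FAILED }

    /**
     * Decide what to do with a task whose context initialization failed.
     * Assumption: retries are considered first, then condition-task successors,
     * and only then is the downstream chain marked as failed.
     */
    static Action decide(int retryTimes, int maxRetryTimes, boolean hasConditionSuccessor) {
        if (retryTimes < maxRetryTimes) {
            return Action.RETRY;
        }
        if (hasConditionSuccessor) {
            // A condition task may still need to run so it can branch on the failure.
            return Action.CONTINUE_TO_CONDITION_TASK;
        }
        return Action.MARK_CHAIN_FAILED;
    }

    public static void main(String[] args) {
        System.out.println(decide(0, 2, false));
        System.out.println(decide(2, 2, true));
        System.out.println(decide(2, 2, false));
    }
}
```

Keeping the decision in a pure function like this makes each branch easy to test in isolation, which is roughly what the three new test scenarios listed in the table exercise at the workflow level.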
...c/main/java/org/apache/dolphinscheduler/server/master/engine/WorkflowEventBusFireWorker.java
...apache/dolphinscheduler/server/master/engine/task/statemachine/TaskSubmittedStateAction.java
Outdated
...ces/it/start/workflow_with_one_forbidden_condition_task_with_one_fake_predecessor_fatal.yaml
Outdated
...est/resources/it/start/workflow_with_one_condition_task_with_one_fake_predecessor_fatal.yaml
Outdated
...ache/dolphinscheduler/server/master/engine/task/lifecycle/event/TaskFatalLifecycleEvent.java
...c/main/java/org/apache/dolphinscheduler/server/master/engine/WorkflowEventBusFireWorker.java
Outdated
        publishWorkflowInstanceTopologyLogicalTransitionEvent(taskExecutionRunnable);
        return;
    }
    taskExecutionRunnable.getWorkflowExecutionGraph().markTaskExecutionRunnableChainFailure(taskExecutionRunnable);
This also needs to take the workflow failure strategy into account.
...apache/dolphinscheduler/server/master/engine/task/statemachine/TaskSubmittedStateAction.java
    } catch (DataAccessResourceFailureException ex) {
        log.error("Database/resource failure during task context initialization, taskName: {}",
                taskInstance.getName(), ex);
        throw ex;
    } catch (Exception ex) {
        log.error("Failed to initialize task execution context, taskName: {}", taskInstance.getName(), ex);
Suggested change:
-    } catch (DataAccessResourceFailureException ex) {
-        log.error("Database/resource failure during task context initialization, taskName: {}",
-                taskInstance.getName(), ex);
-        throw ex;
-    } catch (Exception ex) {
-        log.error("Failed to initialize task execution context, taskName: {}", taskInstance.getName(), ex);
+    catch (Exception ex) {
+        if(ExceptionUtils.isDatabaseConnectedFailedException(ex)) {
+            throw ex;
+        }
+        log.error("Failed to initialize task execution context, taskName: {}", taskInstance.getName(), ex);
    catch (Exception ex) {
        if(ExceptionUtils.isDatabaseConnectedFailedException(ex)) {
            throw ex;
        }
        log.error("Failed to initialize task execution context, taskName: {}", taskInstance.getName(), ex);
An excellent suggestion!
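The suggestion relies on `ExceptionUtils.isDatabaseConnectedFailedException`, whose implementation is not shown in this thread. As a rough sketch of how such a check could work, the example below walks an exception's cause chain looking for a given type. Everything here is an assumption for illustration: `CauseChainSketch`, `hasCause`, and `handle` are hypothetical names, and the JDK's `SQLTransientConnectionException` stands in for whatever database exception the real utility inspects.

```java
import java.sql.SQLTransientConnectionException;

public class CauseChainSketch {

    private static final int MAX_DEPTH = 32; // guard against cyclic cause chains

    // Returns true if the error itself, or any of its causes, is an instance of the given type.
    static boolean hasCause(Throwable error, Class<? extends Throwable> type) {
        Throwable current = error;
        int depth = 0;
        while (current != null && depth < MAX_DEPTH) {
            if (type.isInstance(current)) {
                return true;
            }
            current = current.getCause();
            depth++;
        }
        return false;
    }

    // Hypothetical policy: database connectivity problems are rethrown (and presumably retried later),
    // while any other initialization failure is treated as fatal for the task.
    static String handle(Exception ex) {
        if (hasCause(ex, SQLTransientConnectionException.class)) {
            return "rethrow";
        }
        return "mark-fatal";
    }

    public static void main(String[] args) {
        Exception dbDown = new RuntimeException("wrapped",
                new SQLTransientConnectionException("connection refused"));
        Exception badParams = new IllegalStateException("cannot resolve task parameters");
        System.out.println(handle(dbDown));
        System.out.println(handle(badParams));
    }
}
```

Checking the whole cause chain matters because frameworks often wrap the original database exception, so a single `catch` clause on one concrete type can miss it. That is a plausible reason for preferring a utility check over a dedicated `catch (DataAccessResourceFailureException ex)` branch.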
@SbloodyS Whenever you have time, I’d be grateful for your review. Thanks so much!



Purpose of the pull request
close #17758
Brief change log
Publish TaskFatalLifecycleEvent if initializeTaskExecutionContext fails when trying to dispatch a task.
Verify this pull request
This pull request is already covered by existing tests.
First, IT test cases were added.
Second, we have already verified and tested this in our actual production environment.
Pull Request Notice
If your pull request contains an incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md