[Improvement-17908][Flink] Make -sae parameter configurable in UI with default off#17909
[Improvement-17908][Flink] Make -sae parameter configurable in UI with default off#17909chris-fast wants to merge 1 commit intoapache:devfrom
Conversation
| // Note: -sae should NOT be used for APPLICATION mode, as it runs in detached mode on YARN | ||
| if (deployMode != FlinkDeployMode.APPLICATION) { | ||
| args.add(FlinkConstants.FLINK_SHUTDOWN_ON_ATTACHED_EXIT); // -sae | ||
| } |
There was a problem hiding this comment.
We should make it configurable in UI and set it as off by default.
|
Hi @SbloodyS , thank you for the review feedback! I understand the requirement to make the Current SituationThe current PR removes if (deployMode != FlinkDeployMode.APPLICATION) {
args.add(FlinkConstants.FLINK_SHUTDOWN_ON_ATTACHED_EXIT);
}Proposed SolutionI'''d like to propose adding a UI-configurable option for the Implementation Plan
Benefits
QuestionsBefore proceeding with implementation, I'''d like to confirm:
Looking forward to your feedback! Thanks! |
|
@chris-fast Excellent. Your description is very accurate and correct. Please modify this PR according to your description. |
|
@SbloodyS I've updated the PR according to the plan, PTAL. Thanks! |
dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-flink.ts
Outdated
Show resolved
Hide resolved
...r-task-flink/src/main/java/org/apache/dolphinscheduler/plugin/task/flink/FlinkArgsUtils.java
Outdated
Show resolved
Hide resolved
SbloodyS
left a comment
There was a problem hiding this comment.
You should run pnpm run lint to format the frontend code. @chris-fast
|
I've run I've included this fix in the PR to ensure CI passes cleanly. This is a legitimate bug fix that prevents incorrect values from being emitted when closing the dependencies modal. |
| * | ||
| * @see FlinkArgsUtils#buildRunCommandLine | ||
| */ | ||
| private Boolean shutdownOnAttachedExit; |
There was a problem hiding this comment.
Right now we don't support take-over flink job when failover, these change might cause the flink job duplicated run on YARN?
And, it's better to make the default value to TRUE, do not break compatibility, as this is essentially a Flink bug.
There was a problem hiding this comment.
I every agree with you on the compatibility point,keeping the default as TRUE is definitely the safest move.
And I want to clarify a few things regarding the underlying mechanism, though:
1.It's not really a "Flink bug",-sae parameter was designed to prevent resource leaks (zombie clusters), not to handle duplicate submissions.
2.Relying on -sae=true to prevent "double runs" is actually pretty unreliable. If a worker hits a hard crash (like an OOM or power outage), the CLI dies instantly and never gets a chance to send the shutdown signal to the cluster. So, the job keeps running, and a retry will still cause a duplicate.
3.The better way to handle idempotency is via YARN Application Tags (e.g., using the ProcessInstanceId) and checking if that tag exists before submitting.
I think that "idempotency check" deserves to be a future optimization feature on its own. It’s probably better to keep it out of this current PR so we don't block the merge.
Thanks a lot for the feedback—I actually learned a ton digging into this! I’d be more than happy to help contribute code for that future optimization feature, too.
There was a problem hiding this comment.
I think that "idempotency check" deserves to be a future optimization feature on its own. It’s probably better to keep it > out of this current PR so we don't block the merge.
+1
There was a problem hiding this comment.
I think that "idempotency check" deserves to be a future optimization feature on its own. It’s probably better to keep it > out of this current PR so we don't block the merge.
+1
Totally agree. Thanks! I'll push the new code right now
There was a problem hiding this comment.
@chris-fast What I’m trying to say is that we shouldn’t break compatibility here, otherwise will lead to more problems e.g failover. Regardless of how we deal with the failover issue in the future, using -sea is not a bad thing.
Adapting the upper layer just to work around a bug in a specific version of a lower-level component isn’t a sustainable approach. If different versions of the lower layer each have their own issues, does that mean the upper layer needs to keep adding special handling for all of them? That would make the system increasingly complex and fragile.
Also, I don’t quite understand the statement that “it’s not really a Flink bug.” From my perspective, the unexpected behavior originates from Flink’s side, so it’s hard to see why it wouldn’t be considered a bug there.
The most important thing is it hard to explain to users under what circumstances they should enable/disable shutdownOnAttachedExit. If I am the user, I will ask "If this parameter is turned off, will the process not exit? Under what circumstances would we want the process to stay alive instead of exiting".
There was a problem hiding this comment.
I'm with you on that. We can't let the system become too unwieldy to manage,and using -sea is not a bad thing in this scenario.
|
e8c3eeb to
2fcaa46
Compare
|
@chris-fast Hi, I’ve reopened this PR as the PR is still required. |
If the -sae parameter is set in the input box configuration, shouldn't the original default behavior of adding -sae be removed? That's my understanding—there should be no -sae by default. We don't need to worry about backward compatibility for this setting. |
…tral The Flink plugin should not hardcode runtime parameters like -sae (shutdown-on-attached-exit) internally. Plugins should stay neutral and avoid hard-coded runtime parameters. Users can now add any Flink CLI parameters (including -sae) through the existing "others" input field in the UI when needed. Changes: - Remove hardcoded -sae parameter from FlinkArgsUtils - Update test cases to reflect the removal - Fix dependencies-modal.tsx lint error (missing .value for ref access) Related: apache#17908
2fcaa46 to
5a54abe
Compare
I think so. PTAL @ruanwenjun |



What is the purpose of the change
Fixes #17908
The
-sae(--shutdownOnAttachedExit) parameter was causing Flink tasks in YARN Application mode to terminate unexpectedly. Based on reviewer feedback, this parameter is now configurable via the UI instead of being hardcoded based on deploy mode.Root Cause
Previously, the
-saeparameter was automatically added for all non-APPLICATION modes. This hardcoded behavior didn't provide users with the flexibility to control this parameter based on their specific use cases.Solution
Made the
-saeparameter fully configurable through the UI:shutdownOnAttachedExitfield toFlinkParametersBrief changelog
shutdownOnAttachedExitfield toFlinkParameterswith enhanced JavaDocFlinkArgsUtils.buildRunCommandLine()to use configuration-based logicVerifying this change
Test Cases Coverage
New test cases added:
testRunJarWithShutdownOnAttachedExitEnabled()- Explicitly enabled (true)testRunJarWithShutdownOnAttachedExitDisabled()- Explicitly disabled (false)testRunJarWithShutdownOnAttachedExitInApplicationMode()- APPLICATION mode with enabledExisting tests updated:
-saeparameterBehavior Matrix
-saeAdded?Backward Compatibility
✅ Fully backward compatible:
Booleantype (three-state: null/false/true)nullvalue represents existing tasks (no -sae parameter)falsevalue represents explicitly disabledtruevalue represents explicitly enabledUI Changes
A new switch control "Shutdown On Attached Exit" has been added to the Flink task configuration form:
-saeparameter to Flink commandRelated issues
Fixes #17908