
[Improvement-17908][Flink] Make -sae parameter configurable in UI with default off #17909

Open
chris-fast wants to merge 1 commit into apache:dev from chris-fast:fix-17908-flink-sae-parameter

Conversation

@chris-fast chris-fast commented Jan 27, 2026

What is the purpose of the change

Fixes #17908

The -sae (--shutdownOnAttachedExit) parameter was causing Flink tasks in YARN Application mode to terminate unexpectedly. Based on reviewer feedback, this parameter is now configurable via the UI instead of being hardcoded based on deploy mode.

Root Cause

Previously, the -sae parameter was automatically added for all non-APPLICATION modes. This hardcoded behavior didn't provide users with the flexibility to control this parameter based on their specific use cases.

Solution

Made the -sae parameter fully configurable through the UI:

  • Added a new shutdownOnAttachedExit field to FlinkParameters
  • Changed from hardcoded deployMode check to explicit configuration
  • Added UI switch control in the Flink task form
  • Default value: false (disabled for safety and backward compatibility)

Brief changelog

  • Added shutdownOnAttachedExit field to FlinkParameters with enhanced JavaDoc
  • Modified FlinkArgsUtils.buildRunCommandLine() to use configuration-based logic
  • Added UI switch control in Flink task form (positioned after Yarn Queue field)
  • Added comprehensive test cases covering all scenarios (null, false, true, APPLICATION mode)
  • Added Chinese and English i18n translations

Verifying this change

  • Code compilation pass
  • Unit tests updated and passing
  • Follows the project's code style (spotless)
  • Backward compatibility maintained

Test Cases Coverage

New test cases added:

  1. testRunJarWithShutdownOnAttachedExitEnabled() - Explicitly enabled (true)
  2. testRunJarWithShutdownOnAttachedExitDisabled() - Explicitly disabled (false)
  3. testRunJarWithShutdownOnAttachedExitInApplicationMode() - APPLICATION mode with enabled

Existing tests updated:

  • All default behavior tests now expect no -sae parameter
  • Maintains test coverage for CLUSTER, LOCAL, and APPLICATION modes

Behavior Matrix

| Scenario | shutdownOnAttachedExit | deployMode | -sae added? |
| --- | --- | --- | --- |
| New task (default) | null | ANY | No (safe default) |
| Existing task | null | ANY | No (backward compatible) |
| Explicitly disabled | false | ANY | No |
| Explicitly enabled | true | CLUSTER | Yes |
| Explicitly enabled | true | LOCAL | Yes |
| Explicitly enabled | true | APPLICATION | Yes (user's choice) |

Backward Compatibility

✅ Fully backward compatible:

  • Uses Boolean type (three-state: null/false/true)
  • null value represents existing tasks (no -sae parameter)
  • false value represents explicitly disabled
  • true value represents explicitly enabled
  • All existing Flink tasks continue to work without modification
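
The three-state handling described above can be sketched roughly as follows. This is a minimal illustration, not the actual FlinkArgsUtils code: the class name `SaeFlagSketch`, the `SAE` constant, and the simplified `flink run` prefix are assumptions made for the example. Only the `Boolean.TRUE.equals(...)` check is taken from the discussion in this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in for the argument builder; not the real FlinkArgsUtils. */
final class SaeFlagSketch {

    // Hypothetical constant standing in for FlinkConstants.FLINK_SHUTDOWN_ON_ATTACHED_EXIT.
    static final String SAE = "-sae";

    /** Builds a simplified argument list; only the -sae decision is modelled here. */
    static List<String> buildArgs(Boolean shutdownOnAttachedExit) {
        List<String> args = new ArrayList<>();
        args.add("flink");
        args.add("run");
        // Boolean.TRUE.equals(null) is false, so null (existing tasks) and
        // false (explicitly disabled) both leave -sae out without a null check.
        if (Boolean.TRUE.equals(shutdownOnAttachedExit)) {
            args.add(SAE);
        }
        return args;
    }

    public static void main(String[] unused) {
        System.out.println(buildArgs(null));          // existing task, field absent
        System.out.println(buildArgs(Boolean.FALSE)); // explicitly disabled
        System.out.println(buildArgs(Boolean.TRUE));  // explicitly enabled
    }
}
```

Using the boxed `Boolean` rather than `boolean` is what lets a task definition saved before this field existed deserialize to null and still behave like the disabled case.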

UI Changes

A new switch control "Shutdown On Attached Exit" has been added to the Flink task configuration form:

  • Located between "Yarn Queue" and "Main Arguments" fields
  • Default: disabled (false)
  • When enabled, adds -sae parameter to Flink command
  • Translation provided in both Chinese and English

Related issues

Fixes #17908

Comment on lines 270 to 273
// Note: -sae should NOT be used for APPLICATION mode, as it runs in detached mode on YARN
if (deployMode != FlinkDeployMode.APPLICATION) {
    args.add(FlinkConstants.FLINK_SHUTDOWN_ON_ATTACHED_EXIT); // -sae
}
Member

We should make it configurable in UI and set it as off by default.

SbloodyS changed the title from "Fix 17908 flink sae parameter" to "[Fix-17908] Flink sae parameter issue" on Jan 27, 2026
SbloodyS added the "first time contributor" and "improvement" labels on Jan 27, 2026
SbloodyS changed the title from "[Fix-17908] Flink sae parameter issue" to "[Improvement-17908] Flink sae parameter issue" on Jan 27, 2026
SbloodyS changed the title from "[Improvement-17908] Flink sae parameter issue" to "[Improvement-17908] Flink sae parameter improvement" on Jan 27, 2026
SbloodyS added this to the 3.4.1 milestone on Jan 27, 2026
@chris-fast
Author

Hi @SbloodyS , thank you for the review feedback!

I understand the requirement to make the -sae parameter configurable in UI with default as off.

Current Situation

The current PR removes -sae for APPLICATION mode only:

if (deployMode != FlinkDeployMode.APPLICATION) {
    args.add(FlinkConstants.FLINK_SHUTDOWN_ON_ATTACHED_EXIT);
}

Proposed Solution

I'd like to propose adding a UI-configurable option for the -sae parameter:

Implementation Plan

  1. Backend Changes:

    • Add Boolean shutdownOnAttachedExit field to FlinkParameters.java
    • Modify FlinkArgsUtils.java to use this config:
      if (Boolean.TRUE.equals(flinkParameters.getShutdownOnAttachedExit())) {
          args.add(FlinkConstants.FLINK_SHUTDOWN_ON_ATTACHED_EXIT);
      }
    • Default value: null (disabled, parameter not added)
  2. Frontend Changes:

    • Add a checkbox in Flink task configuration UI
    • Label: "Shutdown on Attached Exit" (with tooltip explaining)
    • Default: unchecked (off by default as requested)

Benefits

  • ✅ User can explicitly control -sae parameter
  • ✅ Backward compatible (null = no parameter added)
  • ✅ Default OFF as requested
  • ✅ Clear UI indication

Questions

Before proceeding with implementation, I'd like to confirm:

  1. Default behavior: Should existing tasks (without this field) have -sae disabled by default?

    • My proposal: Yes, default to disabled (null/false)
  2. UI placement: Should the checkbox be shown for all deploy modes or only non-APPLICATION modes?

    • My proposal: Show for all modes, user has full control
  3. Alternative: Would you prefer a different approach?

Looking forward to your feedback! Thanks!

@SbloodyS
Member

@chris-fast Excellent. Your description is very accurate and correct. Please modify this PR according to your description.

github-actions bot added the "UI" label on Jan 28, 2026
chris-fast changed the title from "[Improvement-17908] Flink sae parameter improvement" to "[Improvement-17908][Flink] Make -sae parameter configurable in UI with default off" on Jan 28, 2026
@chris-fast
Author

@SbloodyS I've updated the PR according to the plan, PTAL. Thanks!

Member

@SbloodyS SbloodyS left a comment

You should run pnpm run lint to format the frontend code. @chris-fast

@chris-fast
Author

@SbloodyS

I've run pnpm run lint as you suggested. The linter auto-fixed dependencies-modal.tsx (missing .value for ref access) along with the targeted changes.

I've included this fix in the PR to ensure CI passes cleanly. This is a legitimate bug fix that prevents incorrect values from being emitted when closing the dependencies modal.

SbloodyS previously approved these changes Jan 29, 2026
Member

@SbloodyS SbloodyS left a comment

LGTM

@SbloodyS SbloodyS requested a review from ruanwenjun January 29, 2026 09:00
     *
     * @see FlinkArgsUtils#buildRunCommandLine
     */
    private Boolean shutdownOnAttachedExit;
Member

Right now we don't support taking over a Flink job on failover. Might this change cause the Flink job to run twice on YARN?
Also, it's better to set the default value to TRUE so that we don't break compatibility, as this is essentially a Flink bug.

Author

@chris-fast chris-fast Jan 29, 2026

I fully agree with you on the compatibility point: keeping the default as TRUE is definitely the safest move.

I do want to clarify a few things regarding the underlying mechanism, though:

1. It's not really a "Flink bug". The -sae parameter was designed to prevent resource leaks (zombie clusters), not to handle duplicate submissions.

2. Relying on -sae to prevent "double runs" is actually pretty unreliable. If a worker hits a hard crash (like an OOM kill or a power outage), the CLI dies instantly and never gets a chance to send the shutdown signal to the cluster. So the job keeps running, and a retry will still cause a duplicate.

3. The better way to handle idempotency is via YARN Application Tags (e.g., using the ProcessInstanceId) and checking if that tag exists before submitting.

I think that "idempotency check" deserves to be a future optimization feature on its own. It’s probably better to keep it out of this current PR so we don't block the merge.
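
To make the tag-based idea concrete, here is a minimal sketch of what such a check could look like. Everything in it is hypothetical: the `ClusterView` interface stands in for whatever lookup of running YARN applications would be used, and the tag format `ds-process-instance-<id>` is invented for illustration. It does not call any real YARN or Flink API.

```java
import java.util.Set;

/** Hypothetical sketch of a tag-based "submit only once" check. */
final class TagIdempotencySketch {

    /** Assumed abstraction: the tags of applications still running on the cluster. */
    interface ClusterView {
        Set<String> runningApplicationTags();
    }

    /** Derives a tag from the process instance id (the tag format is made up here). */
    static String tagFor(long processInstanceId) {
        return "ds-process-instance-" + processInstanceId;
    }

    /** Returns true only if no running application already carries this instance's tag. */
    static boolean shouldSubmit(ClusterView cluster, long processInstanceId) {
        return !cluster.runningApplicationTags().contains(tagFor(processInstanceId));
    }

    public static void main(String[] unused) {
        ClusterView cluster = () -> Set.of(tagFor(42L));
        System.out.println(shouldSubmit(cluster, 42L)); // already running, so skip the retry
        System.out.println(shouldSubmit(cluster, 43L)); // not seen yet, so safe to submit
    }
}
```

Unlike -sae, a check of this kind does not depend on the CLI process surviving long enough to send a shutdown signal, which is why it would still hold after a hard crash of the worker.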

Thanks a lot for the feedback—I actually learned a ton digging into this! I’d be more than happy to help contribute code for that future optimization feature, too.

Member

> I think that "idempotency check" deserves to be a future optimization feature on its own. It’s probably better to keep it out of this current PR so we don't block the merge.

+1

Author

> I think that "idempotency check" deserves to be a future optimization feature on its own. It’s probably better to keep it out of this current PR so we don't block the merge.
>
> +1

Totally agree. Thanks! I'll push the new code right now

Member

@ruanwenjun ruanwenjun Jan 30, 2026

@chris-fast What I’m trying to say is that we shouldn’t break compatibility here; otherwise it will lead to more problems, e.g. with failover. Regardless of how we deal with the failover issue in the future, using -sae is not a bad thing.

Adapting the upper layer just to work around a bug in a specific version of a lower-level component isn’t a sustainable approach. If different versions of the lower layer each have their own issues, does that mean the upper layer needs to keep adding special handling for all of them? That would make the system increasingly complex and fragile.

Also, I don’t quite understand the statement that “it’s not really a Flink bug.” From my perspective, the unexpected behavior originates from Flink’s side, so it’s hard to see why it wouldn’t be considered a bug there.

The most important thing is that it is hard to explain to users under what circumstances they should enable or disable shutdownOnAttachedExit. If I were the user, I would ask: "If this parameter is turned off, will the process not exit? Under what circumstances would we want the process to stay alive instead of exiting?"

Author

I'm with you on that. We can't let the system become too unwieldy to manage, and using -sae is not a bad thing in this scenario.

Member

@ruanwenjun ruanwenjun left a comment

Please don't use AI tools to generate PRs.

chris-fast force-pushed the fix-17908-flink-sae-parameter branch from e8c3eeb to 2fcaa46 on February 2, 2026 10:26
@chris-fast chris-fast closed this Feb 2, 2026
@ruanwenjun
Member

ruanwenjun commented Feb 3, 2026

@chris-fast Hi, I’ve reopened this PR as the PR is still required.
The Flink plugin shouldn’t append default arguments like -sae internally. Plugins should stay neutral and avoid hard-coded runtime parameters.
A better approach would be to expose a configurable input in the UI, allowing users to provide such arguments explicitly when needed. We don’t need to add a separate input field for every possible argument.
Instead, we can provide a single generic input field that accepts a list of custom parameters like these.
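
A rough sketch of that generic approach is below, assuming the free-form field arrives as a single string. The class name `CustomArgsSketch`, the `flink run` prefix, and the plain whitespace splitting are simplifications for illustration; a real implementation would also need to handle quoted values that contain spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: appends user-supplied CLI parameters instead of hard-coding any. */
final class CustomArgsSketch {

    /** Builds the command; the plugin itself adds no optional flags such as -sae. */
    static List<String> buildArgs(String others) {
        List<String> args = new ArrayList<>(List.of("flink", "run"));
        if (others != null && !others.isBlank()) {
            // Naive whitespace split; quoting and escaping are not handled in this sketch.
            args.addAll(Arrays.asList(others.trim().split("\\s+")));
        }
        return args;
    }

    public static void main(String[] unused) {
        System.out.println(buildArgs(null));        // nothing appended by default
        System.out.println(buildArgs("-sae -p 2")); // the user opts in to -sae explicitly
    }
}
```

With this shape, whether -sae appears on the command line is decided entirely by what the user types into the field, so the plugin needs no per-flag switches.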

@ruanwenjun ruanwenjun reopened this Feb 3, 2026
@chris-fast
Author

> @chris-fast Hi, I’ve reopened this PR as the PR is still required. The Flink plugin shouldn’t append default arguments like -sae internally. Plugins should stay neutral and avoid hard-coded runtime parameters. A better approach would be to expose a configurable input in the UI, allowing users to provide such arguments explicitly when needed. We don’t need to add a separate input field for every possible argument. Instead, we can provide a single generic input field that accepts a list of custom parameters like these.

If the -sae parameter is set in the input box configuration, shouldn't the original default behavior of adding -sae be removed? That's my understanding—there should be no -sae by default. We don't need to worry about backward compatibility for this setting.

…tral

The Flink plugin should not hardcode runtime parameters like -sae
(shutdown-on-attached-exit) internally. Plugins should stay neutral
and avoid hard-coded runtime parameters.

Users can now add any Flink CLI parameters (including -sae) through
the existing "others" input field in the UI when needed.

Changes:
- Remove hardcoded -sae parameter from FlinkArgsUtils
- Update test cases to reflect the removal
- Fix dependencies-modal.tsx lint error (missing .value for ref access)

Related: apache#17908
chris-fast force-pushed the fix-17908-flink-sae-parameter branch from 2fcaa46 to 5a54abe on February 4, 2026 07:34
@SbloodyS
Member

SbloodyS commented Feb 4, 2026

If the -sae parameter is set in the input box configuration, shouldn't the original default behavior of adding -sae be removed? That's my understanding—there should be no -sae by default. We don't need to worry about backward compatibility for this setting.

I think so. PTAL @ruanwenjun

Labels

backend, first time contributor, improvement, test, UI

Development

Successfully merging this pull request may close these issues.

[Improvement] [Flink] sae parameter will cause the task to be killed

3 participants