[FLINK-39914][runtime] Fix flaky TaskDeploymentDescriptorFactoryTest#testHybridVertexFinish caused by async TDD creation #28398

Open
och5351 wants to merge 1 commit into apache:master from och5351:feature/FLINK-39914

Conversation

@och5351 och5351 commented Jun 11, 2026

Contributor

What is the purpose of the change

Execution.deploy() now creates the TaskDeploymentDescriptor asynchronously, so an IO executor thread, rather than the main thread, calls jobMasterMainThreadExecutor.execute() to hand off the work.

The problem is that the executor returned by forMainThread() captures the thread that created it and asserts inside execute() that only that same thread calls it.

Because the IO executor thread calls execute() instead, the assertion fails with an AssertionError. The error propagates through the CompletableFuture chain and moves the producer vertex to the FAILED state, which in turn causes an IllegalStateException when the consumer vertex tries to read from it.

Brief change log

Updated buildExecutionGraph() in TaskDeploymentDescriptorFactoryTest to replace forMainThread() with forSingleThreadExecutor() and add proper async handling to accommodate the asynchronous TaskDeploymentDescriptor creation in Execution.deploy().
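
To make the failure mode and the fix easier to follow, here is a minimal, self-contained sketch in plain Java. It does not use any Flink classes: `ThreadBoundExecutor`, `handOffFromIoThread`, and `MainThreadHandOffSketch` are hypothetical names invented for illustration, and the thread check only imitates the behavior described above for forMainThread(). A plain single-thread executor from the JDK stands in for forSingleThreadExecutor().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MainThreadHandOffSketch {

    /** Hypothetical executor that only accepts calls from the thread that created it. */
    static final class ThreadBoundExecutor implements Executor {
        private final Thread owner = Thread.currentThread();

        @Override
        public void execute(Runnable command) {
            if (Thread.currentThread() != owner) {
                throw new AssertionError("execute() called from a thread other than the owner");
            }
            command.run();
        }
    }

    /**
     * Calls mainThreadExecutor.execute() from a separate IO thread.
     * Returns the failure cause carried by the future, or null if the hand-off succeeded.
     */
    static Throwable handOffFromIoThread(Executor mainThreadExecutor) {
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        try {
            // runAsync guarantees that the lambda, and so the execute() call, runs on the IO thread.
            CompletableFuture.runAsync(
                    () -> mainThreadExecutor.execute(() -> { }), ioExecutor).join();
            return null;
        } catch (CompletionException e) {
            // The AssertionError thrown on the IO thread surfaces as the future's cause.
            return e.getCause();
        } finally {
            ioExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        // Created on the main thread, called from the IO thread: the thread check fails.
        System.out.println("thread-bound: " + handOffFromIoThread(new ThreadBoundExecutor()));

        // A single-thread executor accepts submissions from any thread.
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            System.out.println("single-thread: " + handOffFromIoThread(single));
        } finally {
            single.shutdown();
        }
    }
}
```

The sketch only shows why the hand-off fails and why an executor without a caller-thread check avoids it. The additional async handling in the test itself is not reproduced here.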

Verifying this change

[Image attached to the pull request]

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

…testHybridVertexFinish caused by async TDD creation

@RocMarshal RocMarshal left a comment

Contributor

Thanks @och5351 LGTM +1.

@RocMarshal RocMarshal requested a review from lihaosky June 11, 2026 17:27

flinkbot commented Jun 11, 2026

Collaborator

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build
