Skip to content

Conversation

@vsantwana
Copy link
Contributor

What is the purpose of the change

(For example: This pull request adds a new feature to periodically create and maintain savepoints through the FlinkDeployment custom resource.)

With the merge of this PR, we introduced a feature to emit kubernetes events for all the flink exception. At the base of this is a REST API from flink JM. What went wrong is that the code to get the REST endpoint for JM was based on the job name,

        String host =
                ObjectUtils.firstNonNull(
                        operatorConfig.getFlinkServiceHostOverride(),
                        ExternalServiceDecorator.getNamespacedExternalServiceName(
                                resource.getMetadata().getName(),
                                resource.getMetadata().getNamespace()));

While this works for applications but not for session job, which started throwing an UnknownHostException like

Name or service not known","message":"java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or service not known","name":"java.util.concurrent.ExecutionException","cause":

Brief change log

  • Use RestClusterClient to make a call to JM in both application mode and session mode.

Verifying this change

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no) no
  • The public API, i.e., is any changes to the CustomResourceDescriptors: (yes / no). no
  • Core observer or reconciler logic that is regularly executed: (yes / no) yes

Documentation

  • Does this pull request introduce a new feature? (yes / no) no
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@gyfora gyfora merged commit 9c7795a into apache:main Jul 2, 2025
221 of 241 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants