[FLINK-36581][cli] Allow passing Flink configuration by yaml job file #3918
base: master
Conversation
Force-pushed from 42bf034 to 18f57f6.
@lvyanquan @yuxiqian PTAL~
```diff
         org.apache.flink.configuration.Configuration flinkConfig) {
     return new FlinkPipelineComposer(
-            StreamExecutionEnvironment.getExecutionEnvironment(), true);
+            StreamExecutionEnvironment.getExecutionEnvironment(flinkConfig), true);
```
This fixes the Flink configuration not taking effect in use-mini-cluster mode.
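For context, here is a minimal sketch, with assumed class and parameter names rather than the PR's exact code, of how the parsed YAML settings could be handed to the local environment so that use-mini-cluster mode honors them:

```java
// Minimal sketch (assumed names): build a Flink Configuration from the flat
// key/value pairs parsed out of the YAML block and pass it to the local
// execution environment, which is what makes the settings effective in
// use-mini-cluster mode.
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Map;

public class MiniClusterConfigSketch {

    public static StreamExecutionEnvironment createLocalEnv(Map<String, String> flinkConfFromYaml) {
        Configuration flinkConfig = Configuration.fromMap(flinkConfFromYaml);
        return StreamExecutionEnvironment.getExecutionEnvironment(flinkConfig);
    }
}
```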
Could you clarify the configuration priority hierarchy here?

Please add this to the documentation, since we can add configuration through …
Force-pushed from ca81bb3 to 522ab43.
```yaml
  name: MySQL to OceanBase Pipeline
  parallelism: 1
  flink:
    execution.checkpointing.interval: 2min
```
I added Flink parameters to the pipeline-connector demo, prompting users to configure Flink parameters in pipeline.yaml.
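For reference, a hedged sketch of how the demo's pipeline.yaml might look with the new block in place; connector options are elided, and the nesting of the `flink` key under `pipeline` is an assumption based on this diff:

```yaml
source:
  type: mysql
  # ... source options elided ...

sink:
  type: oceanbase
  # ... sink options elided ...

pipeline:
  name: MySQL to OceanBase Pipeline
  parallelism: 1
  flink:
    execution.checkpointing.interval: 2min
```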
Please help review again @lvyanquan @joyCurry30
| + "or the environment variable \"FLINK_HOME\". " | ||
| + "Please make sure Flink home is properly set. "); | ||
| } | ||
|
|
Move the methods for loading and merging the Flink config to the FlinkEnvironmentUtils class.
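As a rough illustration of that suggestion (class and method names here are hypothetical, not the PR's final code), the loading side could resolve FLINK_HOME and reuse the error message from the diff above:

```java
// Hypothetical loader sketch: resolve FLINK_HOME and load the cluster
// configuration from its conf/ directory. GlobalConfiguration picks up
// config.yaml or the legacy flink-conf.yaml, depending on the Flink version.
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.GlobalConfiguration;

import java.nio.file.Paths;

public final class FlinkHomeConfigLoaderSketch {

    public static Configuration loadFromFlinkHome() {
        String flinkHome = System.getenv("FLINK_HOME");
        if (flinkHome == null || flinkHome.isEmpty()) {
            throw new IllegalStateException(
                    "Cannot find Flink home via the environment variable \"FLINK_HOME\". "
                            + "Please make sure Flink home is properly set.");
        }
        return GlobalConfiguration.loadConfiguration(Paths.get(flinkHome, "conf").toString());
    }
}
```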
joyCurry30 left a comment:
Thanks @MOBIN-F ! I left some comments and questions.
Submit the job to a Flink YARN Application cluster via the CLI.
```bash
cd /path/flink-cdc-*
./bin/flink-cdc.sh -t yarn-application -s hdfs:///flink/savepoint-1537 -Dexecution.checkpointing.interval=2min mysql-to-doris.yaml
```
I think it’s better to keep this example to show how to submit a job using the command line, and then show another YAML file for submitting jobs with Flink configurations.
We should recommend that users configure Flink parameters in YAML as much as possible; the command-line method is not intuitive and looks ugly. WDYT?
+1 to keep CLI mode examples. The checkpoint path is not a static configuration and is prone to change; hard-coding it in the YAML file is not better than dynamic CLI arguments.
```java
private static final String TRANSFORM_KEY = "transform";
private static final String PIPELINE_KEY = "pipeline";
private static final String MODEL_KEY = "model";
private static final String FLINK_KEY = "flink";
```
I don’t think “flink” is a good choice as a key. Would it be better to use “configuration” or something else instead? What do you think? @lvyanquan
Maybe `flink-config` can be used.
IIRC only runtime execution configurations can be dynamically overridden. What about naming it `execution-config`?
flink-cdc-cli/src/main/java/org/apache/flink/cdc/cli/utils/FlinkEnvironmentUtils.java (outdated review thread, resolved)
Force-pushed from 522ab43 to 168ef80.
Force-pushed from 8e0165a to 7e04477.
Force-pushed from 7e04477 to 26f48c3.
Do we have plans to merge this PR in 3.4? I want to merge this PR in 3.4, WDYT? @lvyanquan
This pull request has been automatically marked as stale because it has not had recent activity for 120 days. It will be closed in 60 days if no further activity occurs.
This pull request has been closed because it has not had recent activity. You can reopen it if you want to continue your work, and anyone who is interested is encouraged to continue work on this pull request.
@MOBIN-F Would you like to continue working on this PR?

Yes, I will finish the remaining work next week~
Force-pushed from 26f48c3 to 4160459.
Force-pushed from 4160459 to 3c85f01.
@lvyanquan @yuxiqian please trigger CI, thanks~
yuxiqian left a comment:
Thanks for @MOBIN-F's quick response!
Just left some general comments here, please take a look when available :)
From the Chinese documentation (translated):

| local-time-zone | The job-level local time zone. | optional |
| execution.runtime-mode | The runtime mode of the pipeline, either STREAMING or BATCH; the default value is STREAMING. | optional |
| operator.uid.prefix | The prefix for operator UIDs in the pipeline. If not set, Flink generates a unique UID for each operator. Setting this parameter is recommended to provide stable and recognizable operator IDs, which helps with stateful upgrades, troubleshooting, and diagnostics in the Flink UI. | optional |
| flink-conf | Used to configure [Flink-related parameters](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/). <br/>Flink parameter priority: config.yaml < job command-line < pipeline.yaml | optional |
Suggested change (Chinese docs, translated):
```diff
-| flink-conf | Used to configure [Flink-related parameters](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/). <br/>Flink parameter priority: config.yaml < job command-line < pipeline.yaml | optional |
+| flink-conf | Used to configure [Flink parameters](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/). <br/>Flink parameters take effect in the following priority order (high to low): CDC CLI arguments, Pipeline YAML block, Flink `config.yaml`. | optional |
```
Please specify if modern config.yaml or legacy flink-conf.yaml[1] is supported.
| `schema-operator.rpc-timeout` | The timeout time for SchemaOperator to wait downstream SchemaChangeEvent applying finished, the default value is 3 minutes. | optional |
| `operator.uid.prefix` | The prefix to use for all pipeline operator UIDs. If not set, all pipeline operator UIDs will be generated by Flink. It is recommended to set this parameter to ensure stable and recognizable operator UIDs, which can help with stateful upgrades, troubleshooting, and Flink UI diagnostics. | optional |
| flink-conf | Used to configure [Flink related parameters](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/). <br/>Flink parameter priority: config.yaml < job command-line < pipeline.yaml | optional |
Suggested change:
```diff
-| flink-conf | Used to configure [Flink related parameters](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/). <br/>Flink parameter priority: config.yaml < job command-line < pipeline.yaml | optional |
+| flink-conf | Used to configure [Flink related parameters](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/). <br/> Flink configurations will take effect in the following order (high to low): CDC CLI arguments, YAML Pipeline config blocks, and Flink `config.yaml`. | optional |
```
| flink-conf | Used to configure [Flink related parameters](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/). <br/>Flink parameter priority: config.yaml < job command-line < pipeline.yaml | optional |

NOTE: Whilst the above parameters are each individually optional, at least one of them must be specified. The `pipeline` section is mandatory and cannot be empty.
Revert accidental changes
```yaml
execution.target: yarn-session
yarn.application.id: {{YARN_APPLICATION_ID}}
execution.checkpointing.interval: 2min
#If you need to restore from a savepoint, configure the following parameters:
```
Suggested change:
```diff
-#If you need to restore from a savepoint, configure the following parameters:
+# If you need to restore from a savepoint, uncomment the next line:
```
```yaml
# If you need to restore from a savepoint, configure the following parameters
#execution.savepoint.path: hdfs:///flink/savepoint-1537
```
Suggested change (Chinese docs, translated):
```diff
-# If you need to restore from a savepoint, configure the following parameters
-#execution.savepoint.path: hdfs:///flink/savepoint-1537
+# To restore from a savepoint, configure the following parameter:
+# execution.savepoint.path: hdfs:///flink/savepoint-1537
```
From the Chinese documentation (translated):

| local-time-zone | The job-level local time zone. | optional |
| execution.runtime-mode | The runtime mode of the pipeline, either STREAMING or BATCH; the default value is STREAMING. | optional |
| operator.uid.prefix | The prefix for operator UIDs in the pipeline. If not set, Flink generates a unique UID for each operator. Setting this parameter is recommended to provide stable and recognizable operator IDs, which helps with stateful upgrades, troubleshooting, and diagnostics in the Flink UI. | optional |
| flink-conf | Used to configure [Flink-related parameters](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/). <br/>Flink parameter priority: config.yaml < job command-line < pipeline.yaml | optional |
I don't think the YAML flink-conf block should have higher priority than CLI arguments. IIRC Flink SQL client and CLI arguments could override static config files, too.
A reasonable priority order might be CDC CLI arguments > YAML flink-conf block > Flink cluster config file.
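To make that ordering concrete, here is a hedged merge sketch (helper and parameter names are assumptions, not the PR's code); later writes win, so CLI arguments override both the YAML block and the cluster file:

```java
// Hypothetical merge helper illustrating the suggested priority:
// CDC CLI arguments > YAML flink-conf block > Flink cluster config file.
import org.apache.flink.configuration.Configuration;

import java.util.Map;

public final class FlinkConfigPrioritySketch {

    public static Configuration merge(
            Configuration clusterConfig,          // lowest priority: $FLINK_HOME config file
            Map<String, String> yamlFlinkConf,    // middle priority: pipeline.yaml flink-conf block
            Map<String, String> cliOverrides) {   // highest priority: CDC CLI -D arguments
        Configuration merged = new Configuration(clusterConfig);
        yamlFlinkConf.forEach(merged::setString); // YAML block overrides the cluster file
        cliOverrides.forEach(merged::setString);  // CLI arguments override everything else
        return merged;
    }
}
```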
```yaml
pipeline:
  name: MySQL to StarRocks Pipeline
  parallelism: 2
  flink-conf:
```
No need to modify the documents of every connector; it's optional anyway. It has been described in the pipeline concept page.
Please add a Pipeline E2E case to verify the priority of the Flink config, the YAML flink-conf block, and the CDC CLI arguments.
At present, Flink CDC only supports reading Flink configurations from the Flink conf files, but this approach is not user-friendly for multiple jobs that require different configurations. Allow passing Flink configuration by YAML job file, such as:
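For illustration, a hedged sketch of the kind of job file this is pointing at (connector options are elided, the `flink-conf` key name was still under discussion in the review, and the values simply reuse the examples shown earlier in this thread):

```yaml
source:
  type: mysql
  # ... source options elided ...

sink:
  type: doris
  # ... sink options elided ...

pipeline:
  name: MySQL to Doris Pipeline
  parallelism: 2
  flink-conf:
    execution.target: yarn-session
    yarn.application.id: {{YARN_APPLICATION_ID}}
    execution.checkpointing.interval: 2min
```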