[Feature][Zeta]Add dynamic metadata provider#10838

Draft
chl-wxp wants to merge 5 commits into apache:dev from chl-wxp:add-DynamicMetadataProvider

Conversation


@chl-wxp chl-wxp commented Apr 30, 2026

Purpose of this pull request

Closes #10816.

Add a built-in dynamic MetadataProvider for the Zeta runtime, so datasource connection properties can be registered through the REST API and referenced in job configs by metadata_datasource_id.

What is changed

  • Add DynamicMetadataProvider for resolving datasource properties from Zeta runtime storage.
  • Store registered datasource metadata in Hazelcast distributed IMap.
  • Add REST APIs for datasource metadata management:
    • POST /metadata/datasource
    • GET /metadata/datasource/{metadataDatasourceId}
    • GET /metadata/datasources
    • PUT /metadata/datasource/{metadataDatasourceId}
    • DELETE /metadata/datasource/{metadataDatasourceId}
  • Support using metadata_datasource_id in job config to avoid repeating datasource connection properties.
  • Add serialization support for dynamic metadata datasource objects.

Example

Register the datasource once:

{
  "metadataDatasourceId": "mysql_001",
  "connectorType": "Jdbc",
  "properties": {
    "url": "jdbc:mysql://127.0.0.1:3306/test",
    "user": "root",
    "password": "******",
    "driver": "com.mysql.cj.jdbc.Driver"
  }
}

Then reference it in job config:

  source {
    Jdbc {
      metadata_datasource_id = "mysql_001"
      query = "select * from user"
    }
  }
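
As a usage sketch, the registration payload above could be sent to the new endpoint from a plain Java client. Only the `POST /metadata/datasource` path comes from this PR; the host, port, and client class are assumptions, and the actual send is commented out so the sketch compiles without a running cluster:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterDatasourceExample {
    public static void main(String[] args) {
        // Payload mirrors the JSON body shown above.
        String body = "{"
                + "\"metadataDatasourceId\": \"mysql_001\","
                + "\"connectorType\": \"Jdbc\","
                + "\"properties\": {"
                + "\"url\": \"jdbc:mysql://127.0.0.1:3306/test\","
                + "\"user\": \"root\","
                + "\"password\": \"******\","
                + "\"driver\": \"com.mysql.cj.jdbc.Driver\""
                + "}}";

        // Host and port are placeholders; only the path is from this PR.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/metadata/datasource"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Uncomment against a live engine node:
        // HttpResponse<String> resp = HttpClient.newHttpClient()
        //         .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(request.method() + " " + request.uri());
    }
}
```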

Does this PR introduce any user-facing change?

Yes. Users can register datasource metadata through the Zeta REST API and reference it in job configs by metadata_datasource_id.

How was this patch tested?

  • Manual test with Zeta runtime REST API datasource registration.
  • Manual test job config using metadata_datasource_id for Jdbc source/sink.

@chl-wxp chl-wxp self-assigned this Apr 30, 2026
@chl-wxp chl-wxp changed the title [Feature][]Add dynamic metadata provider [Feature][Zeta]Add dynamic metadata provider Apr 30, 2026

@DanielLeens DanielLeens left a comment


Thanks for the contribution. I reviewed this PR from the full diff and traced the real runtime path from job parsing to the new REST layer.

What problem this PR is trying to solve

  • User pain: contributors want datasource connection properties to be managed once on the engine side instead of being repeated in every job config.
  • Fix approach: this PR adds a Hazelcast-backed DynamicMetadataProvider, exposes REST CRUD endpoints for datasources, and injects datasource properties into connector configs through metadata_datasource_id.
  • One-line summary: this is trying to turn datasource configuration into a dynamically managed engine-side registry.

Example:

  • Before this PR, a job still needed to inline its JDBC/Kafka/etc. datasource properties in the job config.
  • After this PR, the user can register a datasource once and let the parser inject it dynamically when metadata_datasource_id is present.

Core logic review

Exact change

The main path is:

  • MultipleTableJobConfigParser.java:857-867 decides whether datasource resolution should happen.
  • MetadataProviderManager.java:180-223 asks the provider for a datasource map.
  • MetadataProviderManager.java:298-355 merges that map back into the connector config.
  • DynamicMetadataProvider.java:109-156 loads datasource properties from a Hazelcast IMap.
  • JettyService.java:220-257 exposes /metadata/datasource and /metadata/datasources.
  • DynamicMetadataDataSourceService.java:56-138 creates/reads/lists/updates/deletes the datasource objects.

The main correctness problem is that the new serialized datasource object stores every property value as a string:

// DynamicMetadataDataSource.writeData(): every value is flattened via toString()
for (Map.Entry<String, Object> entry : properties.entrySet()) {
    out.writeString(entry.getKey());
    out.writeString(entry.getValue().toString());
}

// DynamicMetadataDataSource.readData(): values can only come back as Strings
String value = in.readString();
properties.put(key, value);

And the REST layer returns the same properties directly:

// Property values, including any secrets, are echoed back unredacted
datasource.getProperties().forEach(
        (key, value) -> {
            propertiesJson.add(key, value != null ? value.toString() : "");
        });

Key findings

  1. The normal metadata-enabled job parsing path exercises this PR's code directly. This is not dormant code.
  2. The new REST endpoints do not just manage harmless metadata. They can store and return real connector properties, which naturally includes passwords, tokens, and access keys.
  3. DynamicMetadataDataSource.java:79-103 currently flattens all values to String, so typed connector options lose their original types on the main path.
  4. This PR changes both seatunnel-engine and seatunnel-api, but the PR diff contains no docs/ files and no src/test/java files.
  5. The current CI failure is PR-caused, not infrastructure noise.

Full execution path

Job parsing main path
  -> MultipleTableJobConfigParser.handleDataSource() [857-867]
      -> metadata.enabled=true and metadata_datasource_id exists
      -> MetadataProviderManager.resolveDataSourceConfigs(...)

Dynamic datasource resolution
  -> MetadataProviderManager.resolveConnectorConfig() [180-223]
      -> provider.datasourceMap(connectorIdentifier, datasourceId)
  -> DynamicMetadataProvider.datasourceMap() [109-156]
      -> datasourceIMap.get(metaDataDatasourceId)
      -> validate connectorType
      -> return datasource.getProperties()
  -> MetadataProviderManager.mergeConfig() [298-355]
      -> merge datasource values back into connector config
      -> ConfigFactory.parseMap(mergedMap)

REST registration / lookup path
  -> JettyService.createJettyServer() [168-174]
      -> BasicAuthFilter is only enabled when the user explicitly turns it on
  -> JettyService [220-257]
      -> expose /metadata/datasource and /metadata/datasources
  -> DynamicMetadataDataSourceServlet.doGet/doPost/doPut/doDelete() [44-112]
      -> call DynamicMetadataDataSourceService
  -> DynamicMetadataDataSourceService.create/get/list(...) [56-138]
      -> store connectorType + properties in Hazelcast IMap
  -> DynamicMetadataDataSource.writeData/readData() [79-103]
      -> coerce every property value to String during serialization
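
The merge step in the trace above can be sketched as a plain map merge. The real MetadataProviderManager.mergeConfig() works on Config objects; this stand-in is a hypothetical simplification, and the precedence shown (job-level values overriding registry values) is an assumption, not something verified from the diff:

```java
import java.util.HashMap;
import java.util.Map;

public class MergeSketch {
    // Hypothetical stand-in for MetadataProviderManager.mergeConfig():
    // registry properties fill in keys, and values already present in the
    // job config are assumed to win (assumption, not verified).
    static Map<String, Object> merge(Map<String, Object> jobConfig,
                                     Map<String, Object> datasourceProps) {
        Map<String, Object> merged = new HashMap<>(datasourceProps);
        merged.putAll(jobConfig); // job-level keys override registry values
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> job = new HashMap<>();
        job.put("query", "select * from user");

        Map<String, Object> registry = new HashMap<>();
        registry.put("url", "jdbc:mysql://127.0.0.1:3306/test");
        registry.put("user", "root");

        Map<String, Object> merged = merge(job, registry);
        System.out.println(merged.size());
        System.out.println(merged.get("url"));
    }
}
```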

Compatibility impact

Compatibility judgment: fully compatible for existing jobs, because old jobs do not hit this path unless they explicitly enable dynamic metadata.

  • API: additive.
  • Configs: additive.
  • Defaults: no old default is changed.
  • Protocol / serialization: a new Hazelcast-serialized type is introduced.
  • Historical behavior: existing jobs remain unchanged unless they opt into the new feature.

The current blockers are therefore not about old-job compatibility. They are about the safety and correctness of the new feature itself.

Performance / side effects

  • The extra IMap lookup and config merge are acceptable from a performance perspective.
  • The more important side effects are security and config correctness, not CPU cost.
  • Datasource configs will also live in the cluster map and be replicated across the Hazelcast cluster.

Error handling and logs

I did not find a major swallowed-exception bug, but I did find three concrete issues.

Problem 1: the new REST API exposes datasource secrets on the default HTTP management surface, and returned values are not redacted

  • Location: seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/JettyService.java:168
  • Why this is a problem: JettyService.java:220-257 exposes /metadata/datasource and /metadata/datasources, DynamicMetadataDataSourceServlet.java:44-112 provides GET/POST/PUT/DELETE, and DynamicMetadataDataSourceService.java:103-138,226-241 returns the full properties map. The only protection in this PR is the optional global basic auth filter at JettyService.java:168-174. In the default HTTP management deployment, this effectively adds a plaintext secret read/write API.
  • Risk: this is a high-severity security issue. Anyone who can reach the engine HTTP port may be able to read or overwrite real datasource credentials.
  • Best fix: recommended option A is to put these endpoints behind mandatory authentication and redact sensitive fields in GET/LIST responses. Option B, if that authentication boundary is not ready yet, is to avoid exposing plaintext read endpoints in this PR.
  • Severity: High
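
A minimal redaction sketch for option A could look like the following. The key list and helper name are illustrative assumptions; the real rules would need to be defined in the PR:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class RedactionSketch {
    // Hypothetical sensitive-key list; real redaction rules belong in the PR.
    private static final Set<String> SENSITIVE_KEYS =
            Set.of("password", "token", "secret", "accessKey", "secretKey");

    // Return a copy of the properties with sensitive values masked,
    // suitable for GET/LIST responses.
    static Map<String, Object> redact(Map<String, Object> properties) {
        Map<String, Object> safe = new LinkedHashMap<>();
        properties.forEach((key, value) ->
                safe.put(key, SENSITIVE_KEYS.contains(key) ? "******" : value));
        return safe;
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("url", "jdbc:mysql://127.0.0.1:3306/test");
        props.put("password", "hunter2");

        Map<String, Object> safe = redact(props);
        System.out.println(safe.get("password"));
        System.out.println(safe.get("url"));
    }
}
```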

Problem 2: the Hazelcast serialization path coerces every datasource property to String, which breaks the connector type contract on the main path

  • Location: seatunnel-engine/seatunnel-engine-core/src/main/java/org/apache/seatunnel/engine/core/job/DynamicMetadataDataSource.java:79
  • Why this is a problem: DynamicMetadataDataSource.writeData()/readData() at 79-103 serializes all values as strings, and updateProperties() even narrows updates to Map<String, String>. But the main runtime path later feeds this map back into MetadataProviderManager.mergeConfig() (MetadataProviderManager.java:298-355). So integers, booleans, lists, nested objects, and other typed connector values are all at risk of being silently turned into strings before the connector sees them.
  • Risk: this is a main-path correctness issue. It can cause validation failures or, worse, incorrect runtime behavior with the wrong config types.
  • Best fix: recommended option A is to preserve real value types in the serialization protocol. Option B, if the feature is intentionally string-only for now, is to make that contract explicit in validation and docs instead of exposing Map<String, Object>.
  • Severity: High
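
One possible shape for option A is a one-byte type tag written before each value, instead of calling toString(). This is a sketch over plain java.io streams, not Hazelcast's ObjectDataOutput; the tags and helper names are illustrative, not the PR's actual protocol:

```java
import java.io.*;

public class TypedSerializationSketch {
    // Write a type tag, then the value in its native encoding.
    static void writeValue(DataOutput out, Object value) throws IOException {
        if (value instanceof Integer) {
            out.writeByte(0);
            out.writeInt((Integer) value);
        } else if (value instanceof Boolean) {
            out.writeByte(1);
            out.writeBoolean((Boolean) value);
        } else {
            out.writeByte(2);
            out.writeUTF(String.valueOf(value));
        }
    }

    // Read the tag back and restore the original type.
    static Object readValue(DataInput in) throws IOException {
        switch (in.readByte()) {
            case 0: return in.readInt();
            case 1: return in.readBoolean();
            default: return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeValue(out, 3306);   // stays an Integer after the round trip
        writeValue(out, true);   // stays a Boolean
        writeValue(out, "root"); // strings are still strings

        DataInput in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(readValue(in));
        System.out.println(readValue(in));
        System.out.println(readValue(in));
    }
}
```

Lists and nested objects would still need their own tags or a structured codec; the point of the sketch is only that the type information can survive the round trip.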

Problem 3: this is a new user-visible engine/API capability, but the PR ships without docs or regression tests, and the current Build is already red due to an example resource issue

  • Location: seatunnel-examples/seatunnel-engine-examples/src/main/resources/examples/fake_to_console.conf:1
  • Why this is a problem: gh pr diff 10838 --repo apache/seatunnel --name-only shows 16 changed files and no docs/ or src/test/java paths. In addition, gh pr checks 10838 --repo apache/seatunnel shows Build failing, and the failed log points to the example file above missing a valid ASF license header.
  • Risk: for a new seatunnel-engine / seatunnel-api feature, this means the usage contract is undocumented, the main path is untested, and the PR does not even pass the basic static quality gate yet.
  • Best fix: fix the license-header failure first, add at least one regression test for the register -> resolve -> inject main path, and add both English and Chinese docs describing the feature boundary, auth expectations, and supported property types.
  • Severity: Medium

Code quality

Most of the Java code follows the usual project style, but the missing license header in the new example is already enough to keep CI red.

Test coverage

Local verification I actually ran:

  • gh pr diff 10838 --repo apache/seatunnel --name-only: passed, confirmed the PR changes 16 files and includes no docs or test files.
  • git show seatunnel-review-10838:.../MultipleTableJobConfigParser.java | nl -ba | sed -n '857,867p': passed, confirmed the main job parser path now resolves dynamic datasources.
  • git show seatunnel-review-10838:.../DynamicMetadataProvider.java | nl -ba | sed -n '109,156p': passed, confirmed the provider reads datasource configs from Hazelcast IMap.
  • git show seatunnel-review-10838:.../DynamicMetadataDataSource.java | nl -ba | sed -n '79,103p': passed, confirmed all property values are currently serialized as strings.
  • gh pr checks 10838 --repo apache/seatunnel: failed, current Build is red.
  • gh run view 25092004758 --repo chl-wxp/seatunnel-laowang --log-failed: passed, confirmed the failure is Run / License header, and the invalid file is seatunnel-examples/seatunnel-engine-examples/src/main/resources/examples/fake_to_console.conf.

For this scope, the test/documentation gap is too large right now.

Documentation

This is a user-visible feature, but the PR contains no docs/en or docs/zh update at all. At minimum, it needs documentation for:

  • how to enable the dynamic metadata provider;
  • how the REST datasource API should be used;
  • what the authentication expectation is;
  • what property types are officially supported;
  • how this feature relates to other metadata/schema flows.

Architecture assessment

  • Elegance: the product direction is interesting and potentially valuable.
  • Maintainability: the current security boundary and type contract are not explicit enough.
  • Extensibility: if the two high-severity issues are fixed, this could evolve into a reusable datasource registry.
  • Historical compatibility: existing jobs are safe by default, but the new feature is not stable enough yet.

Issue summary

| No. | Issue | Location | Severity |
| --- | --- | --- | --- |
| 1 | The new REST API exposes datasource secrets on the default HTTP management surface and returns them unredacted | JettyService.java:168 | High |
| 2 | The serialization path coerces every datasource property to String and breaks typed connector configs | DynamicMetadataDataSource.java:79 | High |
| 3 | The feature ships without docs/tests, and the current Build is already red because of an example resource issue | fake_to_console.conf:1 | Medium |

Merge conclusion

Conclusion: merge after fixes

  1. Blocking items
  • Problem 1: define a real security boundary for secrets before exposing this datasource registry on the engine HTTP surface.
  • Problem 2: preserve property types correctly instead of flattening everything to strings.
  • Problem 3: get the current Build back to green and add the minimum documentation / regression coverage for a new engine/API feature.
  2. Suggested follow-up
  • No extra non-blocking item beyond the blockers above. The current priority is to make the feature safe and correctly specified.

Overall assessment:

I like the direction and I can see the motivation clearly, but at the latest head this is not merge-ready yet. My preference would be:

  • Option A, recommended: design this as a protected config-reference service with mandatory auth/redaction rules and a type-safe serialization contract.
  • Option B: if a smaller first version is needed, narrow the feature boundary aggressively instead of exposing a general-purpose secret-bearing datasource registry immediately.



Successfully merging this pull request may close these issues.

[Feature][Zeta]Add built-in runtime dynamic MetadataProvider for datasource registration
