@tchivs
Contributor

@tchivs tchivs commented Dec 29, 2025

What is the purpose of the pull request

This PR adds metadata column support for the PostgreSQL Pipeline Connector, enabling users to access metadata information such as operation timestamp, database name, schema name, and table name in their data pipelines.

Brief change log

  • Add 4 metadata column implementations:
    • OpTsMetadataColumn: Operation timestamp metadata
    • DatabaseNameMetadataColumn: Database name metadata
    • SchemaNameMetadataColumn: Schema name metadata
    • TableNameMetadataColumn: Table name metadata
  • Update PostgresDataSource to support metadata columns via supportedMetadataColumns() method
  • Add comprehensive E2E test testAllMetadataColumns() in PostgresFullTypesITCase
  • Update documentation for both English and Chinese versions
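The shape of these metadata columns can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the code in this PR: the `MetadataColumnSketch` interface, the `StringMetadataColumn` and `OpTsColumnSketch` classes, and the assumption that `op_ts` arrives as epoch milliseconds in string form are all hypothetical stand-ins. The sketch also collapses the three name columns into one generic class for brevity, whereas the change log lists a separate class for each.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified contract: NOT the real Flink CDC interface. */
interface MetadataColumnSketch {
    /** Name a user would list in the source's metadata.list option. */
    String name();

    /** Extracts this column's value from the per-event metadata map. */
    Object read(Map<String, String> eventMetadata);
}

/** Returns a string-valued entry unchanged (database, schema, or table name). */
final class StringMetadataColumn implements MetadataColumnSketch {
    private final String key;

    StringMetadataColumn(String key) {
        this.key = key;
    }

    @Override
    public String name() {
        return key;
    }

    @Override
    public Object read(Map<String, String> eventMetadata) {
        return eventMetadata.get(key);
    }
}

/** Reads op_ts; assumed here (not confirmed) to be epoch millis as a string. */
final class OpTsColumnSketch implements MetadataColumnSketch {
    @Override
    public String name() {
        return "op_ts";
    }

    @Override
    public Object read(Map<String, String> eventMetadata) {
        String raw = eventMetadata.get("op_ts");
        // A missing entry yields null in this sketch; a real connector may differ.
        return raw == null ? null : Long.parseLong(raw);
    }
}

final class MetadataColumnDemo {
    /** The four column names from the change log, in a fixed order. */
    static final List<MetadataColumnSketch> SUPPORTED = List.of(
            new OpTsColumnSketch(),
            new StringMetadataColumn("database_name"),
            new StringMetadataColumn("schema_name"),
            new StringMetadataColumn("table_name"));

    /** Resolves every supported column against one event's metadata. */
    static Map<String, Object> readAll(Map<String, String> eventMetadata) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (MetadataColumnSketch column : SUPPORTED) {
            values.put(column.name(), column.read(eventMetadata));
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration only.
        Map<String, String> meta = Map.of(
                "op_ts", "1735430400000",
                "database_name", "inventory",
                "schema_name", "public",
                "table_name", "orders");
        System.out.println(readAll(meta));
    }
}
```

Keeping each column behind one small interface means the data source only has to expose a list of supported columns, which is the role `supportedMetadataColumns()` plays in the change described above.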

Verifying this change

This change added tests and can be verified as follows:

  • Added testAllMetadataColumns() E2E test in PostgresFullTypesITCase
  • Test verifies metadata columns in both snapshot and incremental phases
  • All tests pass successfully

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths: no
  • Anything that affects deployment or recovery: no
  • Does this pull request introduce a new feature: yes

Documentation

  • Does this pull request introduce a new feature: yes
  • If yes, how is the feature documented: docs

@tchivs tchivs changed the title from "[FLINK-XXXXX][pipeline-connector/postgres] Add metadata column support for PostgreSQL Pipeline Connector" to "[FLINK-38844][pipeline-connector/postgres] Add metadata column support for PostgreSQL Pipeline Connector" on Dec 29, 2025
@tchivs tchivs changed the title from "[FLINK-38844][pipeline-connector/postgres] Add metadata column support for PostgreSQL Pipeline Connector" to "[FLINK-38844][pipeline-connector-postgres] Add metadata column support for PostgreSQL Pipeline Connector" on Dec 29, 2025
@tchivs tchivs changed the title from "[FLINK-38844][pipeline-connector-postgres] Add metadata column support for PostgreSQL Pipeline Connector" to "[FLINK-38844][pipeline-connector][postgres]Add metadata column support for PostgreSQL Pipeline Connector" on Dec 29, 2025
This commit adds metadata column support for the PostgreSQL Pipeline Connector,
enabling users to access metadata information in their data pipelines.

Changes:
- Add OpTsMetadataColumn for operation timestamp
- Add DatabaseNameMetadataColumn for database name
- Add SchemaNameMetadataColumn for schema name
- Add TableNameMetadataColumn for table name
- Update PostgresDataSource to support metadata columns
- Add comprehensive E2E test testAllMetadataColumns()
- Update documentation (English and Chinese)
@tchivs tchivs changed the title from "[FLINK-38844][pipeline-connector][postgres]Add metadata column support for PostgreSQL Pipeline Connector" to "[FLINK-38844][pipeline-connector][postgres]Add metadata column support" on Dec 29, 2025
@github-actions github-actions bot added docs Improvements or additions to documentation postgres-pipeline-connector labels Dec 29, 2025
Member

@yuxiqian yuxiqian left a comment

Thanks for @tchivs' contribution.

I wonder if we need individual metadata columns for database, schema, and table, since they're always available in Transform expressions (only after FLINK-38840 got closed).

@tchivs
Contributor Author

tchivs commented Dec 29, 2025

> Thanks for @tchivs' contribution.
>
> I wonder if we need individual metadata columns for database, schema, and table, since they're always available in Transform expressions (only after FLINK-38840 got closed).

Thanks for the review @yuxiqian! You raise an important point about the overlap with Transform metadata fields.

You're right that namespace_name, schema_name, and table_name are already available in Transform expressions. Let me clarify the design rationale:

  1. op_ts is essential and non-redundant:
  • There's no equivalent in Transform metadata fields (MetadataColumns.java only defines namespace_name, schema_name, table_name, data_event_type)
  • op_ts can only be obtained via metadata.list from the source connector
  • This is consistent with the MySQL connector's implementation
  2. For database_name, schema_name, and table_name there is overlap, and I see two perspectives:

Argument for keeping them:

  • Sink persistence: Users can pass these to downstream sinks without writing transform rules
  • Consistency: The MySQL connector already exposes op_ts via metadata.list, so having all metadata follow the same pattern is intuitive
  • Simplicity: Direct configuration is simpler than transform expressions for basic use cases

Argument for removing them:

  • Redundancy: Transform already provides namespace_name, schema_name, table_name
  • Maintenance: Less code to maintain if we rely on Transform metadata

My suggestion:

  • Keep op_ts (essential, no alternative)
  • For database_name/schema_name/table_name, I'm open to either approach:
    • Option A: Keep them for consistency and ease of use
    • Option B: Remove them and document the Transform approach in the docs

What's your preference? I'm happy to adjust the PR based on the team's direction.

@yuxiqian
Member

I think it's OK to polish the documentation in this PR, leaving the metadata definitions as they are.

@tchivs
Contributor Author

tchivs commented Dec 29, 2025

> I think it's OK to polish the documentation in this PR, leaving the metadata definitions as they are.

Thanks @yuxiqian for the feedback! I've polished the documentation to clarify the relationship between metadata columns and Transform expressions.

Changes made:

  • Added a note section explaining that database_name, schema_name, and table_name are also available via Transform expressions (__namespace_name__, __schema_name__, __table_name__)
  • Clarified that op_ts is only available via metadata.list
  • Explained the trade-offs: using metadata.list allows passing values directly to downstream sinks without transform rules (simpler for basic use cases)
  • Updated the table descriptions to mention the Transform expression alternatives

The metadata definitions remain unchanged as you suggested.
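
To make the relationship described in the note concrete, here is a small lookup sketch. It assumes the pairing follows the order given in the note above (database_name with `__namespace_name__`, schema_name with `__schema_name__`, table_name with `__table_name__`); the `MetadataAlternatives` class and its method are hypothetical and exist only for illustration, not in the connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps metadata.list names to Transform expressions. */
final class MetadataAlternatives {
    // Pairing assumed from the order in which the note lists the names.
    private static final Map<String, String> TRANSFORM_EQUIVALENT = new LinkedHashMap<>();

    static {
        TRANSFORM_EQUIVALENT.put("database_name", "__namespace_name__");
        TRANSFORM_EQUIVALENT.put("schema_name", "__schema_name__");
        TRANSFORM_EQUIVALENT.put("table_name", "__table_name__");
        // op_ts is deliberately absent: it is only available via metadata.list.
    }

    /** Returns the Transform expression for a metadata column, if one exists. */
    static Optional<String> transformEquivalent(String metadataColumn) {
        return Optional.ofNullable(TRANSFORM_EQUIVALENT.get(metadataColumn));
    }

    public static void main(String[] args) {
        String[] columns = {"op_ts", "database_name", "schema_name", "table_name"};
        for (String column : columns) {
            String alternative =
                    transformEquivalent(column).orElse("(metadata.list only)");
            System.out.println(column + " -> " + alternative);
        }
    }
}
```

The empty result for op_ts is the point of the distinction: three of the four columns can be reached either way, while op_ts has no Transform counterpart.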
