Skip to content

sink(cloudstorage): add use-table-id-as-path option#4356

Open
yinshuangfei wants to merge 5 commits intopingcap:masterfrom
yinshuangfei:cdc_path_table_id
Open

sink(cloudstorage): add use-table-id-as-path option#4356
yinshuangfei wants to merge 5 commits intopingcap:masterfrom
yinshuangfei:cdc_path_table_id

Conversation

@yinshuangfei
Copy link

@yinshuangfei yinshuangfei commented Mar 5, 2026

What problem does this PR solve?

Issue Number: close #4357

What is changed and how it works?

  • Adds config use-table-id-as-path.

The use-table-id-as-path option switches the path to use table_id instead of table_name when it set to true.
With configuration use-table-id-as-path=true in sink uri, for example: --sink-uri="s3://cdc&use-table-id-as-path=true", the cdc path changed from

test_db/table_name/5/2024-01-01/CDC_xxx.json

to

12345/5/2024-01-01/CDC_xxx.json

In this mode, we adjust cloud storage path generation to omit schema when table-id-as-path is enabled and skip DB schema writes.

  • Adds use_table_id_as_path into v2 API conversion and sink config parsing.
  • Remove partition_id from the path to prevent duplicates.

Check List

Tests

  • Unit test

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • New Features

    • Added optional use-table-id-as-path configuration parameter for cloud storage sink, enabling file organization by numeric table IDs instead of table names.
  • Bug Fixes

    • Enhanced validation for exchange partition operations to properly detect and handle missing source table information.
  • Tests

    • Added test coverage for table ID-based path generation and exchange partition error handling.

Summary by CodeRabbit

  • New Features

    • Optional cloud storage sink setting to use numeric table IDs in file and directory paths (config/API/URI support).
  • Bug Fixes

    • Stronger validation for exchange-partition DDL events to reject missing/invalid source table info.
    • DDL and schema writes now include explicit table identifiers and skip database-level schema files when using table-ID-as-path to prevent incorrect writes.
  • Tests

    • Added tests for table-ID-based path generation, skipping DB schemas, and invalid exchange-partition DDL handling.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 5, 2026
@pingcap-cla-assistant
Copy link

pingcap-cla-assistant bot commented Mar 5, 2026

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 5, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the cloud storage sink by introducing a new configuration option, use-table-id-as-path. This option provides users with the flexibility to specify whether the output directory structure for table-related data should utilize the table's unique ID instead of its name, which can be beneficial for data management and consistency. The changes involve updating the sink's file writing logic and path generation to respect this new setting, along with corresponding test coverage. Additionally, a critical fix for DDL event processing ensures that exchange partition operations are handled correctly by validating the presence of source table information.

Highlights

  • New Configuration Option: Introduced use-table-id-as-path for the cloud storage sink, allowing output paths to use table IDs instead of table names for better organization and uniqueness.
  • Enhanced Path Generation Logic: Updated internal file path generation functions to conditionally use the table's unique ID based on the new configuration setting.
  • Improved DDL Event Handling: Added validation for ActionExchangeTablePartition DDL events to ensure that source table information is always present, preventing potential errors.
  • Expanded Test Coverage: Included new unit tests to verify the functionality of table ID-based paths and the robustness of the DDL event validation.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • downstreamadapter/sink/cloudstorage/sink.go
    • Validated ActionExchangeTablePartition DDL events for missing source table info.
    • Modified writeFile function to accept tableID and updated its calls.
  • downstreamadapter/sink/cloudstorage/sink_test.go
    • Added test for use-table-id-as-path option in DDL event writing.
    • Included test for invalid ActionExchangeTablePartition DDL events.
  • pkg/sink/cloudstorage/config.go
    • Added UseTableIDAsPath field to urlConfig and Config structs.
    • Implemented parsing of use-table-id-as-path from sink URI.
  • pkg/sink/cloudstorage/config_test.go
    • Updated tests to include use-table-id-as-path in URI parsing and config application.
  • pkg/sink/cloudstorage/path.go
    • Introduced generateTablePath to conditionally use table name or ID.
    • Updated CheckOrWriteSchema and generateDataDirPath to use the new path generation logic.
  • pkg/sink/cloudstorage/path_test.go
    • Added test for data file path generation using table ID.
  • pkg/sink/cloudstorage/table_definition.go
    • Modified GenerateSchemaFilePath to accept useTableIDAsPath and tableID parameters.
    • Updated schema file path generation to use table ID when configured.
  • pkg/sink/cloudstorage/table_definition_test.go
    • Updated tests for GenerateSchemaFilePath to cover the useTableIDAsPath option.
Activity
  • Implemented a new configuration option for cloud storage sink path generation.
  • Refactored DDL event handling to include robust validation for exchange partition operations.
  • Added comprehensive unit tests to ensure the correctness of the new feature and the DDL event fix.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Walkthrough

Adds a use-table-id-as-path sink option, propagates it through API/config, and updates path and schema generation to optionally use numeric table IDs. DDL writes now include explicit tableIDs, exchange-partition DDL validates source table info, function signatures updated, and tests added/adjusted.

Changes

Cohort / File(s) Summary
Configuration & API
pkg/sink/cloudstorage/config.go, pkg/sink/cloudstorage/config_test.go, api/v2/model.go, api/v2/model_test.go
Add UseTableIDAsPath flag in API and sink config, bind URL param use-table-id-as-path, merge/apply the value, and add tests for round-trip and URI parsing.
Path & Table-definition logic
pkg/sink/cloudstorage/path.go, pkg/sink/cloudstorage/path_test.go, pkg/sink/cloudstorage/table_definition.go, pkg/sink/cloudstorage/table_definition_test.go
Introduce generateTablePath and modify schema/data path generation to accept/use tableID and useTableIDAsPath; add validation/error returns and tests covering tableID-based paths and validation errors.
Sink implementation & tests
downstreamadapter/sink/cloudstorage/sink.go, downstreamadapter/sink/cloudstorage/sink_test.go
Require source table info for exchange-partition DDL, write target and source table defs with explicit tableIDs, extend writeFile signature to accept tableID, add logging and tests for table-id-as-path and invalid exchange events.
Config plumbing & CLI
pkg/config/sink.go, pkg/config/changefeed.go
Add use-table-id-as-path sink URI key and SinkConfig field, parse and propagate the param, and clear the field in rmStorageOnlyFields for non-storage downstreams.

Sequence Diagram(s)

(Skipped — changes are file/system-wide feature additions without a single new multi-component sequential flow that warrants a diagram.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • wk989898
  • 3AceShowHand
  • hongyunyan

Poem

🐰 I hopped through configs, flags held tight,

Swapped names for numbers down the clouded lane,
I nudged schemas, kept the paths polite,
A tiny change, but tidy is the gain,
🥕 Cheers to clean files and orderly rain.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'sink(cloudstorage): add use-table-id-as-path option' is concise and clearly describes the primary change: adding a new configuration option to the cloud storage sink.
Description check ✅ Passed The PR description includes a closed issue reference (#4357), clear explanation of the change (use-table-id-as-path option), concrete before/after examples, implementation details, and mentions unit tests.
Linked Issues check ✅ Passed The code changes fully address the requirements in issue #4357: the use-table-id-as-path option is implemented across config parsing, sink config structures, and path generation logic with proper validation.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the use-table-id-as-path feature and supporting infrastructure, including config parsing, path generation, validation, API conversion, and associated unit tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a use-table-id-as-path option for the cloud storage sink, enabling paths to be constructed using table IDs instead of table names. While the implementation is well-tested and appears correct, a security audit identified that the use of log.Panic for handling invalid or unexpected input from external sources (DDL events or storage files) could lead to Denial of Service (DoS). It is recommended to replace panics with proper error handling to ensure the stability of the replication process. Additionally, there are minor refactoring opportunities to improve code reuse and test structure.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@downstreamadapter/sink/cloudstorage/sink_test.go`:
- Around line 289-293: The test is missing setup of the mock PD clock before
creating the sink; call appcontext.SetService to register the mock PD clock
service (same mock used in other tests) prior to invoking newSinkForTest so the
sink and any PD clock accessors see the mock; specifically, in the test where
ctx, cancel := context.WithCancel(context.Background()) and cloudStorageSink,
err := newSinkForTest(...) are created, insert the mock PD clock registration
(via appcontext.SetService(mockPDClock)) using the same mock instance used by
TestWriteDDLEvent/TestWriteDDLEventWithTableIDAsPath and ensure it is cleaned up
or reset after the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ef1e6151-8c9d-4aad-8422-14f382fcb166

📥 Commits

Reviewing files that changed from the base of the PR and between 039417c and 74c09c3.

📒 Files selected for processing (8)
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/config_test.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/table_definition.go`:
- Around line 334-343: The GenerateSchemaFilePath function allows
useTableIDAsPath to produce a "0" path when tableID is zero; add a guard at the
top of GenerateSchemaFilePath to validate tableID (e.g., tableID > 0) when
useTableIDAsPath is true and return a clear error if invalid, before calling
generateTablePath or computing checksum, so callers fail fast instead of writing
into an ambiguous "0" directory; reference: GenerateSchemaFilePath,
useTableIDAsPath, tableID, and generateTablePath.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 402c67b5-c910-49f1-b230-ddc3b6062f96

📥 Commits

Reviewing files that changed from the base of the PR and between 74c09c3 and 9326b6d.

📒 Files selected for processing (2)
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/sink/cloudstorage/table_definition.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/table_definition_test.go`:
- Around line 524-529: The test assertions in table_definition_test.go expect
the old error text "tableID must be positive" but GenerateSchemaFilePath now
returns "invalid table id for table-id path"; update the two assertions that
call GenerateSchemaFilePath (the cases with tableID 0 and -1) to assert that
err.Error() contains "invalid table id for table-id path" instead of the old
string so the test matches the current implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 213ab268-deb6-4709-a28f-0f9b8b979e5f

📥 Commits

Reviewing files that changed from the base of the PR and between 7296adb and 1e5ab99.

📒 Files selected for processing (2)
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 5, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/sink/cloudstorage/table_definition_test.go (1)

522-529: Add one test that documents intended behavior when tableID is unset.

These assertions currently codify hard-fail on tableID == 0/-1. Please add a companion case that captures the intended production behavior for missing IDs (fallback vs fail-fast), so the contract stays explicit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/table_definition_test.go` around lines 522 - 529, Add a
companion unit test for the missing/unset tableID case that calls
def.GenerateSchemaFilePath(false, 0) (i.e., use-table-id-as-path = false with
tableID == 0) and assert the function returns no error and a schema path that
uses the table name/default fallback rather than failing; this makes the
contract explicit alongside the existing failure assertions for
GenerateSchemaFilePath(true, 0) and GenerateSchemaFilePath(true, -1).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/table_definition.go`:
- Around line 355-360: The check that hard-fails when tableID <= 0 inside the
useTableIDAsPath branch should be changed to gracefully fall back to using the
table name path instead of returning
errors.ErrInternalCheckFailed.GenWithStackByArgs; update the logic around the
useTableIDAsPath && t.Table branch (referencing useTableIDAsPath, t.Table,
tableID, and the current error return) so that if tableID is unset/<=0 you skip
the error and proceed to generate a path based on t.Table (or the existing
name-based path logic) rather than aborting.

---

Nitpick comments:
In `@pkg/sink/cloudstorage/table_definition_test.go`:
- Around line 522-529: Add a companion unit test for the missing/unset tableID
case that calls def.GenerateSchemaFilePath(false, 0) (i.e., use-table-id-as-path
= false with tableID == 0) and assert the function returns no error and a schema
path that uses the table name/default fallback rather than failing; this makes
the contract explicit alongside the existing failure assertions for
GenerateSchemaFilePath(true, 0) and GenerateSchemaFilePath(true, -1).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 77b845f5-35c1-4489-aa13-e29feadc64c4

📥 Commits

Reviewing files that changed from the base of the PR and between 1e5ab99 and 3302635.

📒 Files selected for processing (3)
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

@tenfyzhong
Copy link
Collaborator

/check-issue-triage-complete

@yinshuangfei yinshuangfei changed the title sink(cloudstorage): add use-table-id-as-path option DNM: sink(cloudstorage): add use-table-id-as-path option Mar 5, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yinshuangfei
Copy link
Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@yinshuangfei yinshuangfei changed the title DNM: sink(cloudstorage): add use-table-id-as-path option sink(cloudstorage): add use-table-id-as-path option Mar 6, 2026
@yinshuangfei
Copy link
Author

@coderabbitai resume

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/sink/cloudstorage/path.go (1)

398-418: ⚠️ Potential issue | 🟠 Major

Avoid duplicating the physical table ID in partition paths.

With both UseTableIDAsPath and EnablePartitionSeparator enabled, Line 400-403 and Line 417 both append tbl.TableNameWithPhysicTableID.TableID. For partitioned tables this generates <physicalTableID>/<version>/<physicalTableID>/..., so the extra segment stops carrying distinct partition information and changes the path layout unexpectedly.

💡 One safe direction if physical table ID is already the top-level key
-	if f.config.EnablePartitionSeparator && tbl.TableNameWithPhysicTableID.IsPartition {
+	if f.config.EnablePartitionSeparator &&
+		tbl.TableNameWithPhysicTableID.IsPartition &&
+		!f.config.UseTableIDAsPath {
 		elems = append(elems, fmt.Sprintf("%d", tbl.TableNameWithPhysicTableID.TableID))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/path.go` around lines 398 - 418, The code appends the
physical TableID twice when both f.config.UseTableIDAsPath and
f.config.EnablePartitionSeparator are true; update the partition-separator block
so it only appends the extra TableID when it is not already included by
generateTablePath. Concretely, in the branch that currently appends
fmt.Sprintf("%d", tbl.TableNameWithPhysicTableID.TableID) under
f.config.EnablePartitionSeparator, add a guard to skip this append when
f.config.UseTableIDAsPath is true (or when generateTablePath was called with the
TableID-included flag), referencing f.config.UseTableIDAsPath,
f.config.EnablePartitionSeparator, tbl.TableNameWithPhysicTableID.TableID,
generateTablePath, and f.versionMap[tbl].
🧹 Nitpick comments (1)
pkg/sink/cloudstorage/path.go (1)

218-232: Remove the second invalid-table-ID check.

Line 218 already routes through def.GenerateSchemaFilePath(...), which is the canonical validation point for use-table-id-as-path. The extra branch at Line 230-232 is dead on this path and duplicates the error literal, so the two sites can drift again.

♻️ Suggested cleanup
 	tblSchemaFile, err := def.GenerateSchemaFilePath(f.config.UseTableIDAsPath, table.TableNameWithPhysicTableID.TableID)
 	if err != nil {
 		return false, err
 	}
 	exist, err := f.storage.FileExists(ctx, tblSchemaFile)
 	if err != nil {
 		return false, err
 	}
 	if exist {
 		f.versionMap[table] = table.TableInfoVersion
 		return false, nil
 	}
-	if f.config.UseTableIDAsPath && table.TableNameWithPhysicTableID.TableID <= 0 {
-		return false, errors.ErrInternalCheckFailed.GenWithStackByArgs("invalid table id for table-id path")
-	}

Based on learnings: GenerateSchemaFilePath(useTableIDAsPath bool, tableID int64) intentionally returns ErrInternalCheckFailed when tableID <= 0 and useTableIDAsPath=true.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/path.go` around lines 218 - 232, The code duplicates
validation for table IDs: GenerateSchemaFilePath(f.config.UseTableIDAsPath,
table.TableNameWithPhysicTableID.TableID) already returns ErrInternalCheckFailed
when UseTableIDAsPath is true and tableID <= 0, so remove the redundant
conditional that checks f.config.UseTableIDAsPath &&
table.TableNameWithPhysicTableID.TableID <= 0 (the branch after exist check that
returns ErrInternalCheckFailed) to avoid dead/duplicated validation; keep the
call to def.GenerateSchemaFilePath, the FileExists check (f.storage.FileExists),
and the versionMap assignment (f.versionMap[table] = table.TableInfoVersion)
untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@downstreamadapter/sink/cloudstorage/sink.go`:
- Around line 227-234: The early-return in sink.writeFile incorrectly uses
!def.IsTableSchema(), causing invalid table-level DDLs to be silently skipped in
UseTableIDAsPath mode; change the condition to explicitly check for a true
database-level DDL by using def.Table == "" (and keep the surrounding
cfg.UseTableIDAsPath check) so only genuine schema-level events are skipped and
invalid table defs will proceed to GenerateSchemaFilePath and surface validation
errors; reference symbols: sink.writeFile, cfg.UseTableIDAsPath, def.Table,
def.IsTableSchema(), GenerateSchemaFilePath.

In `@pkg/config/sink.go`:
- Around line 954-965: The compatibility check must include the new
UseTableIDAsPath flag so sink-URI-only changes are detected: update
CheckCompatibilityWithSinkURI to parse UseTableIDAsPath from params (as done in
the shown block) and compare it against the current s.UseTableIDAsPath value; if
they differ, populate cfgInSinkURI and cfgInFile for UseTableIDAsPath (similar
to Protocol and TxnAtomicity) and do not let the function return early ignoring
this difference. Ensure you reference the same symbols (UseTableIDAsPath,
CheckCompatibilityWithSinkURI, cfgInSinkURI, cfgInFile, params.Get) when adding
this comparison so URI-driven toggles are correctly treated as
compatibility-breaking.

---

Outside diff comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 398-418: The code appends the physical TableID twice when both
f.config.UseTableIDAsPath and f.config.EnablePartitionSeparator are true; update
the partition-separator block so it only appends the extra TableID when it is
not already included by generateTablePath. Concretely, in the branch that
currently appends fmt.Sprintf("%d", tbl.TableNameWithPhysicTableID.TableID)
under f.config.EnablePartitionSeparator, add a guard to skip this append when
f.config.UseTableIDAsPath is true (or when generateTablePath was called with the
TableID-included flag), referencing f.config.UseTableIDAsPath,
f.config.EnablePartitionSeparator, tbl.TableNameWithPhysicTableID.TableID,
generateTablePath, and f.versionMap[tbl].

---

Nitpick comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 218-232: The code duplicates validation for table IDs:
GenerateSchemaFilePath(f.config.UseTableIDAsPath,
table.TableNameWithPhysicTableID.TableID) already returns ErrInternalCheckFailed
when UseTableIDAsPath is true and tableID <= 0, so remove the redundant
conditional that checks f.config.UseTableIDAsPath &&
table.TableNameWithPhysicTableID.TableID <= 0 (the branch after exist check that
returns ErrInternalCheckFailed) to avoid dead/duplicated validation; keep the
call to def.GenerateSchemaFilePath, the FileExists check (f.storage.FileExists),
and the versionMap assignment (f.versionMap[table] = table.TableInfoVersion)
untouched.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a51876cc-08fb-4742-acf1-cc189eb723ed

📥 Commits

Reviewing files that changed from the base of the PR and between 3302635 and 2cc7aa9.

📒 Files selected for processing (11)
  • api/v2/model.go
  • api/v2/model_test.go
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/config/changefeed.go
  • pkg/config/sink.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/sink/cloudstorage/path_test.go

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

✅ Actions performed

Reviews resumed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/config/sink.go (1)

954-965: ⚠️ Potential issue | 🟠 Major

Add UseTableIDAsPath to the compatibility delta check.

This block makes use-table-id-as-path URI-governed, but CheckCompatibilityWithSinkURI still only keys off Protocol and TxnAtomicity. A changefeed update that only toggles UseTableIDAsPath can therefore return early and miss a real URI/config conflict.

🐛 Proposed fix
-	cfgParamsChanged := s.Protocol != oldSinkConfig.Protocol ||
-		s.TxnAtomicity != oldSinkConfig.TxnAtomicity
+	cfgParamsChanged := s.Protocol != oldSinkConfig.Protocol ||
+		s.TxnAtomicity != oldSinkConfig.TxnAtomicity ||
+		s.UseTableIDAsPath != oldSinkConfig.UseTableIDAsPath

Also applies to: 997-1014

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/sink.go` around lines 954 - 965, The change makes UseTableIDAsPath
(s.UseTableIDAsPath) driven from the sink URI but CheckCompatibilityWithSinkURI
still ignores that key, so updates toggling UseTableIDAsPath can bypass
compatibility checks; update CheckCompatibilityWithSinkURI to include
UseTableIDAsPath in the compatibility delta comparison (same style as
Protocol/TxnAtomicity), using the same config key string (UseTableIDAsPathKey)
and comparing util.GetOrZero(s.UseTableIDAsPath) with the parsed/enabled value
from cfgInSinkURI/cfgInFile, and ensure cfgInSinkURI and cfgInFile are populated
consistently where UseTableIDAsPath is parsed so the function can detect and
return conflicts correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 127-131: The generateTablePath function currently returns any
tableID as a string which allows useTableIDAsPath=true to produce "0/..." paths;
change the logic to reject non-positive IDs the same way GenerateSchemaFilePath
does: when useTableIDAsPath is true and tableID <= 0 return an error
(ErrInternalCheckFailed) instead of silently returning "0", and update callers
that build data/index directories (the code paths referenced around the
data/index generation using generateTablePath) to handle/propagate that error so
data/index path construction fails early rather than creating mixed ID/name
paths.

---

Duplicate comments:
In `@pkg/config/sink.go`:
- Around line 954-965: The change makes UseTableIDAsPath (s.UseTableIDAsPath)
driven from the sink URI but CheckCompatibilityWithSinkURI still ignores that
key, so updates toggling UseTableIDAsPath can bypass compatibility checks;
update CheckCompatibilityWithSinkURI to include UseTableIDAsPath in the
compatibility delta comparison (same style as Protocol/TxnAtomicity), using the
same config key string (UseTableIDAsPathKey) and comparing
util.GetOrZero(s.UseTableIDAsPath) with the parsed/enabled value from
cfgInSinkURI/cfgInFile, and ensure cfgInSinkURI and cfgInFile are populated
consistently where UseTableIDAsPath is parsed so the function can detect and
return conflicts correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b4a1808c-9d41-4b3a-904b-92ae604bef64

📥 Commits

Reviewing files that changed from the base of the PR and between 3302635 and 2cc7aa9.

📒 Files selected for processing (11)
  • api/v2/model.go
  • api/v2/model_test.go
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/config/changefeed.go
  • pkg/config/sink.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

use table_id instead of table_name as cdc path.

2 participants