Skip to content

Improve generated sharding index names to reduce name inflation#38449

Open
ym0506 wants to merge 8 commits intoapache:masterfrom
ym0506:issue_37123
Open

Improve generated sharding index names to reduce name inflation#38449
ym0506 wants to merge 8 commits intoapache:masterfrom
ym0506:issue_37123

Conversation

@ym0506
Copy link
Copy Markdown
Contributor

@ym0506 ym0506 commented Mar 16, 2026

Summary

This change reduces generated sharding index name inflation while keeping reverse mapping backward compatible.

  • generate actual index names with a deterministic shortened suffix instead of appending the full actual table name
  • keep logical index name recovery compatible with both the new shortened format and the legacy _<actualTableName> format
  • update sharding metadata revision, MySQL show index/show create table merge, and pipeline DDL decoration to use the new recovery path
  • preserve legacy constraint handling in MySQL show create table
  • add tests for shortened-name generation, legacy compatibility, and MySQL merge behavior

Fixes #37123

Tests

  • ./mvnw -pl infra/common,features/sharding/core,features/sharding/dialect/mysql -am -DskipITs -Dsurefire.failIfNoSpecifiedTests=false -Dtest=IndexMetaDataUtilsTest,IndexTokenTest,ShardingIndexReviserTest,MySQLShardingShowIndexMergedResultTest,MySQLShardingShowCreateTableMergedResultTest test
  • ./mvnw -pl kernel/data-pipeline/core -am -DskipITs -DskipTests test-compile

Copy link
Copy Markdown
Member

@terrymanu terrymanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision

Merge Verdict: Not Mergeable
Reviewed Scope: latest PR head 3345885 dated March 16, 2026 for PR #38449, covering infra/common, features/sharding/core, features/sharding/dialect/mysql, kernel/data-pipeline/core, test/it/rewriter, plus issue #37123 and PR CI.
Not Reviewed Scope: live PostgreSQL/openGauss cluster execution, full logs for all 91 CI jobs, and untouched modules outside the 11 changed files.
Need Expert Review: PostgreSQL/openGauss DDL rewrite and upgrade-compatibility review after the next revision.
Positive Feedback

The direction is correct. The backward-compatibility part is fairly complete: ShardingIndexReviser.java (line 43), MySQLShardingShowIndexMergedResult.java (line 55), MySQLShardingShowCreateTableMergedResult.java (line 84), and PipelineDDLDecorator.java (line 117) can all recognize both the new and legacy physical index name formats.
The PR currently also has 91 GitHub check runs, all completed/success.
Major Issues

[P1] The root cause is not actually fixed.
In IndexMetaDataUtils.java (line 70), the PR only shortens the suffix into a hash, but it still unconditionally appends a suffix to the logical index name. There is no truncation or enforcement based on dialect-specific identifier limits.
At the same time, CreateIndexStatement.java (line 133) can still generate very long logical names for anonymous indexes, and IndexMetaDataUtilsTest.java (line 70) only verifies “shorter than the old format”, not “within the database limit”.
This does not satisfy the coverage expectation in CODE_OF_CONDUCT.md (line 20) and misses the boundary-testing requirement in CODE_OF_CONDUCT.md (line 90). For PostgreSQL/openGauss-style 63-byte identifiers, long logical index names can still fail with index name too long.
Newly Introduced Issues

[P1] This PR breaks forward DDL compatibility for existing physical indexes.
In IndexToken.java (line 72), forward routing now only generates the new shortened physical name. In index.xml (line 62), the PostgreSQL/openGauss expectations for ALTER INDEX RENAME and DROP INDEX were also switched to the new shortened names.
But after an upgrade, existing shards still physically store indexes with the legacy format logicIndex_actualTableName. That means DROP INDEX or ALTER INDEX ... RENAME against pre-existing shards will target a non-existent new name and fail. The PR preserves backward recognition of legacy names, but it does not preserve forward compatibility for operating on legacy indexes. That regression needs to be fixed first.
Next Steps

Add an upgrade-compatible strategy first: keep legacy physical index name compatibility in forward routing, or provide an explicit migration/rename path before switching rewrite expectations.
Make actual index-name generation truly length-safe: reserve suffix budget per target dialect and truncate the logical-name prefix as needed, instead of only making it “shorter than before”.
Add one-to-one regression tests for legacy physical indexes with DROP INDEX and ALTER INDEX RENAME, plus near-limit cases for both named and anonymous indexes.

@ym0506
Copy link
Copy Markdown
Contributor Author

ym0506 commented Mar 21, 2026

Decision

Merge Verdict: Not Mergeable Reviewed Scope: latest PR head 3345885 dated March 16, 2026 for PR #38449, covering infra/common, features/sharding/core, features/sharding/dialect/mysql, kernel/data-pipeline/core, test/it/rewriter, plus issue #37123 and PR CI. Not Reviewed Scope: live PostgreSQL/openGauss cluster execution, full logs for all 91 CI jobs, and untouched modules outside the 11 changed files. Need Expert Review: PostgreSQL/openGauss DDL rewrite and upgrade-compatibility review after the next revision. Positive Feedback

The direction is correct. The backward-compatibility part is fairly complete: ShardingIndexReviser.java (line 43), MySQLShardingShowIndexMergedResult.java (line 55), MySQLShardingShowCreateTableMergedResult.java (line 84), and PipelineDDLDecorator.java (line 117) can all recognize both the new and legacy physical index name formats. The PR currently also has 91 GitHub check runs, all completed/success. Major Issues

[P1] The root cause is not actually fixed. In IndexMetaDataUtils.java (line 70), the PR only shortens the suffix into a hash, but it still unconditionally appends a suffix to the logical index name. There is no truncation or enforcement based on dialect-specific identifier limits. At the same time, CreateIndexStatement.java (line 133) can still generate very long logical names for anonymous indexes, and IndexMetaDataUtilsTest.java (line 70) only verifies “shorter than the old format”, not “within the database limit”. This does not satisfy the coverage expectation in CODE_OF_CONDUCT.md (line 20) and misses the boundary-testing requirement in CODE_OF_CONDUCT.md (line 90). For PostgreSQL/openGauss-style 63-byte identifiers, long logical index names can still fail with index name too long. Newly Introduced Issues

[P1] This PR breaks forward DDL compatibility for existing physical indexes. In IndexToken.java (line 72), forward routing now only generates the new shortened physical name. In index.xml (line 62), the PostgreSQL/openGauss expectations for ALTER INDEX RENAME and DROP INDEX were also switched to the new shortened names. But after an upgrade, existing shards still physically store indexes with the legacy format logicIndex_actualTableName. That means DROP INDEX or ALTER INDEX ... RENAME against pre-existing shards will target a non-existent new name and fail. The PR preserves backward recognition of legacy names, but it does not preserve forward compatibility for operating on legacy indexes. That regression needs to be fixed first. Next Steps

Add an upgrade-compatible strategy first: keep legacy physical index name compatibility in forward routing, or provide an explicit migration/rename path before switching rewrite expectations. Make actual index-name generation truly length-safe: reserve suffix budget per target dialect and truncate the logical-name prefix as needed, instead of only making it “shorter than before”. Add one-to-one regression tests for legacy physical indexes with DROP INDEX and ALTER INDEX RENAME, plus near-limit cases for both named and anonymous indexes.

Thanks for the review. I pushed a follow-up revision on top of the reviewed head to address the requested points.

This revision is intended to:

  • preserve forward DDL compatibility for existing physical indexes by preferring the legacy physical name when it still fits the target dialect limit
  • make actual index-name generation length-safe with dialect-aware identifier budgets and truncation/hash fallback
  • add regression coverage for legacy DROP INDEX / ALTER INDEX RENAME behavior and near-limit cases for both named and anonymous indexes

The remaining failing checks appear to be unrelated to this PR's code changes. They are failing during workflow bootstrap because graalvm/setup-graalvm@v1 and docker/setup-docker-action@v4 are currently not allowed by the repository Actions policy.

Could you please take another look at the latest head when convenient?

Copy link
Copy Markdown
Member

@terrymanu terrymanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision

Merge Verdict: Not Mergeable
Reviewed Scope: latest PR head c565046 dated March 21, 2026 on PR #38449, covering infra/common, features/sharding/core, features/sharding/dialect/mysql, kernel/data-pipeline/core, and test/it/rewriter, plus issue #37123 and the current check state.
Not Reviewed Scope: live PostgreSQL/openGauss/Oracle pipeline execution, full end-to-end multi-schema migration fixtures, and the full logs for all GitHub checks.
Need Expert Review: PostgreSQL/openGauss/Oracle pipeline DDL semantics after the next revision.

This revision has already addressed the two core directions raised in the previous round: identifier length budgeting is now constrained by dialect, and legacy compatibility for DROP INDEX / ALTER INDEX RENAME has also been restored. Compared with the previous-round comment, this is substantial progress.

Major Issues

[P1] The shared reverse-lookup logic can mis-handle legitimate index names. In IndexMetaDataUtils.java (line 164), as soon as it sees a name ending with _h[0-9a-z]{8} or _t[0-9a-z]{8}, it prefers to strip the name down to its prefix first; MySQLShardingShowIndexMergedResult.java (line 53) then uses that result directly. The counterexample is straightforward: if the real logical index name in MySQL is foo_h12345678, the merged result will be incorrectly shown as foo. This is a semantic regression newly introduced in the latest head, and the current tests do not cover this adjacent valid input. That does not satisfy the coverage and boundary-scenario requirements in CODE_OF_CONDUCT.md (line 20) and CODE_OF_CONDUCT.md (line 90).

[P1] The pipeline candidate index lookup ignores schema shadowing. The newly added findCandidateLogicIndexNames in PipelineDDLDecorator.java (line 118) only uses targetTableName to locate the first matching schema in getAllSchemas(). For PostgreSQL/openGauss/Oracle-style scenarios where same-name tables can exist in multiple schemas, this can incorrectly use the index candidate set from schema A for schema B, causing the DDL decoration result to be mapped to the wrong logical index. This is exactly a lookup-order / shadowing counterexample, and the current PR does not provide a corresponding regression test.

Next Steps

First, make findGeneratedLogicIndexName conservative: check exact candidates first, then determine whether the name is actually a ShardingSphere-generated name; if that cannot be proven, return the original name instead of stripping the prefix first.
Carry schema identity through the pipeline path, and do not use “the first schema in the database that contains this table name” as the source of candidates.
Add one-to-one regression test cases:
MySQL SHOW INDEX / merged result where the logical index name itself ends with _hxxxxxxxx or _txxxxxxxx.
Pipeline DDL decoration for reverse lookup of same-name actual tables across multiple schemas.
At least one exact-name counterexample proving that a legitimate name will not be mistaken for a generated name.

Multi-Round Comparison

Fixed: the previously raised issues of “insufficient length safety” and “legacy DROP/ALTER compatibility regression” are basically closed in c565046.
Newly introduced / unresolved: the two P1 issues above still block merge.

@ym0506
Copy link
Copy Markdown
Contributor Author

ym0506 commented Mar 24, 2026

Thanks for the follow-up review. I pushed another revision on top of the reviewed head to address the two remaining P1 concerns.

This revision changes the reverse index lookup to prefer an exact candidate match first and only normalize names when the actual index name is provably generated by ShardingSphere, so logical names such as foo_h12345678 / foo_t12345678 are preserved. It also carries schema identity through the pipeline DDL reverse-mapping path and uses the target schema for candidate lookup, which avoids cross-schema shadowing when the same table name exists in multiple schemas.

I also added regressions for the exact-name suffix cases in the shared index lookup and MySQL merged result path, plus a pipeline test covering same-table-name multi-schema resolution. Local targeted verification passed with:

  • ./mvnw -pl infra/common,features/sharding/dialect/mysql,kernel/data-pipeline/core -am -DskipITs -Dsurefire.failIfNoSpecifiedTests=false -Dtest=IndexMetaDataUtilsTest,MySQLShardingShowIndexMergedResultTest,PipelineDDLDecoratorTest test

Could you please take another look at the latest head when convenient?

Copy link
Copy Markdown
Member

@terrymanu terrymanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge Verdict: Not Mergeable

Reviewed Scope: latest head fe137b4 on PR #38449, covering infra/common, features/sharding/core, features/sharding/dialect/mysql, kernel/data-pipeline/core, and test/it/rewriter, plus issue #37123 and current GitHub checks.
Not Reviewed Scope: live PostgreSQL/openGauss/Oracle end-to-end metadata loading on a real cluster, and modules outside the 14 changed PR files.
Need Expert Review: no extra expert is needed to confirm the current blocker; after the next revision, a PostgreSQL/openGauss metadata-loading reviewer would still be helpful.

This revision does close the two previous P1s in the right direction: exact-name preservation was added to the shared reverse lookup, and pipeline mapping is now schema-aware. The head also has 150 completed successful check runs. That progress is real, but it is still not enough for mergeability.

Major Issues

[P1] The root-cause repair is still incomplete on the shared metadata-revision path.
In ShardingIndexReviser.java lines 44-52, the PR still derives the logical index name with getGeneratedLogicIndexName(...) and only has a fallback for anonymous indexes reconstructed from columns. But the PR itself already proves truncated named indexes exist: see IndexMetaDataUtilsTest.java lines 117-138, where recovery for _txxxxxxxx names requires findGeneratedLogicIndexName(..., candidates). ShardingIndexReviser has no candidate set, so a long named index that is rewritten to a truncated physical name will still be revised back to only its truncated prefix, not the full logical name. Since the PR summary explicitly includes sharding metadata revision in scope, this means the fix chain is still incomplete. It also leaves a missing boundary regression relative to CODE_OF_CONDUCT.md (line 20) and CODE_OF_CONDUCT.md (line 90).
Newly Introduced Issues

[P1] The same shared path still regresses exact hash-like / truncation-like logical names outside the target dialect path.
ShardingMetaDataReviseEntry wires this reviser into the generic metadata revise engine for sharding tables, not a dialect-local path: ShardingMetaDataReviseEntry.java lines 54-55. At the same time, MySQL still keeps logical index names unchanged on the table-unique path, as shown by index.xml lines 20-23. That means a legitimate logical index such as foo_h12345678 or foo_t12345678 can still be revised to foo by ShardingIndexReviser, because this path does not do the exact-candidate-first check that the PR added for MySQL merged results and pipeline decoration. This is a shared-path blast-radius gap, and there is no regression test for it in ShardingIndexReviserTest.java.
Next-Step Suggestions

Please switch ShardingIndexReviser to the same candidate-based recovery strategy already used in MySQL merged result and pipeline decoration, instead of stripping _h/_t patterns directly.
Please add one-to-one regressions for:
a truncated named index on the metadata revise path;
an exact logical index name ending with _hxxxxxxxx or _txxxxxxxx on the MySQL/table-unique path;
one cross-dialect metadata-revise case proving the latest shared path preserves exact names and restores full long names.
After that revision, I can do another focused pass on the latest head.
Multi-Round Comparison

Fixed versus previous round: the MySQL merged-result exact-name issue and the pipeline schema-shadowing issue look closed on fe137b4.
Newly identified blocker on the latest head: the shared ShardingIndexReviser path above still prevents merge.

@ym0506
Copy link
Copy Markdown
Contributor Author

ym0506 commented Mar 25, 2026

Thanks for the detailed follow-up review.

I understand the remaining blocker is now the shared metadata revise path in ShardingIndexReviser, not the MySQL merged-result or pipeline paths.

I can make the exact-name _hxxxxxxxx / _txxxxxxxx handling conservative there as well, but for restoring a truncated named physical index back to its full logical name, the reviser currently does not have a candidate logical-index set available, unlike the MySQL merged-result and pipeline paths that already use candidate-based recovery.

Would you be OK with widening the metadata revise path so ShardingIndexReviser can receive candidate logical index names (or another equivalent source of truth) and use the same candidate-based recovery strategy there? Or do you expect a different approach for this shared revise path?

@ym0506
Copy link
Copy Markdown
Contributor Author

ym0506 commented Mar 31, 2026

Hi @terrymanu, just following up on my question from March 25.

Since the remaining blocker seems to depend on whether widening the shared metadata-revise path is acceptable, I wanted to confirm the intended direction before preparing the next revision.

The current PR head still has all checks passing. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve IndexMetaDataUtils.getActualIndexName to reduce index name too long problem

2 participants