Improve generated sharding index names to reduce name inflation#38449
Improve generated sharding index names to reduce name inflation#38449ym0506 wants to merge 8 commits intoapache:masterfrom
Conversation
terrymanu
left a comment
There was a problem hiding this comment.
Decision
Merge Verdict: Not Mergeable
Reviewed Scope: latest PR head 3345885 dated March 16, 2026 for PR #38449, covering infra/common, features/sharding/core, features/sharding/dialect/mysql, kernel/data-pipeline/core, test/it/rewriter, plus issue #37123 and PR CI.
Not Reviewed Scope: live PostgreSQL/openGauss cluster execution, full logs for all 91 CI jobs, and untouched modules outside the 11 changed files.
Need Expert Review: PostgreSQL/openGauss DDL rewrite and upgrade-compatibility review after the next revision.
Positive Feedback
The direction is correct. The backward-compatibility part is fairly complete: ShardingIndexReviser.java (line 43), MySQLShardingShowIndexMergedResult.java (line 55), MySQLShardingShowCreateTableMergedResult.java (line 84), and PipelineDDLDecorator.java (line 117) can all recognize both the new and legacy physical index name formats.
The PR currently also has 91 GitHub check runs, all completed/success.
Major Issues
[P1] The root cause is not actually fixed.
In IndexMetaDataUtils.java (line 70), the PR only shortens the suffix into a hash, but it still unconditionally appends a suffix to the logical index name. There is no truncation or enforcement based on dialect-specific identifier limits.
At the same time, CreateIndexStatement.java (line 133) can still generate very long logical names for anonymous indexes, and IndexMetaDataUtilsTest.java (line 70) only verifies “shorter than the old format”, not “within the database limit”.
This does not satisfy the coverage expectation in CODE_OF_CONDUCT.md (line 20) and misses the boundary-testing requirement in CODE_OF_CONDUCT.md (line 90). For PostgreSQL/openGauss-style 63-byte identifiers, long logical index names can still fail with index name too long.
Newly Introduced Issues
[P1] This PR breaks forward DDL compatibility for existing physical indexes.
In IndexToken.java (line 72), forward routing now only generates the new shortened physical name. In index.xml (line 62), the PostgreSQL/openGauss expectations for ALTER INDEX RENAME and DROP INDEX were also switched to the new shortened names.
But after an upgrade, existing shards still physically store indexes with the legacy format logicIndex_actualTableName. That means DROP INDEX or ALTER INDEX ... RENAME against pre-existing shards will target a non-existent new name and fail. The PR preserves backward recognition of legacy names, but it does not preserve forward compatibility for operating on legacy indexes. That regression needs to be fixed first.
Next Steps
Add an upgrade-compatible strategy first: keep legacy physical index name compatibility in forward routing, or provide an explicit migration/rename path before switching rewrite expectations.
Make actual index-name generation truly length-safe: reserve suffix budget per target dialect and truncate the logical-name prefix as needed, instead of only making it “shorter than before”.
Add one-to-one regression tests for legacy physical indexes with DROP INDEX and ALTER INDEX RENAME, plus near-limit cases for both named and anonymous indexes.
Thanks for the review. I pushed a follow-up revision on top of the reviewed head to address the requested points. This revision is intended to:
The remaining failing checks appear to be unrelated to this PR's code changes. They are failing during workflow bootstrap because Could you please take another look at the latest head when convenient? |
terrymanu
left a comment
There was a problem hiding this comment.
Decision
Merge Verdict: Not Mergeable
Reviewed Scope: latest PR head c565046 dated March 21, 2026 on PR #38449, covering infra/common, features/sharding/core, features/sharding/dialect/mysql, kernel/data-pipeline/core, and test/it/rewriter, plus issue #37123 and the current check state.
Not Reviewed Scope: live PostgreSQL/openGauss/Oracle pipeline execution, full end-to-end multi-schema migration fixtures, and the full logs for all GitHub checks.
Need Expert Review: PostgreSQL/openGauss/Oracle pipeline DDL semantics after the next revision.
This revision has already addressed the two core directions raised in the previous round: identifier length budgeting is now constrained by dialect, and legacy compatibility for DROP INDEX / ALTER INDEX RENAME has also been restored. Compared with the previous-round comment, this is substantial progress.
Major Issues
[P1] The shared reverse-lookup logic can mis-handle legitimate index names. In IndexMetaDataUtils.java (line 164), as soon as it sees a name ending with _h[0-9a-z]{8} or _t[0-9a-z]{8}, it prefers to strip the name down to its prefix first; MySQLShardingShowIndexMergedResult.java (line 53) then uses that result directly. The counterexample is straightforward: if the real logical index name in MySQL is foo_h12345678, the merged result will be incorrectly shown as foo. This is a semantic regression newly introduced in the latest head, and the current tests do not cover this adjacent valid input. That does not satisfy the coverage and boundary-scenario requirements in CODE_OF_CONDUCT.md (line 20) and CODE_OF_CONDUCT.md (line 90).
[P1] The pipeline candidate index lookup ignores schema shadowing. The newly added findCandidateLogicIndexNames in PipelineDDLDecorator.java (line 118) only uses targetTableName to locate the first matching schema in getAllSchemas(). For PostgreSQL/openGauss/Oracle-style scenarios where same-name tables can exist in multiple schemas, this can incorrectly use the index candidate set from schema A for schema B, causing the DDL decoration result to be mapped to the wrong logical index. This is exactly a lookup-order / shadowing counterexample, and the current PR does not provide a corresponding regression test.
Next Steps
First, make findGeneratedLogicIndexName conservative: check exact candidates first, then determine whether the name is actually a ShardingSphere-generated name; if that cannot be proven, return the original name instead of stripping the prefix first.
Carry schema identity through the pipeline path, and do not use “the first schema in the database that contains this table name” as the source of candidates.
Add one-to-one regression test cases:
MySQL SHOW INDEX / merged result where the logical index name itself ends with _hxxxxxxxx or _txxxxxxxx.
Pipeline DDL decoration for reverse lookup of same-name actual tables across multiple schemas.
At least one exact-name counterexample proving that a legitimate name will not be mistaken for a generated name.
Multi-Round Comparison
Fixed: the previously raised issues of “insufficient length safety” and “legacy DROP/ALTER compatibility regression” are basically closed in c565046.
Newly introduced / unresolved: the two P1 issues above still block merge.
|
Thanks for the follow-up review. I pushed another revision on top of the reviewed head to address the two remaining P1 concerns. This revision changes the reverse index lookup to prefer an exact candidate match first and only normalize names when the actual index name is provably generated by ShardingSphere, so logical names such as I also added regressions for the exact-name suffix cases in the shared index lookup and MySQL merged result path, plus a pipeline test covering same-table-name multi-schema resolution. Local targeted verification passed with:
Could you please take another look at the latest head when convenient? |
terrymanu
left a comment
There was a problem hiding this comment.
Merge Verdict: Not Mergeable
Reviewed Scope: latest head fe137b4 on PR #38449, covering infra/common, features/sharding/core, features/sharding/dialect/mysql, kernel/data-pipeline/core, and test/it/rewriter, plus issue #37123 and current GitHub checks.
Not Reviewed Scope: live PostgreSQL/openGauss/Oracle end-to-end metadata loading on a real cluster, and modules outside the 14 changed PR files.
Need Expert Review: no extra expert is needed to confirm the current blocker; after the next revision, a PostgreSQL/openGauss metadata-loading reviewer would still be helpful.
This revision does close the two previous P1s in the right direction: exact-name preservation was added to the shared reverse lookup, and pipeline mapping is now schema-aware. The head also has 150 completed successful check runs. That progress is real, but it is still not enough for mergeability.
Major Issues
[P1] The root-cause repair is still incomplete on the shared metadata-revision path.
In ShardingIndexReviser.java lines 44-52, the PR still derives the logical index name with getGeneratedLogicIndexName(...) and only has a fallback for anonymous indexes reconstructed from columns. But the PR itself already proves truncated named indexes exist: see IndexMetaDataUtilsTest.java lines 117-138, where recovery for _txxxxxxxx names requires findGeneratedLogicIndexName(..., candidates). ShardingIndexReviser has no candidate set, so a long named index that is rewritten to a truncated physical name will still be revised back to only its truncated prefix, not the full logical name. Since the PR summary explicitly includes sharding metadata revision in scope, this means the fix chain is still incomplete. It also leaves a missing boundary regression relative to CODE_OF_CONDUCT.md (line 20) and CODE_OF_CONDUCT.md (line 90).
Newly Introduced Issues
[P1] The same shared path still regresses exact hash-like / truncation-like logical names outside the target dialect path.
ShardingMetaDataReviseEntry wires this reviser into the generic metadata revise engine for sharding tables, not a dialect-local path: ShardingMetaDataReviseEntry.java lines 54-55. At the same time, MySQL still keeps logical index names unchanged on the table-unique path, as shown by index.xml lines 20-23. That means a legitimate logical index such as foo_h12345678 or foo_t12345678 can still be revised to foo by ShardingIndexReviser, because this path does not do the exact-candidate-first check that the PR added for MySQL merged results and pipeline decoration. This is a shared-path blast-radius gap, and there is no regression test for it in ShardingIndexReviserTest.java.
Next-Step Suggestions
Please switch ShardingIndexReviser to the same candidate-based recovery strategy already used in MySQL merged result and pipeline decoration, instead of stripping _h/_t patterns directly.
Please add one-to-one regressions for:
a truncated named index on the metadata revise path;
an exact logical index name ending with _hxxxxxxxx or _txxxxxxxx on the MySQL/table-unique path;
one cross-dialect metadata-revise case proving the latest shared path preserves exact names and restores full long names.
After that revision, I can do another focused pass on the latest head.
Multi-Round Comparison
Fixed versus previous round: the MySQL merged-result exact-name issue and the pipeline schema-shadowing issue look closed on fe137b4.
Newly identified blocker on the latest head: the shared ShardingIndexReviser path above still prevents merge.
|
Thanks for the detailed follow-up review. I understand the remaining blocker is now the shared metadata revise path in I can make the exact-name Would you be OK with widening the metadata revise path so |
|
Hi @terrymanu, just following up on my question from March 25. Since the remaining blocker seems to depend on whether widening the shared metadata-revise path is acceptable, I wanted to confirm the intended direction before preparing the next revision. The current PR head still has all checks passing. Thanks. |
Summary
This change reduces generated sharding index name inflation while keeping reverse mapping backward compatible.
_<actualTableName>formatFixes #37123
Tests
./mvnw -pl infra/common,features/sharding/core,features/sharding/dialect/mysql -am -DskipITs -Dsurefire.failIfNoSpecifiedTests=false -Dtest=IndexMetaDataUtilsTest,IndexTokenTest,ShardingIndexReviserTest,MySQLShardingShowIndexMergedResultTest,MySQLShardingShowCreateTableMergedResultTest test./mvnw -pl kernel/data-pipeline/core -am -DskipITs -DskipTests test-compile