-
Notifications
You must be signed in to change notification settings - Fork 746
Disallow colocating tables when collations don't match #8257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8257 +/- ##
==========================================
- Coverage 88.79% 88.79% -0.01%
==========================================
Files 287 287
Lines 63236 63241 +5
Branches 7927 7927
==========================================
+ Hits 56151 56152 +1
- Misses 4754 4759 +5
+ Partials 2331 2330 -1 🚀 New features to boost your workflow:
|
e0f0438 to
48ac05e
Compare
8ffa0d5 to
c34b6a5
Compare
ab7953a to
f5099d8
Compare
Get ready for the next commit, could also be merged separately to the main.
- No need to make any changes for the following UDFs:
- update_distributed_table_colocation()/create_distributed_table_concurrently()
These are already checking whether collations of distribution keys match.
- ALTER TABLE .. ALTER COLUMN .. TYPE .. COLLATE ..
We already don't allow changing the collation of a distribution column.
- The following UDFs are fixed with this commit:
These UDFs were not explicitly checking whether collation of the distribution
keys of two distributed tables match, however, we were catching this via
EnsurePartitionMetadataIsSane() when syncing metadata. And this incorrectly
caused allowing to colocate two such distributed tables in a single-node setup
as we don't have any nodes to sync metadata there. In such cases, we would only
become aware of the situation **when adding a new node to the cluster etc.**.
Otherwise, i.e., if we already had more than one nodes in the cluster, we were
seeing this error while Citus is syncing metadata for the new / altered distributed
table, which wasn't too bad but not ideal too:
```sql
ERROR: cannot colocate tables test_a_tbl_2 and test_a_tbl_1
DETAIL: Distribution column collations don't match for test_a_tbl_2 and test_a_tbl_1.
CONTEXT: while executing command on localhost:9701
```
- create_distributed_table()
To fix that, now EnsureTableCanBeColocatedWith() calls EnsureColumnTypeEquality()
to perform necessary checks, which already ensures collation match.
And to do that, now EnsureTableCanBeColocatedWith() has to accept distributionColumn
Var as an argument (to be passed into EnsureColumnTypeEquality()), so now it doesn't
have to accept distributionColumnType separately, so refactor accordingly.
And to provide distributionColumn Var to EnsureTableCanBeColocatedWith(), similarly
the call sites of the function also need to have distributionColumn Var, like
FindColocateWithColocationId(). And now that FindColocateWithColocationId() accepts
distributionColumn Var, it doesn't have to accept distributionColumnType and
distributionColumnCollation separately, so refactor accordingly.
Plus, now this UDF also correctly records the colocation groups with the correct
distribution column collations.
- alter_distributed_table()
Make sure to check column collations.
- Also, now citus_finish_citus_upgrade() automatically fixes such colocation groups when
executed after upgrading to Citus 14.
This also helps removing the last alternative test output for
upgrade_citus_finish_citus_upgrade. This is because, the recent change made in
citus_finish_citus_upgrade() to automatically fix such colocation groups results in
updating last_upgrade_version in pg_dist_node_metadata when upgrading to Citus 14,
so the notice message in the mentioned test is now gone, so we don't anymore need to
have an alternative output for that test.
544f452 to
f89c279
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances Citus to disallow colocating distributed tables when their distribution key collations don't match, fixing issue #7683. Previously, collation mismatches were only caught during metadata synchronization to worker nodes, which meant single-node setups could create invalid colocation groups that would only be detected when adding new nodes.
Key changes:
- Refactored colocation validation logic to check both type and collation equality through
EnsureColumnTypeEquality() - Added
fix_pre_citus14_colocation_group_collation_mismatches()UDF to automatically fix invalid colocation groups when upgrading to Citus 14 - Enhanced
alter_distributed_table()to validate collation compatibility when changing colocation or distribution columns
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/include/distributed/colocation_utils.h |
Updated function signatures to pass Var *distributionColumn instead of separate type and collation parameters |
src/backend/distributed/utils/colocation_utils.c |
Refactored FindColocateWithColocationId() and EnsureTableCanBeColocatedWith() to use EnsureColumnTypeEquality() for comprehensive type and collation checking |
src/backend/distributed/commands/create_distributed_table.c |
Updated to pass distribution column Var to colocation functions and use varcollid instead of get_typcollation() for consistency |
src/backend/distributed/commands/alter_table.c |
Added collation validation checks in CheckAlterDistributedTableConversionParameters() and fixed typo from "then" to "than" in error message |
src/backend/distributed/sql/udfs/fix_pre_citus14_colocation_group_collation_mismatches/latest.sql |
New UDF to identify and fix tables with mismatched colocation group collations by moving them to appropriate colocation groups |
src/backend/distributed/sql/udfs/fix_pre_citus14_colocation_group_collation_mismatches/14.0-1.sql |
Version-specific implementation of the collation fix UDF for Citus 14.0-1 |
src/backend/distributed/sql/udfs/citus_finish_citus_upgrade/latest.sql |
Updated to call the collation fix function when upgrading from pre-14 versions |
src/backend/distributed/sql/udfs/citus_finish_citus_upgrade/14.0-1.sql |
Version-specific upgrade procedure that includes collation mismatch fixes |
src/backend/distributed/sql/citus--13.2-1--14.0-1.sql |
Upgrade script that includes the new collation fix UDF |
src/backend/distributed/sql/downgrades/citus--14.0-1--13.2-1.sql |
Downgrade script that removes the collation fix UDF |
src/test/regress/sql/upgrade_post_14_before.sql |
New test creating tables with different collations in older Citus versions to verify the upgrade fixes them |
src/test/regress/sql/upgrade_post_14_after.sql |
Verifies that collation groups are correctly adjusted after upgrade |
src/test/regress/sql/update_colocation_mx.sql |
Added comprehensive tests for collation mismatch detection in multi-node environments |
src/test/regress/sql/single_node.sql |
Added tests verifying collation checks work correctly in single-node setups |
src/test/regress/expected/*.out |
Updated expected outputs for all modified tests including removal of alternative outputs that are no longer needed |
src/test/regress/before_citus_upgrade_coord_schedule |
Added upgrade_post_14_before test to upgrade test schedule |
src/test/regress/after_citus_upgrade_coord_schedule |
Added upgrade_post_14_after test to upgrade test schedule |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
DESCRIPTION: Disallows creating a distributed table or altering it to be colocated with another table if distribution key collations don't match.
Fixes #7683.
No need to make any changes for the following UDFs:
update_distributed_table_colocation()/create_distributed_table_concurrently()
These are already checking whether collations of distribution keys match.
ALTER TABLE .. ALTER COLUMN .. TYPE .. COLLATE ..
We already don't allow changing the collation of a distribution column.
The following UDFs are fixed with this commit:
These UDFs were not explicitly checking whether collation of the distribution
keys of two distributed tables match, however, we were catching this via
EnsurePartitionMetadataIsSane() when syncing metadata. And this incorrectly
caused allowing to colocate two such distributed tables in a single-node setup
as we don't have any nodes to sync metadata there. In such cases, we would only
become aware of the situation when adding a new node to the cluster etc..
Otherwise, i.e., if we already had more than one nodes in the cluster, we were
seeing this error while Citus is syncing metadata for the new / altered distributed
table, which wasn't too bad but not ideal too:
create_distributed_table()
To fix that, now EnsureTableCanBeColocatedWith() calls EnsureColumnTypeEquality()
to perform necessary checks, which already ensures collation match.
And to do that, now EnsureTableCanBeColocatedWith() has to accept distributionColumn
Var as an argument (to be passed into EnsureColumnTypeEquality()), so now it doesn't
have to accept distributionColumnType separately, so refactor accordingly.
And to provide distributionColumn Var to EnsureTableCanBeColocatedWith(), similarly
the call sites of the function also need to have distributionColumn Var, like
FindColocateWithColocationId(). And now that FindColocateWithColocationId() accepts
distributionColumn Var, it doesn't have to accept distributionColumnType and
distributionColumnCollation separately, so refactor accordingly.
Plus, now this UDF also correctly records the colocation groups with the correct
distribution column collations.
alter_distributed_table()
Make sure to check column collations.
Also added fix_pre_citus14_colocation_group_collation_mismatches(), which is
automatically called by citus_finish_citus_upgrade() to automatically fix such
colocation groups when executed after upgrading to Citus 14.
Tested fix_pre_citus14_colocation_group_collation_mismatches() on a 3-node
"Standard D4ds_v5" cluster, and;
Moving 10k tables out from their original colocation groups to fix the bug for an
existing cluster took 4.5 sec for a 3-node "Standard D4ds_v5" cluster. (#nodes
matters because the UDF syncs the corrected colocation entries to all nodes).
For 10k tables, in the same cluster setup, when there is nothing to fix, the
function returns at 200 ms.
This also helps removing the last alternative test output for
upgrade_citus_finish_citus_upgrade. This is because, the recent change made in
citus_finish_citus_upgrade() to automatically fix such colocation groups results in
updating last_upgrade_version in pg_dist_node_metadata when upgrading to Citus 14,
so the notice message in the mentioned test is now gone, so we don't anymore need to
have an alternative output for that test.