Skip to content

Conversation

@onurctirtir
Copy link
Member

@onurctirtir onurctirtir commented Oct 16, 2025

DESCRIPTION: Disallows creating a distributed table or altering it to be colocated with another table if distribution key collations don't match.

Fixes #7683.

  • No need to make any changes for the following UDFs:

    • update_distributed_table_colocation()/create_distributed_table_concurrently()
      These are already checking whether collations of distribution keys match.

    • ALTER TABLE .. ALTER COLUMN .. TYPE .. COLLATE ..
      We already don't allow changing the collation of a distribution column.

  • The following UDFs are fixed with this commit:
    These UDFs were not explicitly checking whether collation of the distribution
    keys of two distributed tables match, however, we were catching this via
    EnsurePartitionMetadataIsSane() when syncing metadata. And this incorrectly
    caused allowing to colocate two such distributed tables in a single-node setup
    as we don't have any nodes to sync metadata there. In such cases, we would only
    become aware of the situation when adding a new node to the cluster etc..

    Otherwise, i.e., if we already had more than one nodes in the cluster, we were
    seeing this error while Citus is syncing metadata for the new / altered distributed
    table, which wasn't too bad but not ideal too:

    ERROR:  cannot colocate tables test_a_tbl_2 and test_a_tbl_1
    DETAIL:  Distribution column collations don't match for test_a_tbl_2 and test_a_tbl_1.
    CONTEXT:  while executing command on localhost:9701
    • create_distributed_table()
      To fix that, now EnsureTableCanBeColocatedWith() calls EnsureColumnTypeEquality()
      to perform necessary checks, which already ensures collation match.

      And to do that, now EnsureTableCanBeColocatedWith() has to accept distributionColumn
      Var as an argument (to be passed into EnsureColumnTypeEquality()), so now it doesn't
      have to accept distributionColumnType separately, so refactor accordingly.

      And to provide distributionColumn Var to EnsureTableCanBeColocatedWith(), similarly
      the call sites of the function also need to have distributionColumn Var, like
      FindColocateWithColocationId(). And now that FindColocateWithColocationId() accepts
      distributionColumn Var, it doesn't have to accept distributionColumnType and
      distributionColumnCollation separately, so refactor accordingly.

      Plus, now this UDF also correctly records the colocation groups with the correct
      distribution column collations.

    • alter_distributed_table()
      Make sure to check column collations.

  • Also added fix_pre_citus14_colocation_group_collation_mismatches(), which is
    automatically called by citus_finish_citus_upgrade() to automatically fix such
    colocation groups when executed after upgrading to Citus 14.

    Tested fix_pre_citus14_colocation_group_collation_mismatches() on a 3-node
    "Standard D4ds_v5" cluster, and;

    • Moving 10k tables out from their original colocation groups to fix the bug for an
      existing cluster took 4.5 sec for a 3-node "Standard D4ds_v5" cluster. (#nodes
      matters because the UDF syncs the corrected colocation entries to all nodes).

    • For 10k tables, in the same cluster setup, when there is nothing to fix, the
      function returns at 200 ms.

    This also helps removing the last alternative test output for
    upgrade_citus_finish_citus_upgrade. This is because, the recent change made in
    citus_finish_citus_upgrade() to automatically fix such colocation groups results in
    updating last_upgrade_version in pg_dist_node_metadata when upgrading to Citus 14,
    so the notice message in the mentioned test is now gone, so we don't anymore need to
    have an alternative output for that test.

@codecov
Copy link

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.79%. Comparing base (851c9cd) to head (f89c279).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8257      +/-   ##
==========================================
- Coverage   88.79%   88.79%   -0.01%     
==========================================
  Files         287      287              
  Lines       63236    63241       +5     
  Branches     7927     7927              
==========================================
+ Hits        56151    56152       +1     
- Misses       4754     4759       +5     
+ Partials     2331     2330       -1     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@onurctirtir onurctirtir force-pushed the fix-collation-mismatch-create-dist-table branch from e0f0438 to 48ac05e Compare October 16, 2025 15:04
@onurctirtir onurctirtir changed the title Disallow creating a distributed table colocated with another table if distribution key collations don't match Disallow creating a distributed table or altering it to be colocated with another table if distribution key collations don't match Oct 16, 2025
@onurctirtir onurctirtir changed the title Disallow creating a distributed table or altering it to be colocated with another table if distribution key collations don't match create/alter_distributed_table: disallow colocating tables when collations mismatch Oct 16, 2025
@onurctirtir onurctirtir changed the title create/alter_distributed_table: disallow colocating tables when collations mismatch create/alter_distributed_table: disallow colocating tables when collations don't match Oct 16, 2025
@onurctirtir onurctirtir changed the title create/alter_distributed_table: disallow colocating tables when collations don't match Disallow colocating tables when collations don't match Oct 17, 2025
@onurctirtir onurctirtir force-pushed the fix-collation-mismatch-create-dist-table branch 2 times, most recently from 8ffa0d5 to c34b6a5 Compare October 17, 2025 09:54
@onurctirtir onurctirtir enabled auto-merge (squash) October 17, 2025 10:16
@onurctirtir onurctirtir requested a review from naisila October 17, 2025 10:24
@onurctirtir onurctirtir disabled auto-merge October 17, 2025 12:09
@onurctirtir onurctirtir enabled auto-merge (squash) October 17, 2025 12:13
@onurctirtir onurctirtir disabled auto-merge October 22, 2025 09:49
@onurctirtir onurctirtir force-pushed the fix-collation-mismatch-create-dist-table branch 5 times, most recently from ab7953a to f5099d8 Compare December 24, 2025 15:06
Get ready for the next commit, could also be merged separately to the main.
- No need to make any changes for the following UDFs:

  - update_distributed_table_colocation()/create_distributed_table_concurrently()

    These are already checking whether collations of distribution keys match.

  - ALTER TABLE .. ALTER COLUMN .. TYPE .. COLLATE ..

    We already don't allow changing the collation of a distribution column.

- The following UDFs are fixed with this commit:

  These UDFs were not explicitly checking whether collation of the distribution
  keys of two distributed tables match, however, we were catching this via
  EnsurePartitionMetadataIsSane() when syncing metadata. And this incorrectly
  caused allowing to colocate two such distributed tables in a single-node setup
  as we don't have any nodes to sync metadata there. In such cases, we would only
  become aware of the situation **when adding a new node to the cluster etc.**.

  Otherwise, i.e., if we already had more than one nodes in the cluster, we were
  seeing this error while Citus is syncing metadata for the new / altered distributed
  table, which wasn't too bad but not ideal too:
  ```sql
  ERROR:  cannot colocate tables test_a_tbl_2 and test_a_tbl_1
  DETAIL:  Distribution column collations don't match for test_a_tbl_2 and test_a_tbl_1.
  CONTEXT:  while executing command on localhost:9701
  ```

  - create_distributed_table()

    To fix that, now EnsureTableCanBeColocatedWith() calls EnsureColumnTypeEquality()
    to perform necessary checks, which already ensures collation match.

    And to do that, now EnsureTableCanBeColocatedWith() has to accept distributionColumn
    Var as an argument (to be passed into EnsureColumnTypeEquality()), so now it doesn't
    have to accept distributionColumnType separately, so refactor accordingly.

    And to provide distributionColumn Var to EnsureTableCanBeColocatedWith(), similarly
    the call sites of the function also need to have distributionColumn Var, like
    FindColocateWithColocationId(). And now that FindColocateWithColocationId() accepts
    distributionColumn Var, it doesn't have to accept distributionColumnType and
    distributionColumnCollation separately, so refactor accordingly.

    Plus, now this UDF also correctly records the colocation groups with the correct
    distribution column collations.

  - alter_distributed_table()

    Make sure to check column collations.

- Also, now citus_finish_citus_upgrade() automatically fixes such colocation groups when
  executed after upgrading to Citus 14.

  This also helps removing the last alternative test output for
  upgrade_citus_finish_citus_upgrade. This is because, the recent change made in
  citus_finish_citus_upgrade() to automatically fix such colocation groups results in
  updating last_upgrade_version in pg_dist_node_metadata when upgrading to Citus 14,
  so the notice message in the mentioned test is now gone, so we don't anymore need to
  have an alternative output for that test.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances Citus to disallow colocating distributed tables when their distribution key collations don't match, fixing issue #7683. Previously, collation mismatches were only caught during metadata synchronization to worker nodes, which meant single-node setups could create invalid colocation groups that would only be detected when adding new nodes.

Key changes:

  • Refactored colocation validation logic to check both type and collation equality through EnsureColumnTypeEquality()
  • Added fix_pre_citus14_colocation_group_collation_mismatches() UDF to automatically fix invalid colocation groups when upgrading to Citus 14
  • Enhanced alter_distributed_table() to validate collation compatibility when changing colocation or distribution columns

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/include/distributed/colocation_utils.h Updated function signatures to pass Var *distributionColumn instead of separate type and collation parameters
src/backend/distributed/utils/colocation_utils.c Refactored FindColocateWithColocationId() and EnsureTableCanBeColocatedWith() to use EnsureColumnTypeEquality() for comprehensive type and collation checking
src/backend/distributed/commands/create_distributed_table.c Updated to pass distribution column Var to colocation functions and use varcollid instead of get_typcollation() for consistency
src/backend/distributed/commands/alter_table.c Added collation validation checks in CheckAlterDistributedTableConversionParameters() and fixed typo from "then" to "than" in error message
src/backend/distributed/sql/udfs/fix_pre_citus14_colocation_group_collation_mismatches/latest.sql New UDF to identify and fix tables with mismatched colocation group collations by moving them to appropriate colocation groups
src/backend/distributed/sql/udfs/fix_pre_citus14_colocation_group_collation_mismatches/14.0-1.sql Version-specific implementation of the collation fix UDF for Citus 14.0-1
src/backend/distributed/sql/udfs/citus_finish_citus_upgrade/latest.sql Updated to call the collation fix function when upgrading from pre-14 versions
src/backend/distributed/sql/udfs/citus_finish_citus_upgrade/14.0-1.sql Version-specific upgrade procedure that includes collation mismatch fixes
src/backend/distributed/sql/citus--13.2-1--14.0-1.sql Upgrade script that includes the new collation fix UDF
src/backend/distributed/sql/downgrades/citus--14.0-1--13.2-1.sql Downgrade script that removes the collation fix UDF
src/test/regress/sql/upgrade_post_14_before.sql New test creating tables with different collations in older Citus versions to verify the upgrade fixes them
src/test/regress/sql/upgrade_post_14_after.sql Verifies that collation groups are correctly adjusted after upgrade
src/test/regress/sql/update_colocation_mx.sql Added comprehensive tests for collation mismatch detection in multi-node environments
src/test/regress/sql/single_node.sql Added tests verifying collation checks work correctly in single-node setups
src/test/regress/expected/*.out Updated expected outputs for all modified tests including removal of alternative outputs that are no longer needed
src/test/regress/before_citus_upgrade_coord_schedule Added upgrade_post_14_before test to upgrade test schedule
src/test/regress/after_citus_upgrade_coord_schedule Added upgrade_post_14_after test to upgrade test schedule

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

unable to add node when distribution columns have different collation types

3 participants