rcampos87
Contributor

The main motivation for this PR is to fix the handling of migrations that Django performs through a load balancer. When a ClickHouse cluster with multiple nodes sits behind a load balancer in round-robin mode, each request can reach a different node, which can lead to inconsistent results. By making migrations distributed, all nodes are aware of the migration data, and running manage.py migrate gives much more consistent results. It also makes distributing migration data automatic. (See discussion #114)

When distributed_migrations and migration_cluster are both set, new distributed and local tables are created for migrations, and all migration querysets are routed to the distributed table.
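
As a rough illustration of the local/distributed pair described above, here is a sketch that builds the two kinds of DDL statements ClickHouse would need. The table names (`django_migrations_local`, `django_migrations`), the column list, the `rand()` sharding key, and the helper `build_migration_ddl` are assumptions made for this example; the statements the backend actually issues may differ.

```python
def build_migration_ddl(database, cluster, local_engine="MergeTree"):
    """Return DDL for a local table and a Distributed table on top of it.

    Hypothetical helper: table names and columns are illustrative only.
    """
    local = f"{database}.django_migrations_local"
    distributed = f"{database}.django_migrations"
    columns = (
        "id Int64, "
        "app String, "
        "name String, "
        "applied DateTime64(6, 'UTC')"
    )
    # The local table physically stores the rows on each node.
    create_local = (
        f"CREATE TABLE IF NOT EXISTS {local} ON CLUSTER {cluster} "
        f"({columns}) ENGINE = {local_engine} ORDER BY id"
    )
    # The Distributed table stores nothing itself; it fans reads and writes
    # out to the local tables on the nodes of the cluster.
    create_distributed = (
        f"CREATE TABLE IF NOT EXISTS {distributed} ON CLUSTER {cluster} "
        f"AS {local} "
        f"ENGINE = Distributed({cluster}, {database}, django_migrations_local, rand())"
    )
    return [create_local, create_distributed]


for statement in build_migration_ddl("default", "cluster"):
    print(statement)
```

Migration reads and writes would then target the Distributed table, so the result no longer depends on which node a connection happens to reach.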

To test the load balancing use case, a new Docker Compose service was added for HAProxy. For simplicity, the existing ClickHouse nodes are used behind HAProxy.

An example configuration would be:

{
    "ENGINE": "clickhouse_backend.backend",
    "HOST": "load-balancer.dns",
    "PORT": 9004,
    ....
    "OPTIONS": {
        "distributed_migrations": True,
        "migration_cluster": "cluster",
    }
}

In my case, a ClickHouse cluster with 3 nodes is behind an AWS ELB, and every time I ran makemigrations or migrate I could get a different result. With distributed migrations, all my issues were gone.
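
To make the failure mode concrete, here is a small, self-contained simulation (no ClickHouse involved). The classes `Node` and `RoundRobinBalancer` and the function `simulate` are hypothetical names invented for this sketch, and the "distributed" case is idealized: it assumes every node sees the record immediately.

```python
from itertools import cycle


class Node:
    """A toy ClickHouse node holding its own local migration records."""

    def __init__(self, name):
        self.name = name
        self.applied = set()


class RoundRobinBalancer:
    """Hands out nodes in turn, like a load balancer in round-robin mode."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def next_node(self):
        return next(self._nodes)


def simulate(distributed):
    """Record one migration, then read the migration state three times."""
    nodes = [Node(f"ch{i}") for i in range(1, 4)]
    balancer = RoundRobinBalancer(nodes)
    if distributed:
        # Idealized distributed table: every node ends up with the record.
        for node in nodes:
            node.applied.add("0001_initial")
    else:
        # Local table: only the node that received the request stores it.
        balancer.next_node().applied.add("0001_initial")
    # Each read goes through the balancer and may land on a different node.
    return [set(balancer.next_node().applied) for _ in range(3)]


print(simulate(distributed=False))
print(simulate(distributed=True))
```

With the local table, only one of the three reads sees the applied migration; with the idealized distributed table, all three agree.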

@jayvynl
Owner

jayvynl commented Aug 22, 2025

Hi, could you add some tests for "Undo a Migration That's Already Been Applied" when distributed_migrations is on?

@rcampos87
Contributor Author

@jayvynl OK, tests added. When running the whole test suite, I ran into failures in tests_datetime.py, perhaps due to my location.

Owner

@jayvynl jayvynl left a comment

Thank you for the PR. I have left some comments, mainly about improving the tests. And don't forget to write a changelog entry.

@jayvynl
Owner

jayvynl commented Aug 28, 2025

I have confirmed the problem, but I am unable to find the cause. It's very strange, because distributed tables are already tested in the existing test cases, where insertions and mutations can be queried immediately on all nodes.

@jayvynl
Owner

jayvynl commented Sep 5, 2025

@rcampos87 According to the ClickHouse Distributed engine documentation, it's recommended to use replicated tables as the underlying tables.

If internal_replication is set to false (the default), data is written to all replicas. In this case, the Distributed table replicates data itself. This is worse than using replicated tables because the consistency of replicas is not checked and, over time, they will contain slightly different data.

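In the clickhouse-config of this project, internal_replication is set to true, so the Distributed table writes each insert to only one replica and relies on the underlying table to replicate it. A plain MergeTree table does not replicate data, so the replicas diverge, which would explain the lag.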
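
For reference, here is a toy model of the internal_replication semantics as the ClickHouse documentation describes them: with false, the Distributed table writes to every replica itself; with true, it writes to one replica and leaves replication to a Replicated*MergeTree table. The function `distributed_insert` and its parameters are hypothetical, and the model is idealized (real replication is asynchronous).

```python
def distributed_insert(replicas, row, internal_replication, replicated_table):
    """Toy model of how the replicas of one shard receive an insert.

    replicas: list of lists, one list of rows per replica of the shard.
    """
    if internal_replication:
        # The Distributed table writes to a single replica only...
        replicas[0].append(row)
        if replicated_table:
            # ...and a Replicated*MergeTree table copies it to the others.
            for replica in replicas[1:]:
                replica.append(row)
    else:
        # The Distributed table itself writes the row to every replica.
        for replica in replicas:
            replica.append(row)
    return replicas


# internal_replication=true over plain MergeTree: the second replica stays empty.
print(distributed_insert([[], []], "0001_initial", True, False))
# internal_replication=true over ReplicatedMergeTree: both replicas agree.
print(distributed_insert([[], []], "0001_initial", True, True))
```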

@rcampos87
Contributor Author

Ah, I see. Thanks for looking into it, @jayvynl.

@rcampos87
Contributor Author

rcampos87 commented Sep 8, 2025

@jayvynl Using ReplicatedMergeTree (RMT) does seem to have solved the lag. I also added the tests you asked for. From what I read, RMT only works if there are replicas, so migrations still need to support MergeTree too.

@rcampos87 rcampos87 requested a review from jayvynl September 8, 2025 19:17
@rcampos87
Contributor Author

Ok, added a simple check for replicas.
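
The check itself is not shown in this conversation. As a hedged sketch of what such a check could look like, the hypothetical helper `pick_local_engine` below takes (shard_num, replica_num) pairs for one cluster, such as rows read from ClickHouse's system.clusters table, and falls back to MergeTree when no shard has more than one replica.

```python
def pick_local_engine(cluster_rows):
    """Choose the engine for the local migrations table.

    cluster_rows: iterable of (shard_num, replica_num) pairs for one cluster.
    Hypothetical helper; the check merged in this PR may work differently.
    """
    # replica_num starts at 1, so any value above 1 means a shard has replicas.
    has_replicas = any(replica_num > 1 for _shard_num, replica_num in cluster_rows)
    return "ReplicatedMergeTree" if has_replicas else "MergeTree"


# Two shards with one replica each: plain MergeTree.
print(pick_local_engine([(1, 1), (2, 1)]))
# One shard with two replicas: ReplicatedMergeTree.
print(pick_local_engine([(1, 1), (1, 2)]))
```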

@jayvynl jayvynl merged commit 0abe336 into jayvynl:main Sep 18, 2025
60 checks passed
@rcampos87 rcampos87 deleted the feature/distributed-migrations branch September 18, 2025 13:33
2 participants