Support for distributed migrations #130
Hi, could you add some tests for undoing a migration that has already been applied when distributed_migrations is on?
@jayvynl OK, tests added. When running the whole test suite, I ran into failures in tests_datetime.py, perhaps due to my location.
Thank you for the PR. I have left some comments, mainly about improving the tests. Also, don't forget to update the changelog.
I have confirmed the problem, but I am unable to find the cause. It's very strange, because distributed tables are already covered by the existing test cases, where insertions and mutations can be queried immediately on all nodes.
@rcampos87 According to the ClickHouse Distributed engine documentation, it's recommended to use replicated tables as the underlying tables.
In clickhouse-config of this project,
Ah, I see. Thanks for looking into it, @jayvynl.
@jayvynl Using ReplicatedMergeTree does indeed seem to have solved the lag. I also added the tests you asked for. From what I read, ReplicatedMergeTree only works if there are replicas, so migrations still need to support MergeTree too.
OK, added a simple check for replicas.
The main motivation for this PR is to fix the handling of migrations that Django runs through a load balancer. When a ClickHouse cluster with multiple nodes sits behind a load balancer and round-robin is in effect, this can lead to inconsistent results. By making migrations distributed, all nodes are aware of the migration data, and running manage.py migrate gives much more consistent results. It also makes distributing the migration data automatic. (See discussion #114.)
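The inconsistency described above can be illustrated with a toy model. This is only a sketch, not code from this PR: `Node`, `make_balancer`, and `distributed_read` are hypothetical names, each node's local migrations table is modelled as a Python set, and a distributed table is modelled as the union of all local tables.

```python
from itertools import cycle


class Node:
    """Toy stand-in for one ClickHouse node with a local migrations table."""

    def __init__(self, name):
        self.name = name
        self.applied = set()


def make_balancer(nodes):
    """Round-robin balancer: each call returns the next node in turn."""
    ring = cycle(nodes)
    return lambda: next(ring)


def distributed_read(nodes):
    """Model a distributed table as the union of every node's local rows."""
    rows = set()
    for node in nodes:
        rows |= node.applied
    return rows


nodes = [Node("node1"), Node("node2"), Node("node3")]
pick = make_balancer(nodes)

# Local-only table: the write recording the migration lands on a single node.
pick().applied.add("0001_initial")

# The next three reads are each routed to a different node.
local_views = ["0001_initial" in pick().applied for _ in range(3)]

# With a distributed table, every read sees the row regardless of routing.
distributed_views = ["0001_initial" in distributed_read(nodes) for _ in range(3)]

print(local_views)        # [False, False, True]
print(distributed_views)  # [True, True, True]
```

In this model, two out of three reads through the balancer report the migration as unapplied, which is the kind of run-to-run difference the PR description refers to.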
When distributed_migrations and migration_cluster are set, new distributed and local tables are created for migrations, and all migration querysets are routed to the distributed table.
To test the load balancing use case, a new Docker Compose service was added for HAProxy. For simplicity, the existing ClickHouse nodes are placed behind HAProxy.
Example configuration would be
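A minimal sketch of such a configuration, assuming both options are read from the OPTIONS dictionary of Django's DATABASES setting. The engine path, host, port, and cluster name below are illustrative assumptions, not values taken from this PR.

```python
# settings.py (sketch)
DATABASES = {
    "default": {
        # Assumed engine path for the ClickHouse backend; adjust to your install.
        "ENGINE": "clickhouse_backend.backend",
        # Assumed address of the load balancer in front of the cluster.
        "HOST": "localhost",
        "PORT": 9000,
        "OPTIONS": {
            # Name of the ClickHouse cluster used for migration tables
            # ("cluster" is a placeholder).
            "migration_cluster": "cluster",
            # Route migration bookkeeping through a distributed table.
            "distributed_migrations": True,
        },
    }
}
```

The cluster name must match a cluster defined in the ClickHouse server configuration.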
In my case, a ClickHouse cluster with 3 nodes is behind an AWS ELB, and every run of makemigrations or migrate could produce a different result. With distributed migrations, all of these issues went away.