
Commit 8d8c456

Feat distributed (#38)
- Support [MergeTree settings](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#settings) when creating tables.
- Support [distributed DDL](https://clickhouse.com/docs/en/sql-reference/distributed-ddl) and [distributed table](https://clickhouse.com/docs/en/engines/table-engines/special/distributed).
- Support creating the migration table and running migrations on a cluster.
- Fix bug: an exception was raised when inserting data with expression values.
- Fix bug: an exception was raised when altering a field from NOT NULL to NULL.
- Support escaping dict data.
1 parent 5e5c32e commit 8d8c456


61 files changed (+2704, -245 lines)

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
````diff
@@ -1,6 +1,11 @@
 ### 1.1.1
 - [Black](https://github.com/psf/black) code style.
 - Support [MergeTree settings](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#settings) when creating tables.
+- Support [distributed DDL](https://clickhouse.com/docs/en/sql-reference/distributed-ddl) and [distributed table](https://clickhouse.com/docs/en/engines/table-engines/special/distributed).
+- Support creating the migration table and running migrations on a cluster.
+- Fix bug: an exception was raised when inserting data with expression values.
+- Fix bug: an exception was raised when altering a field from NOT NULL to NULL.
+- Support escaping dict data.
 
 ### 1.1.0
 - Change `AutoField` and `SmallAutoField` to clickhouse `Int64`, so that the id worker can generate values for them.
````

README.md

Lines changed: 219 additions & 43 deletions
````diff
@@ -2,8 +2,8 @@ Django ClickHouse Database Backend
 ===
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
-Django clickhouse backend is a [django database backend](https://docs.djangoproject.com/en/4.1/ref/databases/) for
-[clickhouse](https://clickhouse.com/docs/en/home/) database. This project allows using django ORM to interact with
+Django clickhouse backend is a [django database backend](https://docs.djangoproject.com/en/4.1/ref/databases/) for
+[clickhouse](https://clickhouse.com/docs/en/home/) database. This project allows using django ORM to interact with
 clickhouse; the goal of the project is to operate clickhouse in django the same way as mysql or postgresql.
 
 Thanks to [clickhouse driver](https://github.com/mymarilyn/clickhouse-driver), django clickhouse backend uses it as [DBAPI](https://peps.python.org/pep-0249/).
@@ -26,11 +26,12 @@ Read [Documentation](https://github.com/jayvynl/django-clickhouse-backend/blob/m
 
 - Not tested against all versions of clickhouse-server; clickhouse-server 22.x.y.z or later is suggested.
 - Aggregation functions result in 0 or nan (Not NULL) when the data set is empty: max/min/sum/count is 0, avg/STDDEV_POP/VAR_POP is nan.
-- In an outer join, clickhouse will set missing columns to empty values (0 for number, empty string for text, unix epoch for date/datetime) instead of NULL.
+- In an outer join, clickhouse will set missing columns to empty values (0 for number, empty string for text, unix epoch for date/datetime) instead of NULL.
   So Count("book") resolves to 1 in a missing LEFT OUTER JOIN match, not 0.
   In the aggregation expression Avg("book__rating", default=2.5), default=2.5 has no effect in a missing match.
 - Clickhouse does not support unique constraints or foreign key constraints. `ForeignKey`, `ManyToManyField` and `OneToOneField` can be used with the clickhouse backend, but no database-level constraints will be added, so there could be some consistency problems.
 - Clickhouse does not support transactions. If any exception occurs during migrating, your clickhouse database will be left in an untracked state. Every migration should be fully tested in a test environment before being deployed to production.
+- This project does not yet support migrations that change a table's engine or settings.
 
 **Requirements:**
 
@@ -75,28 +76,29 @@ Here I give an example setting for clickhouse and postgresql.
 
 ```python
 DATABASES = {
-    'default': {
-        'ENGINE': 'django.db.backends.postgresql',
-        'HOST': 'localhost',
-        'USER': 'postgres',
-        'PASSWORD': '123456',
-        'NAME': 'postgres',
+    "default": {
+        "ENGINE": "django.db.backends.postgresql",
+        "HOST": "localhost",
+        "USER": "postgres",
+        "PASSWORD": "123456",
+        "NAME": "postgres",
     },
-    'clickhouse': {
-        'ENGINE': 'clickhouse_backend.backend',
-        'NAME': 'default',
-        'HOST': 'localhost',
-        'USER': 'DB_USER',
-        'PASSWORD': 'DB_PASSWORD',
+    "clickhouse": {
+        "ENGINE": "clickhouse_backend.backend",
+        "NAME": "default",
+        "HOST": "localhost",
+        "USER": "DB_USER",
+        "PASSWORD": "DB_PASSWORD",
     }
 }
-DATABASE_ROUTERS = ['dbrouters.ClickHouseRouter']
+DATABASE_ROUTERS = ["dbrouters.ClickHouseRouter"]
 ```
 
 ```python
 # dbrouters.py
 from clickhouse_backend.models import ClickhouseModel
 
+
 def get_subclasses(class_):
     classes = class_.__subclasses__()
 
@@ -118,21 +120,21 @@ class ClickHouseRouter:
 
     def db_for_read(self, model, **hints):
         if (model._meta.label_lower in self.route_model_names
-                or hints.get('clickhouse')):
-            return 'clickhouse'
+                or hints.get("clickhouse")):
+            return "clickhouse"
         return None
 
     def db_for_write(self, model, **hints):
         if (model._meta.label_lower in self.route_model_names
-                or hints.get('clickhouse')):
-            return 'clickhouse'
+                or hints.get("clickhouse")):
+            return "clickhouse"
         return None
 
     def allow_migrate(self, db, app_label, model_name=None, **hints):
-        if (f'{app_label}.{model_name}' in self.route_model_names
-                or hints.get('clickhouse')):
-            return db == 'clickhouse'
-        elif db == 'clickhouse':
+        if (f"{app_label}.{model_name}" in self.route_model_names
+                or hints.get("clickhouse")):
+            return db == "clickhouse"
+        elif db == "clickhouse":
             return False
         return None
 ```
@@ -159,7 +161,7 @@ Notices about model definition:
 - You need to specify the engine for clickhouse, along with its `order_by` (clickhouse sorting order) and `partition_by` arguments.
 
 ```python
-from django.db.models import CheckConstraint, Func, Q, IntegerChoices
+from django.db.models import CheckConstraint, Func, IntegerChoices, Q
 from django.utils import timezone
 
 from clickhouse_backend import models
@@ -170,31 +172,31 @@ class Event(models.ClickhouseModel):
         PASS = 1
         DROP = 2
         ALERT = 3
-    ip = models.GenericIPAddressField(default='::')
-    ipv4 = models.GenericIPAddressField(default='127.0.0.1')
+    ip = models.GenericIPAddressField(default="::")
+    ipv4 = models.GenericIPAddressField(default="127.0.0.1")
     ip_nullable = models.GenericIPAddressField(null=True)
     port = models.UInt16Field(default=0)
-    protocol = models.StringField(default='', low_cardinality=True)
-    content = models.StringField(default='')
+    protocol = models.StringField(default="", low_cardinality=True)
+    content = models.StringField(default="")
     timestamp = models.DateTime64Field(default=timezone.now)
     created_at = models.DateTime64Field(auto_now_add=True)
     action = models.EnumField(choices=Action.choices, default=Action.PASS)
 
     class Meta:
-        verbose_name = 'Network event'
-        ordering = ['-id']
-        db_table = 'event'
+        verbose_name = "Network event"
+        ordering = ["-id"]
+        db_table = "event"
         engine = models.ReplacingMergeTree(
-            order_by=['id'],
-            partition_by=Func('timestamp', function='toYYYYMMDD'),
+            order_by=["id"],
+            partition_by=Func("timestamp", function="toYYYYMMDD"),
             index_granularity=1024,
             index_granularity_bytes=1 << 20,
             enable_mixed_granularity_parts=1,
         )
         indexes = [
             models.Index(
                 fields=["ip"],
-                name='ip_set_idx',
+                name="ip_set_idx",
                 type=models.Set(1000),
                 granularity=4
             ),
@@ -207,7 +209,7 @@ class Event(models.ClickhouseModel):
         ]
         constraints = (
             CheckConstraint(
-                name='port_range',
+                name="port_range",
                 check=Q(port__gte=0, port__lte=65535),
             ),
         )
@@ -289,7 +291,7 @@ create
 ```python
 for i in range(10):
     Event.objects.create(ip_nullable=None, port=i,
-                         protocol="HTTP", content="test",
+                         protocol="HTTP", content="test",
                          action=Event.Action.PASS.value)
 assert Event.objects.count() == 10
 ```
````
````diff
@@ -332,31 +334,205 @@ There are 2 ways to do that:
 - Configure the database engine as follows; this sets [`mutations_sync=1`](https://clickhouse.com/docs/en/operations/settings/settings#mutations_sync) at session scope.
   ```python
   DATABASES = {
-      'default': {
-          'ENGINE': 'clickhouse_backend.backend',
-          'OPTIONS': {
-              'settings': {
-                  'mutations_sync': 1,
+      "default": {
+          "ENGINE": "clickhouse_backend.backend",
+          "OPTIONS": {
+              "settings": {
+                  "mutations_sync": 1,
               }
           }
       }
   }
   ```
 - Use [SETTINGS in SELECT Query](https://clickhouse.com/docs/en/sql-reference/statements/select/#settings-in-select-query).
   ```python
-  Event.objects.filter(protocol='UDP').settings(mutations_sync=1).delete()
+  Event.objects.filter(protocol="UDP").settings(mutations_sync=1).delete()
   ```
 
 Sample test case.
 
 ```python
 from django.test import TestCase
 
+
 class TestEvent(TestCase):
     def test_spam(self):
         assert Event.objects.count() == 0
 ```
 
+Distributed table
+---
+
+This backend supports [distributed DDL queries (ON CLUSTER clause)](https://clickhouse.com/docs/en/sql-reference/distributed-ddl)
+and the [distributed table engine](https://clickhouse.com/docs/en/engines/table-engines/special/distributed).
+
+The following example assumes that the cluster defined by the [docker compose file in this repository](https://github.com/jayvynl/django-clickhouse-backend/blob/main/compose.yaml) is used.
+The cluster is named `cluster`; it has 2 shards, and each shard has 2 replicas.
+
+### Configuration
+
+```python
+DATABASES = {
+    "default": {
+        "ENGINE": "clickhouse_backend.backend",
+        "OPTIONS": {
+            "migration_cluster": "cluster",
+            "settings": {
+                "mutations_sync": 1,
+                "insert_distributed_sync": 1,
+            },
+        },
+        "TEST": {"cluster": "cluster"},
+    },
+    "s1r2": {
+        "ENGINE": "clickhouse_backend.backend",
+        "PORT": 9001,
+        "OPTIONS": {
+            "migration_cluster": "cluster",
+            "settings": {
+                "mutations_sync": 1,
+                "insert_distributed_sync": 1,
+            },
+        },
+        "TEST": {"cluster": "cluster", "managed": False},
+    },
+    "s2r1": {
+        "ENGINE": "clickhouse_backend.backend",
+        "PORT": 9002,
+        "OPTIONS": {
+            "migration_cluster": "cluster",
+            "settings": {
+                "mutations_sync": 1,
+                "insert_distributed_sync": 1,
+            },
+        },
+        "TEST": {"cluster": "cluster", "managed": False},
+    },
+    "s2r2": {
+        "ENGINE": "clickhouse_backend.backend",
+        "PORT": 9003,
+        "OPTIONS": {
+            "migration_cluster": "cluster",
+            "settings": {
+                "mutations_sync": 1,
+                "insert_distributed_sync": 1,
+            },
+        },
+        "TEST": {"cluster": "cluster", "managed": False},
+    },
+}
+```
````
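The four entries in the configuration above differ only in `PORT` and in the `TEST` `managed` flag. The sketch below factors out that repetition in `settings.py`. The helper `clickhouse_node` is hypothetical and is not part of clickhouse_backend. It assumes, as the example does, that `default` uses the driver's default port and is the one connection that manages the test database.

```python
# settings.py (sketch): build the per-node entries of the example cluster.
# `clickhouse_node` is a hypothetical helper, not part of clickhouse_backend.


def clickhouse_node(port=None, managed=True, cluster="cluster"):
    """Return one DATABASES entry for a node of `cluster`."""
    entry = {
        "ENGINE": "clickhouse_backend.backend",
        "OPTIONS": {
            "migration_cluster": cluster,
            "settings": {
                "mutations_sync": 1,
                "insert_distributed_sync": 1,
            },
        },
        # Only one connection per cluster should manage the test database.
        "TEST": {"cluster": cluster} if managed else {"cluster": cluster, "managed": False},
    }
    if port is not None:
        entry["PORT"] = port
    return entry


DATABASES = {
    "default": clickhouse_node(),  # default port, manages the test database
    "s1r2": clickhouse_node(9001, managed=False),
    "s2r1": clickhouse_node(9002, managed=False),
    "s2r2": clickhouse_node(9003, managed=False),
}
```

Each call builds fresh dicts, so the entries do not share a mutable `OPTIONS` or `settings` object. Changing one node later therefore cannot silently affect the others.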
````diff
+
+Extra settings explanation:
+
+- `"migration_cluster": "cluster"`
+  If this setting is specified, the migration table is created on this cluster; otherwise only a local migration table is created.
+- `"mutations_sync": 1`
+  Recommended if you want to test [data mutations](https://clickhouse.com/docs/en/guides/developer/mutations).
+- `"insert_distributed_sync": 1`
+  Recommended if you want to test inserting data into a distributed table.
+- `"TEST": {"cluster": "cluster", "managed": False}`
+  The test database will be created on this cluster.
+  If you have multiple database connections to the same cluster and want to run tests over all of them,
+  only one connection should set `"managed": True` (the default value); the other connections should set `"managed": False`,
+  so that the test database is not created multiple times.
````
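The `managed` rule above can be checked mechanically, for example from a test or a custom Django system check. The sketch below is minimal and assumes the `DATABASES` layout shown in the configuration. It counts a missing `managed` key as `True`, which is the default stated above. `check_test_managed` is a hypothetical helper.

```python
from collections import defaultdict


def check_test_managed(databases):
    """Return {cluster: managed_aliases} for every cluster whose managed count is not 1."""
    managed = defaultdict(list)
    clusters = set()
    for alias, conf in databases.items():
        test = conf.get("TEST", {})
        cluster = test.get("cluster")
        if cluster is None:
            continue  # this test database is not created on a cluster
        clusters.add(cluster)
        # "managed" defaults to True when it is not set.
        if test.get("managed", True):
            managed[cluster].append(alias)
    return {c: managed[c] for c in sorted(clusters) if len(managed[c]) != 1}


databases = {
    "default": {"TEST": {"cluster": "cluster"}},
    "s1r2": {"TEST": {"cluster": "cluster", "managed": False}},
    "s2r1": {"TEST": {"cluster": "cluster"}},  # forgot "managed": False
}
print(check_test_managed(databases))  # → {'cluster': ['default', 's2r1']}
```

An empty result means each cluster has exactly one connection responsible for creating the test database.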
````diff
+
+Do not hardcode the database name when you define a replicated or distributed table,
+because the test database name is different from the deployed database name.
+
````
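The following illustrates why a hardcoded name causes trouble. By default, Django names the test database by prefixing `test_` to `NAME`. ClickHouse substitutes macros such as `{database}`, `{table}` and `{shard}` in the replication path. The sketch below only mimics that substitution with Python's `str.format_map` so that the resulting paths can be compared. It is not how ClickHouse or this backend implement it. The table name `student` and shard `01` are made-up values.

```python
# Illustration only: mimic macro substitution in a ReplicatedMergeTree path.
# The table and shard values below are hypothetical.


def replication_path(template, database, table="student", shard="01"):
    return template.format_map({"database": database, "table": table, "shard": shard})


deployed_db = "default"
test_db = "test_" + deployed_db  # Django's default test database name

with_macro = "/clickhouse/tables/{database}/{table}/{shard}"
hardcoded = "/clickhouse/tables/default/{table}/{shard}"

# With the macro, the test table gets its own path.
print(replication_path(with_macro, deployed_db))  # → /clickhouse/tables/default/student/01
print(replication_path(with_macro, test_db))  # → /clickhouse/tables/test_default/student/01
# With a hardcoded name, the test and deployed tables resolve to the same path.
print(replication_path(hardcoded, deployed_db) == replication_path(hardcoded, test_db))  # → True
```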
````diff
+### Model
+
+Setting `cluster` in the `Meta` class makes the model's table be created on that cluster.
+
+```python
+from clickhouse_backend import models
+
+
+class Student(models.ClickhouseModel):
+    name = models.StringField()
+    address = models.StringField()
+    score = models.Int8Field()
+
+    class Meta:
+        engine = models.ReplicatedMergeTree(
+            "/clickhouse/tables/{uuid}/{shard}",
+            # If you want to include the database or table name, use macros instead of hardcoded names:
+            # "/clickhouse/tables/{database}/{table}/{shard}",
+            "{replica}",
+            order_by="id"
+        )
+        cluster = "cluster"
+
+
+class DistributedStudent(models.ClickhouseModel):
+    name = models.StringField()
+    score = models.Int8Field()
+
+    class Meta:
+        engine = models.Distributed(
+            "cluster", models.currentDatabase(), Student._meta.db_table, models.Rand()
+        )
+        cluster = "cluster"
+```
````
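`DistributedStudent` uses `models.Rand()` as its sharding key. According to the ClickHouse Distributed engine documentation, the shard is chosen from the remainder of the sharding expression divided by the total weight of the shards. Each shard owns a contiguous range of remainders whose width equals its weight. The sketch below reproduces that rule in plain Python to show how random keys spread rows over the 2 shards of the example cluster. It assumes each shard has the default weight of 1, and it emulates ClickHouse's `rand()` (a random UInt32) with Python's `random` module.

```python
import random


def pick_shard(key, weights):
    """Return the index of the shard that receives a row with this sharding key."""
    remainder = key % sum(weights)
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        # Shard `index` owns remainders in [upper - weight, upper).
        if remainder < upper:
            return index
    raise ValueError("weights must be non-negative with a positive sum")


weights = [1, 1]  # 2 shards, default weight 1 each
rng = random.Random(0)
counts = [0, 0]
for _ in range(1000):
    # Emulate rand(), which returns a random UInt32.
    counts[pick_shard(rng.getrandbits(32), weights)] += 1
print(counts)
```

With unequal weights such as `[9, 10]`, remainders 0 to 8 go to the first shard and 9 to 18 go to the second.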
````diff
+
+### CRUD
+
+You can operate on a distributed table just like a normal table.
+
+```python
+students = DistributedStudent.objects.bulk_create([DistributedStudent(name=f"Student{i}", score=i * 10) for i in range(10)])
+assert DistributedStudent.objects.count() == 10
+DistributedStudent.objects.filter(id__in=[s.id for s in students[5:]]).update(name="lol")
+DistributedStudent.objects.filter(id__in=[s.id for s in students[:5]]).delete()
+```
+
+### Migrate
+
+If `migration_cluster` is not specified in the database configuration, you should always run migrations on one specific cluster node,
+because the other nodes cannot know whether migrations have already been applied by another node.
+
+If `migration_cluster` is specified, the migration table (named `django_migrations`) will be created on the specified cluster.
+When migrations are applied, [migration operations](https://docs.djangoproject.com/en/4.2/ref/migration-operations/) on models that define `cluster` in their `Meta` class
+are executed on the cluster; all other migration operations are executed locally.
+This means the distributed table will be created on all nodes as soon as any node has applied the migrations,
+while other local tables will only be created on the node that applied the migrations.
+
+If you want the local tables on all nodes, you should apply the migrations on every node.
+But remember that these local tables store data separately; currently this backend does not provide a way to query data from other nodes.
+
+```shell
+python manage.py migrate
+python manage.py migrate --database s1r2
+python manage.py migrate --database s2r1
+python manage.py migrate --database s2r2
+```
````
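The four shell commands above can also be driven from Python, for example in a deployment script. The sketch below is minimal and uses Django's `django.core.management.call_command`, passing the `database` option that `migrate` accepts. The alias list mirrors the example configuration. `migrate_all_nodes` is a hypothetical helper, and its `run` parameter exists only so that the loop can be exercised without a live cluster.

```python
def migrate_all_nodes(aliases=("default", "s1r2", "s2r1", "s2r2"), run=None):
    """Apply migrations once per node, like `manage.py migrate --database <alias>`."""
    if run is None:
        # Requires a configured Django project (DJANGO_SETTINGS_MODULE set and
        # django.setup() already called).
        from django.core.management import call_command

        def run(alias):
            call_command("migrate", database=alias)

    for alias in aliases:
        run(alias)
    return list(aliases)
```

This is only useful when you need the local (non-cluster) tables on every node. Passing `database="default"` is equivalent to the plain `python manage.py migrate` command.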
````diff
+
+### Update
+
+When upgrading from django clickhouse backend 1.1.0 or lower, you should not add cluster-related settings to your
+existing project, because:
+
+- The migration table schema won't be changed if you add or remove `migration_cluster`, and `python manage.py migrate` will then behave abnormally.
+- If you add `cluster` to an existing model's `Meta` class, no schema changes will occur; this project does not support this yet.
+
+If you really want to use the cluster feature with an existing project, you have to manage the schema changes yourself.
+These steps should be tested carefully in a test environment.
+The [Clickhouse docs](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication#converting-from-mergetree-to-replicatedmergetree) may be helpful.
+
+1. Apply all your existing migrations.
+2. Change your settings and models.
+3. Generate new migrations.
+4. Log into your clickhouse database and change the table schemas to reflect your models.
+5. Apply the migrations with the `--fake` flag.
+
+```shell
+python manage.py migrate
+# Change your settings and models
+python manage.py makemigrations
+# Log into your clickhouse database and change table schemas to reflect your models.
+python manage.py migrate --fake
+```
+
 
 Test
 ---
````
