Commit 95f9186

committed
update README
1 parent a2a840d commit 95f9186

4 files changed: +238 -2 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -3,4 +3,5 @@ venv
 **/__pycache__
 django_clickhouse_backend.egg-info
 build
+dist
 tests/unsupported/

README.md

Lines changed: 235 additions & 0 deletions
@@ -2,3 +2,238 @@ Django ClickHouse Database Backend
===

[中文文档](README_cn.md)

Django clickhouse backend is a [django database backend](https://docs.djangoproject.com/en/4.1/ref/databases/) for the
[clickhouse](https://clickhouse.com/docs/en/home/) database. This project allows using the django ORM to interact with
clickhouse.

Thanks to [clickhouse driver](https://github.com/mymarilyn/clickhouse-driver), which django clickhouse backend uses as its [DBAPI](https://peps.python.org/pep-0249/) driver.
Thanks to [clickhouse pool](https://github.com/ericmccarthy7/clickhouse-pool), which provides the clickhouse connection pool.

**Features:**

- Supports the [Clickhouse native interface](https://clickhouse.com/docs/en/interfaces/tcp/) and connection pooling.
- Defines clickhouse-specific schema features such as [Engine](https://clickhouse.com/docs/en/engines/table-engines/) and [Index](https://clickhouse.com/docs/en/guides/improving-query-performance/skipping-indexes) in the django ORM.
- Supports table migrations.
- Supports creating test databases and tables, and works with django TestCase and pytest-django.
- Supports most query types and data types; full feature coverage is still under development.
- Supports [SETTINGS in SELECT Query](https://clickhouse.com/docs/en/sql-reference/statements/select/#settings-in-select-query).

Get started
---

### Installation

```shell
pip install git+https://github.com/jayvynl/django-clickhouse-backend
```

or

```shell
git clone https://github.com/jayvynl/django-clickhouse-backend
cd django-clickhouse-backend
python setup.py install
```

### Configuration

Only `ENGINE` is required; the other options have default values.

- ENGINE: required, set to `clickhouse_backend.backend`.
- NAME: database name, default `default`.
- HOST: database host, default `localhost`.
- PORT: database port, default `9000`.
- USER: database user, default `default`.
- PASSWORD: database password, default empty.

```python
DATABASES = {
    'default': {
        'ENGINE': 'clickhouse_backend.backend',
        'NAME': 'default',
        'HOST': 'localhost',
        'USER': 'DB_USER',
        'PASSWORD': 'DB_PASSWORD',
        'TEST': {
            'fake_transaction': True
        }
    }
}
DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'
```

`DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'` IS REQUIRED TO WORK WITH DJANGO MIGRATIONS.
More details are covered in [Primary key](#primary-key).

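To make the defaults in the option list above concrete, here is a minimal sketch of the settings that effectively apply when only `ENGINE` is given. The `DEFAULTS` dict and the `effective_settings` helper are hypothetical names used for illustration only; this is not how the backend resolves its options internally.

```python
# Defaults as listed in the Configuration section; ENGINE has no default.
DEFAULTS = {
    "NAME": "default",
    "HOST": "localhost",
    "PORT": 9000,
    "USER": "default",
    "PASSWORD": "",
}


def effective_settings(user_settings):
    """Merge user-provided options over the documented defaults (illustrative helper)."""
    if "ENGINE" not in user_settings:
        raise KeyError("ENGINE is required")
    # Later keys win, so anything the user sets overrides the default.
    return {**DEFAULTS, **user_settings}


print(effective_settings({"ENGINE": "clickhouse_backend.backend"}))
```

Any key set explicitly in `DATABASES` simply replaces the corresponding default.
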
### Model

```python
from django.db import models
from django.utils import timezone

from clickhouse_backend import models as chm
from clickhouse_backend.models import indexes, engines


class Event(chm.ClickhouseModel):
    src_ip = chm.GenericIPAddressField(default='::')
    sport = chm.PositiveSmallIntegerField(default=0)
    dst_ip = chm.GenericIPAddressField(default='::')
    dport = chm.PositiveSmallIntegerField(default=0)
    transport = models.CharField(max_length=3, default='')
    protocol = models.TextField(default='')
    content = models.TextField(default='')
    timestamp = models.DateTimeField(default=timezone.now)
    created_at = models.DateTimeField(auto_now_add=True)
    length = chm.PositiveIntegerField(default=0)
    count = chm.PositiveIntegerField(default=1)

    class Meta:
        verbose_name = 'Network event'
        ordering = ['-id']
        db_table = 'event'
        # ClickHouse table engine, sorting key and partitioning expression.
        engine = engines.ReplacingMergeTree(
            order_by=('dst_ip', 'timestamp'),
            partition_by=models.Func('timestamp', function='toYYYYMMDD')
        )
        # ClickHouse data-skipping index.
        indexes = [
            indexes.Index(
                fields=('src_ip', 'dst_ip'),
                type=indexes.Set(1000),
                granularity=4
            )
        ]
        constraints = (
            models.CheckConstraint(
                name='sport_range',
                check=models.Q(sport__gte=0, sport__lte=65535),
            ),
        )
```

### Migration

```shell
python manage.py makemigrations
```

### Testing

Writing test cases works the same as in a normal django project. You can use django TestCase or pytest-django.
**Notice:** clickhouse uses mutations for [deleting or updating](https://clickhouse.com/docs/en/guides/developer/mutations) data.
By default, mutations are processed asynchronously, so tests that delete or update data should change this default behavior.
There are 2 ways to do that:

- Configure the database as follows; this sets [`mutations_sync=1`](https://clickhouse.com/docs/en/operations/settings/settings#mutations_sync) at session scope.
  ```python
  DATABASES = {
      'default': {
          'ENGINE': 'clickhouse_backend.backend',
          'OPTIONS': {
              'settings': {
                  'mutations_sync': 1,
              }
          }
      }
  }
  ```
- Use [SETTINGS in SELECT Query](https://clickhouse.com/docs/en/sql-reference/statements/select/#settings-in-select-query) on the individual query.
  ```python
  Event.objects.filter(transport='UDP').settings(mutations_sync=1).delete()
  ```

A sample test case:

```python
from django.test import TestCase

# Event is the model defined in the Model section; import it from your own app.


class TestEvent(TestCase):
    def test_spam(self):
        # The test database starts out empty.
        assert Event.objects.count() == 0
```

Topics
---

### Primary key

The Django ORM depends heavily on a single-column primary key, which uniquely identifies an ORM object.
All `get`, `save` and `delete` actions depend on the primary key.

In ClickHouse, however, a [primary key](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#primary-keys-and-indexes-in-queries) means something different from a django primary key. ClickHouse does not require a unique primary key. You can insert multiple rows with the same primary key.

There is [no unique constraint](https://github.com/ClickHouse/ClickHouse/issues/3386#issuecomment-429874647) or auto-incrementing column in clickhouse.

By default, django adds a field named `id` as an auto-incrementing primary key.

- AutoField

  Mapped to the clickhouse Int32 data type. You should generate this unique id yourself.

- BigAutoField

  Mapped to the clickhouse Int64 data type. If the primary key is not specified when inserting data, `clickhouse_backend.idworker.id_worker` is used to generate this unique key.

  The default id_worker is an instance of `clickhouse_backend.idworker.snowflake.SnowflakeIDWorker`, which implements the [twitter snowflake id](https://en.wikipedia.org/wiki/Snowflake_ID).
  If data insertions happen across multiple datacenters, servers, processes or threads, you should ensure that each of them has a unique pair of (CLICKHOUSE_WORKER_ID, CLICKHOUSE_DATACENTER_ID) environment variables.
  Because worker_id and datacenter_id are 5 bits each, they must be integers between 0 and 31. CLICKHOUSE_WORKER_ID defaults to 0, and CLICKHOUSE_DATACENTER_ID is generated randomly if not provided.

  `clickhouse_backend.idworker.snowflake.SnowflakeIDWorker` is not thread safe. You could inherit from `clickhouse_backend.idworker.base.BaseIDWorker`, implement a thread-safe one, and set `CLICKHOUSE_ID_WORKER` to the dotted import path of your IDWorker instance.

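The snowflake scheme and the thread-safety caveat above can be sketched in plain Python. This is a standalone illustration under stated assumptions, not the backend's implementation: the bit layout (41-bit timestamp, 5-bit datacenter id, 5-bit worker id, 12-bit sequence) and the epoch follow the classic Twitter design, and `LockedSnowflakeWorker`, `read_id_from_env` and `get_id` are hypothetical names. The interface of `BaseIDWorker` is not shown in this README, so the class below deliberately does not inherit from it.

```python
import os
import random
import threading
import time

# Assumed bit layout (classic Snowflake, high to low):
# 41-bit millisecond timestamp | 5-bit datacenter id | 5-bit worker id | 12-bit sequence
ID_BITS = 5
SEQUENCE_BITS = 12
MAX_ID = (1 << ID_BITS) - 1              # 31
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095
EPOCH_MS = 1288834974657  # Twitter's custom epoch, used here as an assumption


def read_id_from_env(name, default=None):
    """Read a 5-bit id from the environment; pick a random one if unset and no default."""
    raw = os.environ.get(name)
    if raw is None:
        value = random.randint(0, MAX_ID) if default is None else default
    else:
        value = int(raw)
    if not 0 <= value <= MAX_ID:
        raise ValueError(f"{name} must be between 0 and {MAX_ID}, got {value}")
    return value


class LockedSnowflakeWorker:
    """Hypothetical thread-safe id generator; not the backend's own class."""

    def __init__(self, worker_id, datacenter_id):
        for label, value in (("worker_id", worker_id), ("datacenter_id", datacenter_id)):
            if not 0 <= value <= MAX_ID:
                raise ValueError(f"{label} must be between 0 and {MAX_ID}, got {value}")
        self.worker_id = worker_id
        self.datacenter_id = datacenter_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return time.time_ns() // 1_000_000

    def get_id(self):
        # The lock makes "read timestamp, bump sequence, build id" one atomic step.
        with self._lock:
            # Never go below the last timestamp used, even if the clock steps back.
            now = max(self._now_ms(), self._last_ms)
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (2 * ID_BITS + SEQUENCE_BITS))
                | (self.datacenter_id << (ID_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


worker = LockedSnowflakeWorker(
    worker_id=read_id_from_env("CLICKHOUSE_WORKER_ID", default=0),
    datacenter_id=read_id_from_env("CLICKHOUSE_DATACENTER_ID"),
)
print(worker.get_id())
```

The lock only protects threads inside one process. Separate processes or servers still need distinct (worker id, datacenter id) pairs, as described above.
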
Django uses a table named `django_migrations` to track migration files. Its ID field should be a BigAutoField, so that the IDWorker can generate unique ids for you.
Since Django 3.2, a new [setting `DEFAULT_AUTO_FIELD`](https://docs.djangoproject.com/en/4.1/releases/3.2/#customizing-type-of-auto-created-primary-keys) controls the field type of the default primary key.
So `DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'` is required if you want to use migrations with django clickhouse backend.


### Fields

#### Nullable

`null=True` creates a [Nullable](https://clickhouse.com/docs/en/sql-reference/data-types/nullable/) column in the clickhouse database.

**Note** Using Nullable almost always negatively affects performance; keep this in mind when designing your databases.

#### GenericIPAddressField

Clickhouse backend has its own implementation in `clickhouse_backend.models.fields.GenericIPAddressField`.
If `protocol='ipv4'`, an [IPv4](https://clickhouse.com/docs/en/sql-reference/data-types/domains/ipv4) column is generated; otherwise an [IPv6](https://clickhouse.com/docs/en/sql-reference/data-types/domains/ipv6) column is generated.

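The column choice described above can be sketched with the standard library alone. Both helper names are hypothetical, and the second function only illustrates the general IPv4-mapped IPv6 form (`::ffff:a.b.c.d`) that lets an IPv6-typed value carry an IPv4 address. Whether the backend converts addresses this way is not stated here, so treat it as an assumption.

```python
import ipaddress


def clickhouse_ip_column(protocol="both"):
    """Return the column type named in the text for a given `protocol` option."""
    return "IPv4" if protocol.lower() == "ipv4" else "IPv6"


def as_ipv6(text):
    """Represent any address as IPv6; IPv4 input becomes an IPv4-mapped address."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return ipaddress.IPv6Address(f"::ffff:{addr}")
    return addr


print(clickhouse_ip_column("ipv4"))  # IPv4
print(clickhouse_ip_column("both"))  # IPv6
print(as_ipv6("192.0.2.1").ipv4_mapped)  # the original IPv4 address is recoverable
```
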
#### PositiveSmallIntegerField
#### PositiveIntegerField
#### PositiveBigIntegerField

`clickhouse_backend.models.fields.PositiveSmallIntegerField` maps to [UInt16](https://clickhouse.com/docs/en/sql-reference/data-types/int-uint).
`clickhouse_backend.models.fields.PositiveIntegerField` maps to [UInt32](https://clickhouse.com/docs/en/sql-reference/data-types/int-uint).
`clickhouse_backend.models.fields.PositiveBigIntegerField` maps to [UInt64](https://clickhouse.com/docs/en/sql-reference/data-types/int-uint).
Clickhouse has unsigned integer types, so these fields get validators for the correct integer ranges.


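As a rough illustration of what "correct integer ranges" means for the three mappings above, the sketch below computes the bounds of each unsigned type and checks a value against them. The names `UNSIGNED_BITS`, `unsigned_range` and `check_unsigned` are hypothetical, and this is not the backend's validator code.

```python
# Bit widths of the ClickHouse types named above.
UNSIGNED_BITS = {
    "PositiveSmallIntegerField": 16,  # UInt16
    "PositiveIntegerField": 32,       # UInt32
    "PositiveBigIntegerField": 64,    # UInt64
}


def unsigned_range(bits):
    """Inclusive (min, max) of an unsigned integer with the given bit width."""
    return 0, 2 ** bits - 1


def check_unsigned(value, bits):
    """Raise ValueError if value does not fit in an unsigned integer of `bits` bits."""
    low, high = unsigned_range(bits)
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    return value


for field_name, bits in UNSIGNED_BITS.items():
    print(field_name, unsigned_range(bits))
```

For example, a port number such as 65535 fits in UInt16, whereas stock Django documents only 0 to 32767 as safe for its own `PositiveSmallIntegerField`.
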
### Engines

Located in `clickhouse_backend.models.engines`.

### Indexes

Located in `clickhouse_backend.models.indexes`.

Test
---

To run the tests for this project:

```shell
git clone https://github.com/jayvynl/django-clickhouse-backend
cd django-clickhouse-backend
# docker and docker-compose are required.
docker-compose up -d
python tests/runtests.py
```

**Note** This project is not fully tested yet and should be used with caution in production.

License
---

Django clickhouse backend is distributed under the [MIT license](http://www.opensource.org/licenses/mit-license.php).

README_cn.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ pip install git+https://github.com/jayvynl/django-clickhouse-backend
         'USER': 'DB_USER',
         'PASSWORD': 'DB_PASSWORD'
     },
-    'clickhouse_backend': {
+    'clickhouse': {
         'ENGINE': 'clickhouse_backend.backend',
         'NAME': 'default',
         'HOST': 'localhost',

clickhouse_backend/VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.1.0
+0.2.0
