docs/en/migrations/postgres/designing-schemas.md (+51 −57: 51 additions & 57 deletions)
@@ -13,7 +13,7 @@ The Stack Overflow dataset contains a number of related tables. We recommend mig
Adhering to this principle, we focus on the main `posts` table. The Postgres schema for this is shown below:
- ```sql
+ ```sql title="Query"
CREATE TABLE posts (
    Id int,
    PostTypeId int,
@@ -44,33 +44,35 @@ CREATE TABLE posts (
To establish the equivalent types for each of the above columns, we can use the `DESCRIBE` command with the [Postgres table function](/en/sql-reference/table-functions/postgresql). Modify the following command to your Postgres instance:
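The command itself sits outside this hunk. As a hedged sketch only, the kind of `DESCRIBE` call the text refers to looks roughly like the following, with host, port and credentials as placeholders rather than the doc's actual values:

```sql title="Query"
-- Sketch only: infer ClickHouse-equivalent types from the Postgres `posts` table.
-- Connection details below are placeholders, not the doc's actual values.
DESCRIBE TABLE postgresql('<host>:<port>', 'postgres', 'posts', '<username>', '<password>')
SETTINGS describe_compact_output = 1
```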
@@ -82,7 +84,7 @@ This provides us with an initial non-optimized schema.
We can create a ClickHouse table using these types with a simple `CREATE AS EMPTY SELECT` command.
- ```sql
+ ```sql title="Query"
CREATE TABLE posts
ENGINE = MergeTree
ORDER BY () EMPTY AS
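The rest of this statement is cut off by the hunk boundary. As a hedged sketch only, reusing the placeholder connection details that appear later in this diff, the statement presumably continues with a `SELECT` over the Postgres table function:

```sql title="Query"
-- Sketch: create an empty table whose column types are inferred from the SELECT,
-- without inserting any rows. Connection details are placeholders.
CREATE TABLE posts
ENGINE = MergeTree
ORDER BY () EMPTY AS
SELECT * FROM postgresql('<host>:<port>', 'postgres', 'posts', '<username>', '<password>')
```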
@@ -95,10 +97,9 @@ This same approach can be used to load data from s3 in other formats. See here f
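The hunk context above mentions loading the same data from S3 in other formats. A minimal sketch of that approach with the s3 table function, where the bucket path is a placeholder and not the documentation's actual example:

```sql title="Query"
-- Sketch only: load the posts data from Parquet files on S3 instead of Postgres.
-- The URL below is a placeholder, not the documentation's actual bucket path.
INSERT INTO posts
SELECT * FROM s3('https://<bucket>.s3.amazonaws.com/stackoverflow/parquet/posts/*.parquet', 'Parquet')
```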
With our table created, we can insert the rows from Postgres into ClickHouse using the [Postgres table function](/en/sql-reference/table-functions/postgresql).
- ```sql
+ ```sql title="Query"
INSERT INTO posts SELECT *
FROM postgresql('<host>:<port>', 'postgres', 'posts', '<username>', '<password>')
-
0 rows in set. Elapsed: 1136.841 sec. Processed 58.89 million rows, 80.85 GB (51.80 thousand rows/s., 71.12 MB/s.)
If using the full dataset, the example should load 59m posts. Confirm with a simple count in ClickHouse:
- ```sql
+ ```sql title="Query"
SELECT count()
FROM posts
+ ```

+ ```response title="Response"
┌──count()─┐
│ 58889566 │
└──────────┘
@@ -122,7 +125,7 @@ FROM posts
The steps for optimizing the types for this schema are identical to those used when the data is loaded from other sources, e.g. Parquet on S3. Applying the process described in this [alternate guide using Parquet](/en/data-modeling/schema-design) results in the following schema:
- ```sql
+ ```sql title="Query"
CREATE TABLE posts_v2
(
    `Id` Int32,
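The remaining columns fall outside this hunk. Purely as an illustration of the kind of optimization the guide applies, with hypothetical columns rather than the doc's actual schema, types are narrowed to the smallest that fit and Nullable is dropped where possible:

```sql title="Query"
-- Illustrative sketch only: narrow integer types, an Enum for a small fixed set of
-- values, and a proper DateTime instead of a generic timestamp.
CREATE TABLE posts_types_example
(
    `Id` Int32,
    `PostTypeId` Enum8('Question' = 1, 'Answer' = 2),
    `CreationDate` DateTime,
    `ViewCount` UInt32
)
ENGINE = MergeTree
ORDER BY Id
```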
@@ -155,9 +158,8 @@ COMMENT 'Optimized types'
We can populate this with a simple `INSERT INTO SELECT`, reading the data from our previous table and inserting into this one:
- ```sql
+ ```sql title="Query"
INSERT INTO posts_v2 SELECT * FROM posts
-
0 rows in set. Elapsed: 146.471 sec. Processed 59.82 million rows, 83.82 GB (408.40 thousand rows/s., 572.25 MB/s.)
```
@@ -203,44 +205,36 @@ For the considerations and steps in choosing an ordering key, using the posts ta
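The ordering-key discussion referenced in this hunk's context is not visible here. As a hedged illustration only, with a hypothetical column choice that is not necessarily the guide's, an ordering key typically places low-cardinality columns that match common filters first:

```sql title="Query"
-- Illustrative sketch: low-cardinality filter columns lead the ordering key.
CREATE TABLE posts_ordered_example
(
    `Id` Int32,
    `PostTypeId` UInt8,
    `CreationDate` DateTime,
    `CommentCount` UInt8
)
ENGINE = MergeTree
ORDER BY (PostTypeId, toDate(CreationDate), CommentCount)
```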
ClickHouse's column-oriented storage means compression will often be significantly better than in Postgres. The following illustrates this when comparing the storage requirements for all Stack Overflow tables in both databases:
- ```sql
- --Postgres
+ ```sql title="Query (Postgres)"
SELECT
-     schemaname,
-     tablename,
-     pg_total_relation_size(schemaname ||'.'|| tablename) AS total_size_bytes,
-     pg_total_relation_size(schemaname ||'.'|| tablename) / (1024*1024*1024) AS total_size_gb
+     schemaname,
+     tablename,
+     pg_total_relation_size(schemaname ||'.'|| tablename) AS total_size_bytes,
+     pg_total_relation_size(schemaname ||'.'|| tablename) / (1024*1024*1024) AS total_size_gb
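Only the Postgres side of the comparison is visible in this hunk. For the ClickHouse side, a hedged sketch of the kind of query typically used, reading compressed and uncompressed sizes from `system.parts`, and not necessarily the doc's exact query:

```sql title="Query (ClickHouse)"
-- Sketch: per-table compressed vs. uncompressed size for active parts.
SELECT
    `table`,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size
FROM system.parts
WHERE active
GROUP BY `table`
ORDER BY sum(data_compressed_bytes) DESC
```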