Commit cb3c66e

Merge pull request #3265 from ClickHouse/Blargian-patch-3
Docs: Fix code blocks designing-schemas.md
2 parents 99af99e + 4a07a22 commit cb3c66e

1 file changed

+51
-57
lines changed

docs/en/migrations/postgres/designing-schemas.md

Lines changed: 51 additions & 57 deletions
@@ -13,7 +13,7 @@ The Stack Overflow dataset contains a number of related tables. We recommend mig
1313

1414
Adhering to this principle, we focus on the main `posts` table. The Postgres schema for this is shown below:
1515

16-
```sql
16+
```sql title="Query"
1717
CREATE TABLE posts (
1818
Id int,
1919
PostTypeId int,
@@ -44,33 +44,35 @@ CREATE TABLE posts (
4444

4545
To establish the equivalent types for each of the above columns, we can use the `DESCRIBE` command with the [Postgres table function](/en/sql-reference/table-functions/postgresql). Adapt the following command to your Postgres instance:
4646

47-
```sql
47+
```sql title="Query"
4848
DESCRIBE TABLE postgresql('<host>:<port>', 'postgres', 'posts', '<username>', '<password>')
4949
SETTINGS describe_compact_output = 1
50+
```
5051

52+
```response title="Response"
5153
┌─name──────────────────┬─type────────────────────┐
52-
│ id │ Int32
53-
│ posttypeid │ Nullable(Int32)
54-
│ acceptedanswerid │ Nullable(String)
55-
│ creationdate │ Nullable(DateTime64(6)) │
56-
│ score │ Nullable(Int32)
57-
│ viewcount │ Nullable(Int32)
58-
│ body │ Nullable(String)
59-
│ owneruserid │ Nullable(Int32)
60-
│ ownerdisplayname │ Nullable(String)
61-
│ lasteditoruserid │ Nullable(String)
62-
│ lasteditordisplayname │ Nullable(String)
63-
│ lasteditdate │ Nullable(DateTime64(6)) │
64-
│ lastactivitydate │ Nullable(DateTime64(6)) │
65-
│ title │ Nullable(String)
66-
│ tags │ Nullable(String)
67-
│ answercount │ Nullable(Int32)
68-
│ commentcount │ Nullable(Int32)
69-
│ favoritecount │ Nullable(Int32)
70-
│ contentlicense │ Nullable(String)
71-
│ parentid │ Nullable(String)
54+
│ id │ Int32
55+
│ posttypeid │ Nullable(Int32) │
56+
│ acceptedanswerid │ Nullable(String) │
57+
│ creationdate │ Nullable(DateTime64(6)) │
58+
│ score │ Nullable(Int32) │
59+
│ viewcount │ Nullable(Int32) │
60+
│ body │ Nullable(String) │
61+
│ owneruserid │ Nullable(Int32) │
62+
│ ownerdisplayname │ Nullable(String) │
63+
│ lasteditoruserid │ Nullable(String) │
64+
│ lasteditordisplayname │ Nullable(String) │
65+
│ lasteditdate │ Nullable(DateTime64(6)) │
66+
│ lastactivitydate │ Nullable(DateTime64(6)) │
67+
│ title │ Nullable(String) │
68+
│ tags │ Nullable(String) │
69+
│ answercount │ Nullable(Int32) │
70+
│ commentcount │ Nullable(Int32) │
71+
│ favoritecount │ Nullable(Int32) │
72+
│ contentlicense │ Nullable(String) │
73+
│ parentid │ Nullable(String) │
7274
│ communityowneddate │ Nullable(DateTime64(6)) │
73-
│ closeddate │ Nullable(DateTime64(6)) │
75+
│ closeddate │ Nullable(DateTime64(6)) │
7476
└───────────────────────┴─────────────────────────┘
7577
7678
22 rows in set. Elapsed: 0.478 sec.
@@ -82,7 +84,7 @@ This provides us with an initial non-optimized schema.
8284
8385
We can create a ClickHouse table using these types with a simple `CREATE TABLE ... EMPTY AS SELECT` command.
8486

85-
```sql
87+
```sql title="Query"
8688
CREATE TABLE posts
8789
ENGINE = MergeTree
8890
ORDER BY () EMPTY AS
@@ -95,10 +97,9 @@ This same approach can be used to load data from s3 in other formats. See here f
9597

9698
With our table created, we can insert the rows from Postgres into ClickHouse using the [Postgres table function](/en/sql-reference/table-functions/postgresql).
9799

98-
```sql
100+
```sql title="Query"
99101
INSERT INTO posts SELECT *
100102
FROM postgresql('<host>:<port>', 'postgres', 'posts', '<username>', '<password>')
101-
102103
0 rows in set. Elapsed: 1136.841 sec. Processed 58.89 million rows, 80.85 GB (51.80 thousand rows/s., 71.12 MB/s.)
103104
Peak memory usage: 2.51 GiB.
104105
```
@@ -109,10 +110,12 @@ Peak memory usage: 2.51 GiB.
109110
110111
If using the full dataset, the example should load 59m posts. Confirm with a simple count in ClickHouse:
111112

112-
```sql
113+
```sql title="Query"
113114
SELECT count()
114115
FROM posts
116+
```
115117

118+
```response title="Response"
116119
┌──count()─┐
117120
│ 58889566 │
118121
└──────────┘
@@ -122,7 +125,7 @@ FROM posts
122125

123126
The steps for optimizing the types for this schema are identical to those used when the data has been loaded from other sources, e.g. Parquet on S3. Applying the process described in this [alternate guide using Parquet](/en/data-modeling/schema-design) results in the following schema:
124127

125-
```sql
128+
```sql title="Query"
126129
CREATE TABLE posts_v2
127130
(
128131
`Id` Int32,
@@ -155,9 +158,8 @@ COMMENT 'Optimized types'
155158

156159
We can populate this with a simple `INSERT INTO SELECT`, reading the data from our previous table and inserting into this one:
157160

158-
```sql
161+
```sql title="Query"
159162
INSERT INTO posts_v2 SELECT * FROM posts
160-
161163
0 rows in set. Elapsed: 146.471 sec. Processed 59.82 million rows, 83.82 GB (408.40 thousand rows/s., 572.25 MB/s.)
162164
```
163165

@@ -203,44 +205,36 @@ For the considerations and steps in choosing an ordering key, using the posts ta
203205

204206
ClickHouse's column-oriented storage means compression will often be significantly better when compared to Postgres. The following illustrates this by comparing the storage requirements for all Stack Overflow tables in both databases:
205207

206-
```sql
207-
--Postgres
208+
```sql title="Query (Postgres)"
208209
SELECT
209-
schemaname,
210-
tablename,
211-
pg_total_relation_size(schemaname || '.' || tablename) AS total_size_bytes,
212-
pg_total_relation_size(schemaname || '.' || tablename) / (1024 * 1024 * 1024) AS total_size_gb
210+
schemaname,
211+
tablename,
212+
pg_total_relation_size(schemaname || '.' || tablename) AS total_size_bytes,
213+
pg_total_relation_size(schemaname || '.' || tablename) / (1024 * 1024 * 1024) AS total_size_gb
213214
FROM
214-
pg_tables s
215+
pg_tables s
215216
WHERE
216-
schemaname = 'public';
217-
schemaname | tablename | total_size_bytes | total_size_gb |
218-
------------+-----------------+------------------+---------------+
219-
public | users | 4288405504 | 3 |
220-
public | posts | 68606214144 | 63 |
221-
public | votes | 20525654016 | 19 |
222-
public | comments | 22888538112 | 21 |
223-
public | posthistory | 125899735040 | 117 |
224-
public | postlinks | 579387392 | 0 |
225-
public | badges | 4989747200 | 4 |
226-
(7 rows)
227-
228-
--ClickHouse
217+
schemaname = 'public';
218+
```
219+
220+
```sql title="Query (ClickHouse)"
229221
SELECT
230222
`table`,
231223
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size
232224
FROM system.parts
233225
WHERE (database = 'stackoverflow') AND active
234226
GROUP BY `table`
227+
```
235228

229+
```response title="Response"
236230
┌─table───────┬─compressed_size─┐
237-
│ posts 25.17 GiB
238-
│ users 846.57 MiB
239-
│ badges 513.13 MiB
240-
│ comments 7.11 GiB
241-
│ votes 1.28 GiB
242-
│ posthistory │ 40.44 GiB
243-
│ postlinks │ 79.22 MiB
231+
│ posts │ 25.17 GiB │
232+
│ users │ 846.57 MiB │
233+
│ badges │ 513.13 MiB │
234+
│ comments │ 7.11 GiB │
235+
│ votes │ 1.28 GiB │
236+
│ posthistory │ 40.44 GiB │
237+
│ postlinks │ 79.22 MiB │
244238
└─────────────┴─────────────────┘
245239
```
246240

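As a reading aid for the storage comparison in the last hunk, the sketch below turns the two result sets into per-table ratios. It is only arithmetic on figures that appear in this diff: the Postgres byte counts come from the response lines the commit removes, and the ClickHouse sizes come from the `compressed_size` column. The names `POSTGRES_BYTES`, `CLICKHOUSE_BYTES`, and `compression_ratios` are hypothetical and do not exist in the docs. Two caveats apply: `pg_total_relation_size` includes indexes and TOAST data, and the ClickHouse values are rounded human-readable sizes, so the ratios are approximate.

```python
# Sizes copied from the outputs shown in this diff; nothing here queries a database.
GIB = 1024 ** 3
MIB = 1024 ** 2

# Postgres: total_size_bytes column (pg_total_relation_size, so indexes and TOAST count).
POSTGRES_BYTES = {
    "users": 4288405504,
    "posts": 68606214144,
    "votes": 20525654016,
    "comments": 22888538112,
    "posthistory": 125899735040,
    "postlinks": 579387392,
    "badges": 4989747200,
}

# ClickHouse: compressed_size column, converted back from the rounded readable values.
CLICKHOUSE_BYTES = {
    "posts": 25.17 * GIB,
    "users": 846.57 * MIB,
    "badges": 513.13 * MIB,
    "comments": 7.11 * GIB,
    "votes": 1.28 * GIB,
    "posthistory": 40.44 * GIB,
    "postlinks": 79.22 * MIB,
}


def compression_ratios(postgres, clickhouse):
    """Return {table: postgres_size / clickhouse_size} for tables present in both."""
    return {t: postgres[t] / clickhouse[t] for t in postgres if t in clickhouse}


if __name__ == "__main__":
    ratios = compression_ratios(POSTGRES_BYTES, CLICKHOUSE_BYTES)
    for table in sorted(ratios):
        pg_gib = POSTGRES_BYTES[table] / GIB
        print(f"{table:<12} {pg_gib:8.2f} GiB in Postgres, {ratios[table]:5.1f}x smaller in ClickHouse")
```

Working from bytes rather than the `total_size_gb` column matters here: that column uses integer division, which is why `postlinks` is reported as 0 and `posts` as 63 rather than roughly 63.9.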