
Commit 43bc592

Merge pull request #3348 from pkutaj/fix_table_formatting_in_compression_doc
docs: fix table formatting
2 parents 904ed8c + ed0a955 commit 43bc592

File tree

1 file changed, +33 -33 lines changed


docs/data-compression/compression-in-clickhouse.md

Lines changed: 33 additions & 33 deletions
@@ -36,30 +36,30 @@ FROM system.columns
WHERE table = 'posts'
GROUP BY name

┌─name──────────────────┬─compressed_size─┬─uncompressed_size─┬───ratio─┐
│ Body                  │ 46.14 GiB       │ 127.31 GiB        │    2.76 │
│ Title                 │ 1.20 GiB        │ 2.63 GiB          │    2.19 │
│ Score                 │ 84.77 MiB       │ 736.45 MiB        │    8.69 │
│ Tags                  │ 475.56 MiB      │ 1.40 GiB          │    3.02 │
│ ParentId              │ 210.91 MiB      │ 696.20 MiB        │     3.3 │
│ Id                    │ 111.17 MiB      │ 736.45 MiB        │    6.62 │
│ AcceptedAnswerId      │ 81.55 MiB       │ 736.45 MiB        │    9.03 │
│ ClosedDate            │ 13.99 MiB       │ 517.82 MiB        │   37.02 │
│ LastActivityDate      │ 489.84 MiB      │ 964.64 MiB        │    1.97 │
│ CommentCount          │ 37.62 MiB       │ 565.30 MiB        │   15.03 │
│ OwnerUserId           │ 368.98 MiB      │ 736.45 MiB        │       2 │
│ AnswerCount           │ 21.82 MiB       │ 622.35 MiB        │   28.53 │
│ FavoriteCount         │ 280.95 KiB      │ 508.40 MiB        │ 1853.02 │
│ ViewCount             │ 95.77 MiB       │ 736.45 MiB        │    7.69 │
│ LastEditorUserId      │ 179.47 MiB      │ 736.45 MiB        │     4.1 │
│ ContentLicense        │ 5.45 MiB        │ 847.92 MiB        │   155.5 │
│ OwnerDisplayName      │ 14.30 MiB       │ 142.58 MiB        │    9.97 │
│ PostTypeId            │ 20.93 MiB       │ 565.30 MiB        │      27 │
│ CreationDate          │ 314.17 MiB      │ 964.64 MiB        │    3.07 │
│ LastEditDate          │ 346.32 MiB      │ 964.64 MiB        │    2.79 │
│ LastEditorDisplayName │ 5.46 MiB        │ 124.25 MiB        │   22.75 │
│ CommunityOwnedDate    │ 2.21 MiB        │ 509.60 MiB        │  230.94 │
└───────────────────────┴─────────────────┴───────────────────┴─────────┘
```

We show both the compressed and uncompressed size here. Both are important. The compressed size equates to what we will need to read off disk - something we want to minimize for query performance (and storage cost). This data will need to be decompressed prior to processing. The uncompressed size will depend on the data types used. Minimizing this size will reduce the memory overhead of queries and the amount of data the query has to process, improving utilization of caches and ultimately query times.
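For reference, per-column figures like the ones above can be produced from `system.columns`, which exposes `data_compressed_bytes` and `data_uncompressed_bytes`. The following is a sketch consistent with the context lines of this hunk (`FROM system.columns`, `WHERE table = 'posts'`, `GROUP BY name`); the exact `SELECT` list lies outside the hunk and may differ:

```sql
SELECT
    name,
    -- on-disk (compressed) bytes per column, summed across parts
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
    -- in-memory (uncompressed) bytes per column
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE table = 'posts'
GROUP BY name
```

Dropping `GROUP BY name` (and the `name` column) would yield table-level totals like those in the following two hunks.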
@@ -76,7 +76,7 @@ FROM system.columns
WHERE table = 'posts'

┌─compressed_size─┬─uncompressed_size─┬─ratio─┐
│ 50.16 GiB       │ 143.47 GiB        │  2.86 │
└─────────────────┴───────────────────┴───────┘
```

@@ -91,7 +91,7 @@ FROM system.columns
WHERE `table` = 'posts_v3'

┌─compressed_size─┬─uncompressed_size─┬─ratio─┐
│ 25.15 GiB       │ 68.87 GiB         │  2.74 │
└─────────────────┴───────────────────┴───────┘
```

@@ -206,17 +206,17 @@ ORDER BY
`table` ASC

┌─table────┬─name────────┬─compressed_size─┬─uncompressed_size─┬─ratio─┐
│ posts_v3 │ AnswerCount │ 9.67 MiB        │ 113.69 MiB        │ 11.76 │
│ posts_v4 │ AnswerCount │ 10.39 MiB       │ 111.31 MiB        │ 10.71 │
│ posts_v3 │ Id          │ 159.70 MiB      │ 227.38 MiB        │  1.42 │
│ posts_v4 │ Id          │ 64.91 MiB       │ 222.63 MiB        │  3.43 │
│ posts_v3 │ ViewCount   │ 45.04 MiB       │ 227.38 MiB        │  5.05 │
│ posts_v4 │ ViewCount   │ 52.72 MiB       │ 222.63 MiB        │  4.22 │
└──────────┴─────────────┴─────────────────┴───────────────────┴───────┘

6 rows in set. Elapsed: 0.008 sec
```
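The `Id` column improving from a 1.42 to a 3.43 ratio is the kind of difference a specialised column codec can make on ordered integer data. The schema change itself is outside this diff; as a purely hypothetical sketch, such columns could be declared with a `Delta` codec in front of the general-purpose compression:

```sql
-- hypothetical sketch: table name, column types and codec choices are
-- assumptions for illustration, not taken from this diff
CREATE TABLE posts_v4
(
    Id        Int32  CODEC(Delta, ZSTD),  -- delta-encode, then compress
    ViewCount UInt32 CODEC(Delta, ZSTD)
    -- ... remaining columns ...
)
ENGINE = MergeTree
ORDER BY Id
```

Whether a codec helps depends heavily on the data: delta encoding rewards ordered, monotonically increasing values, which would be consistent with `Id` improving here while `ViewCount` and `AnswerCount` do not.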
### Compression in ClickHouse Cloud {#compression-in-clickhouse-cloud}

In ClickHouse Cloud, we utilize the `ZSTD` compression algorithm (with a default compression level of 1) by default. While compression speeds for this algorithm vary with the compression level (higher = slower), it has the advantage of being consistently fast on decompression (around 20% variance) and of being parallelizable. Our historical tests also suggest that this algorithm is often sufficiently effective and can even outperform `LZ4` combined with a codec. It is effective on most data types and information distributions, and is thus a sensible general-purpose default and the reason our initial compression above is already excellent even without optimization.
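Where the default is not the right trade-off, the codec and its level can be overridden per column at table definition time. A minimal sketch, with hypothetical table and column names:

```sql
-- minimal sketch: table name, column names and types are illustrative
CREATE TABLE posts_example
(
    Id   Int32  CODEC(ZSTD(1)),  -- the default level, stated explicitly
    Body String CODEC(ZSTD(3))   -- higher level: smaller on disk, slower to write
)
ENGINE = MergeTree
ORDER BY Id
```

Since, as noted above, decompression speed stays roughly constant across levels, a higher level mainly trades write-time CPU for storage.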
