
Commit 264332a

Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs into fix_settings_cls
2 parents bc6e89b + 81dd505 commit 264332a


49 files changed: +606 −1380 lines changed

.gitignore

Lines changed: 6 additions & 2 deletions
@@ -50,6 +50,10 @@ docs/cloud/manage/api/prometheus-api-reference.md
 docs/cloud/manage/api/usageCost-api-reference.md
 docs/whats-new/changelog/index.md
 docs/about-us/beta-and-experimental-features.md
+static/knowledgebase_toc.json
+.floating-pages-validation-failed
+.frontmatter-validation-failed
+logs/
 
 .vscode
 .aspell.en.prepl
@@ -59,8 +63,8 @@ docs/about-us/beta-and-experimental-features.md
 **.translate
 /ClickHouse/
 
-
 # Ignore table of contents files
 docs/cloud/reference/release-notes-index.md
 docs/whats-new/changelog/index.md
-docs/cloud/manage/api/api-reference-index.md
+docs/cloud/manage/api/api-reference-index.md
+docs/getting-started/index.md

docs/about-us/beta-and-experimental-features.md

Lines changed: 7 additions & 0 deletions
@@ -38,3 +38,10 @@ Note: please be sure to be using a current version of the ClickHouse [compatibil
 - Cannot be enabled in the cloud
 
 Please note: no additional experimental features are allowed to be enabled in ClickHouse Cloud other than those listed above as Beta.
+
+<!-- The inner content of the tags below are replaced at build time with a table generated from source
+Please do not modify or remove the tags
+-->
+
+<!--AUTOGENERATED_START-->
+<!--AUTOGENERATED_END-->

docs/cloud/reference/byoc.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ BYOC (Bring Your Own Cloud) allows you to deploy ClickHouse Cloud on your own cl
 
 **If you would like access, please [contact us](https://clickhouse.com/cloud/bring-your-own-cloud).** Refer to our [Terms of Service](https://clickhouse.com/legal/agreements/terms-of-service) for additional information.
 
-BYOC is currently only supported for AWS, with GCP and Microsoft Azure in development.
+BYOC is currently only supported for AWS. You can join the wait list for GCP and Azure [here](https://clickhouse.com/cloud/bring-your-own-cloud).
 
 :::note
 BYOC is designed specifically for large-scale deployments, and requires customers to sign a committed contract.

docs/cloud/reference/warehouses.md

Lines changed: 2 additions & 4 deletions
@@ -45,7 +45,7 @@ Each compute node group will have its own endpoint so you can choose which set o
 
 _Fig. 2 - compute separation in ClickHouse Cloud_
 
-In this private preview program, you will have the ability to create extra services that share the same data with your existing services, or create a completely new setup with multiple services sharing the same data.
+It is possible to create extra services that share the same data with your existing services, or create a completely new setup with multiple services sharing the same data.
 
 ## What is a Warehouse? {#what-is-a-warehouse}
 
@@ -122,8 +122,6 @@ Once compute-compute is enabled for a service (at least one secondary service wa
 
 ## Limitations {#limitations}
 
-Because this compute-compute separation is currently in private preview, there are some limitations to using this feature. Most of these limitations will be removed once the feature is released to GA (general availability):
-
 1. **Primary service should always be up and should not be idled (limitation will be removed some time after GA).** During the private preview and some time after GA, the primary service (usually the existing service that you want to extend by adding other services) will be always up and will have the idling setting disabled. You will not be able to stop or idle the primary service if there is at least one secondary service. Once all secondary services are removed, you can stop or idle the original service again.
 
 2. **Sometimes workloads cannot be isolated.** Though the goal is to give you an option to isolate database workloads from each other, there can be corner cases where one workload in one service will affect another service sharing the same data. These are quite rare situations that are mostly connected to OLTP-like workloads.
@@ -146,7 +144,7 @@ settings distributed_ddl_task_timeout=0
 
 ## Pricing {#pricing}
 
-Extra services created during the private preview are billed as usual. Compute prices are the same for all services in a warehouse (primary and secondary). Storage is billed only once - it is included in the first (original) service.
+Compute prices are the same for all services in a warehouse (primary and secondary). Storage is billed only once - it is included in the first (original) service.
 
 ## Backups {#backups}
 

docs/cloud/support.md

Lines changed: 0 additions & 2 deletions
@@ -6,8 +6,6 @@ description: 'Learn about Cloud Support'
 hide_title: true
 ---
 
-# Cloud Support
-
 import Content from '@site/docs/about-us/support.md';
 
 <Content />

docs/getting-started/example-datasets/cell-towers.md

Lines changed: 11 additions & 11 deletions
@@ -17,17 +17,17 @@ import ActionsMenu from '@site/docs/_snippets/_service_actions_menu.md';
 import SQLConsoleDetail from '@site/docs/_snippets/_launch_sql_console.md';
 import SupersetDocker from '@site/docs/_snippets/_add_superset_detail.md';
 import cloud_load_data_sample from '@site/static/images/_snippets/cloud-load-data-sample.png';
-import cell_towers_1 from '@site/docs/getting-started/example-datasets/images/superset-cell-tower-dashboard.png'
-import add_a_database from '@site/docs/getting-started/example-datasets/images/superset-add.png'
-import choose_clickhouse_connect from '@site/docs/getting-started/example-datasets/images/superset-choose-a-database.png'
-import add_clickhouse_as_superset_datasource from '@site/docs/getting-started/example-datasets/images/superset-connect-a-database.png'
-import add_cell_towers_table_as_dataset from '@site/docs/getting-started/example-datasets/images/superset-add-dataset.png'
-import create_a_map_in_superset from '@site/docs/getting-started/example-datasets/images/superset-create-map.png'
-import specify_long_and_lat from '@site/docs/getting-started/example-datasets/images/superset-lon-lat.png'
-import superset_mcc_2024 from '@site/docs/getting-started/example-datasets/images/superset-mcc-204.png'
-import superset_radio_umts from '@site/docs/getting-started/example-datasets/images/superset-radio-umts.png'
-import superset_umts_netherlands from '@site/docs/getting-started/example-datasets/images/superset-umts-netherlands.png'
-import superset_cell_tower_dashboard from '@site/docs/getting-started/example-datasets/images/superset-cell-tower-dashboard.png'
+import cell_towers_1 from '@site/static/images/getting-started/example-datasets/superset-cell-tower-dashboard.png'
+import add_a_database from '@site/static/images/getting-started/example-datasets/superset-add.png'
+import choose_clickhouse_connect from '@site/static/images/getting-started/example-datasets/superset-choose-a-database.png'
+import add_clickhouse_as_superset_datasource from '@site/static/images/getting-started/example-datasets/superset-connect-a-database.png'
+import add_cell_towers_table_as_dataset from '@site/static/images/getting-started/example-datasets/superset-add-dataset.png'
+import create_a_map_in_superset from '@site/static/images/getting-started/example-datasets/superset-create-map.png'
+import specify_long_and_lat from '@site/static/images/getting-started/example-datasets/superset-lon-lat.png'
+import superset_mcc_2024 from '@site/static/images/getting-started/example-datasets/superset-mcc-204.png'
+import superset_radio_umts from '@site/static/images/getting-started/example-datasets/superset-radio-umts.png'
+import superset_umts_netherlands from '@site/static/images/getting-started/example-datasets/superset-umts-netherlands.png'
+import superset_cell_tower_dashboard from '@site/static/images/getting-started/example-datasets/superset-cell-tower-dashboard.png'
 
 ## Goal {#goal}
 

docs/getting-started/example-datasets/environmental-sensors.md

Lines changed: 2 additions & 2 deletions
@@ -7,8 +7,8 @@ title: 'Environmental Sensors Data'
 ---
 
 import Image from '@theme/IdealImage';
-import no_events_per_day from './images/sensors_01.png';
-import sensors_02 from './images/sensors_02.png';
+import no_events_per_day from '@site/static/images/getting-started/example-datasets/sensors_01.png';
+import sensors_02 from '@site/static/images/getting-started/example-datasets/sensors_02.png';
 
 [Sensor.Community](https://sensor.community/en/) is a contributors-driven global sensor network that creates Open Environmental Data. The data is collected from sensors all over the globe. Anyone can purchase a sensor and place it wherever they like. The APIs to download the data is in [GitHub](https://github.com/opendata-stuttgart/meta/wiki/APIs) and the data is freely available under the [Database Contents License (DbCL)](https://opendatacommons.org/licenses/dbcl/1-0/).
 

docs/getting-started/example-datasets/foursquare-places.md

Lines changed: 278 additions & 0 deletions
@@ -0,0 +1,278 @@
---
description: 'Dataset with over 100 million records containing information about places on a map, such as shops,
restaurants, parks, playgrounds, and monuments.'
sidebar_label: 'Foursquare places'
slug: /getting-started/example-datasets/foursquare-places
title: 'Foursquare places'
keywords: ['visualizing']
---

import Image from '@theme/IdealImage';
import visualization_1 from '@site/static/images/getting-started/example-datasets/visualization_1.png';
import visualization_2 from '@site/static/images/getting-started/example-datasets/visualization_2.png';
import visualization_3 from '@site/static/images/getting-started/example-datasets/visualization_3.png';
import visualization_4 from '@site/static/images/getting-started/example-datasets/visualization_4.png';

## Dataset {#dataset}

This dataset by Foursquare is available to [download](https://docs.foursquare.com/data-products/docs/access-fsq-os-places)
and to use for free under the Apache 2.0 license.

It contains over 100 million records of commercial points-of-interest (POI),
such as shops, restaurants, parks, playgrounds, and monuments. It also includes
additional metadata about those places, such as categories and social media
information.

## Data exploration {#data-exploration}

For exploring the data we'll use [`clickhouse-local`](https://clickhouse.com/blog/extracting-converting-querying-local-files-with-sql-clickhouse-local), a small command-line tool
that provides the full ClickHouse engine, although you could also use
ClickHouse Cloud, `clickhouse-client` or even `chDB`.

Run the following query to select the data from the s3 bucket where the data is stored:

```sql title="Query"
SELECT * FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*') LIMIT 1
```

```response title="Response"
Row 1:
──────
fsq_place_id: 4e1ef76cae60cd553dec233f
name: @VirginAmerica In-flight Via @Gogo
latitude: 37.62120111687914
longitude: -122.39003793803701
address: ᴺᵁᴸᴸ
locality: ᴺᵁᴸᴸ
region: ᴺᵁᴸᴸ
postcode: ᴺᵁᴸᴸ
admin_region: ᴺᵁᴸᴸ
post_town: ᴺᵁᴸᴸ
po_box: ᴺᵁᴸᴸ
country: US
date_created: 2011-07-14
date_refreshed: 2018-07-05
date_closed: 2018-07-05
tel: ᴺᵁᴸᴸ
website: ᴺᵁᴸᴸ
email: ᴺᵁᴸᴸ
facebook_id: ᴺᵁᴸᴸ
instagram: ᴺᵁᴸᴸ
twitter: ᴺᵁᴸᴸ
fsq_category_ids: ['4bf58dd8d48988d1f7931735']
fsq_category_labels: ['Travel and Transportation > Transport Hub > Airport > Plane']
placemaker_url: https://foursquare.com/placemakers/review-place/4e1ef76cae60cd553dec233f
geom: �^��a�^@B�
bbox: (-122.39003793803701,37.62120111687914,-122.39003793803701,37.62120111687914)
```

We see that quite a few fields have `ᴺᵁᴸᴸ`, so we can add some additional conditions
to our query to get back more usable data:

```sql title="Query"
SELECT * FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
WHERE address IS NOT NULL AND postcode IS NOT NULL AND instagram IS NOT NULL LIMIT 1
```

```response
Row 1:
──────
fsq_place_id: 59b2c754b54618784f259654
name: Villa 722
latitude: ᴺᵁᴸᴸ
longitude: ᴺᵁᴸᴸ
address: Gijzenveldstraat 75
locality: Zutendaal
region: Limburg
postcode: 3690
admin_region: ᴺᵁᴸᴸ
post_town: ᴺᵁᴸᴸ
po_box: ᴺᵁᴸᴸ
country: ᴺᵁᴸᴸ
date_created: 2017-09-08
date_refreshed: 2020-01-25
date_closed: ᴺᵁᴸᴸ
tel: ᴺᵁᴸᴸ
website: https://www.landal.be
email: ᴺᵁᴸᴸ
facebook_id: 522698844570949 -- 522.70 trillion
instagram: landalmooizutendaal
twitter: landalzdl
fsq_category_ids: ['56aa371be4b08b9a8d5734e1']
fsq_category_labels: ['Travel and Transportation > Lodging > Vacation Rental']
placemaker_url: https://foursquare.com/placemakers/review-place/59b2c754b54618784f259654
geom: ᴺᵁᴸᴸ
bbox: (NULL,NULL,NULL,NULL)
```
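
Before loading anything into a table, it can be useful to gauge how sparse some of these columns are. The following is a minimal sketch (an addition to the original walkthrough; the exact ratios depend on the release snapshot you point it at) that counts how many rows carry a website, phone number, or Instagram handle:

```sql title="Query"
SELECT
    count() AS total_places,
    countIf(website IS NOT NULL) AS with_website,
    countIf(tel IS NOT NULL) AS with_phone,
    countIf(instagram IS NOT NULL) AS with_instagram
FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
```

Unlike the `LIMIT 1` probes above, this scans every Parquet file in the bucket, so expect it to run noticeably longer.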

Run the following query to view the automatically inferred schema of the data using
`DESCRIBE`:

```sql title="Query"
DESCRIBE s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
```

```response title="Response"
┌─name────────────────┬─type────────────────────────┬
 1. │ fsq_place_id │ Nullable(String) │
 2. │ name │ Nullable(String) │
 3. │ latitude │ Nullable(Float64) │
 4. │ longitude │ Nullable(Float64) │
 5. │ address │ Nullable(String) │
 6. │ locality │ Nullable(String) │
 7. │ region │ Nullable(String) │
 8. │ postcode │ Nullable(String) │
 9. │ admin_region │ Nullable(String) │
10. │ post_town │ Nullable(String) │
11. │ po_box │ Nullable(String) │
12. │ country │ Nullable(String) │
13. │ date_created │ Nullable(String) │
14. │ date_refreshed │ Nullable(String) │
15. │ date_closed │ Nullable(String) │
16. │ tel │ Nullable(String) │
17. │ website │ Nullable(String) │
18. │ email │ Nullable(String) │
19. │ facebook_id │ Nullable(Int64) │
20. │ instagram │ Nullable(String) │
21. │ twitter │ Nullable(String) │
22. │ fsq_category_ids │ Array(Nullable(String)) │
23. │ fsq_category_labels │ Array(Nullable(String)) │
24. │ placemaker_url │ Nullable(String) │
25. │ geom │ Nullable(String) │
26. │ bbox │ Tuple( ↴│
    │ │↳ xmin Nullable(Float64),↴│
    │ │↳ ymin Nullable(Float64),↴│
    │ │↳ xmax Nullable(Float64),↴│
    │ │↳ ymax Nullable(Float64)) │
└─────────────────────┴─────────────────────────────┘
```
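
The category labels are hierarchical strings such as `Travel and Transportation > Transport Hub > Airport > Plane`. As a small exploratory sketch (again an addition to the original walkthrough, and one that scans the whole dataset), you could count which top-level categories dominate:

```sql title="Query"
SELECT
    splitByString(' > ', assumeNotNull(arrayJoin(fsq_category_labels)))[1] AS top_level_category,
    count() AS places
FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
GROUP BY top_level_category
ORDER BY places DESC
LIMIT 10
```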

## Loading the data into ClickHouse {#loading-the-data}

If you'd like to persist the data on disk, you can use `clickhouse-server`
or ClickHouse Cloud.

To create the table, run the following command:

```sql title="Query"
CREATE TABLE foursquare_mercator
(
    fsq_place_id String,
    name String,
    latitude Float64,
    longitude Float64,
    address String,
    locality String,
    region LowCardinality(String),
    postcode LowCardinality(String),
    admin_region LowCardinality(String),
    post_town LowCardinality(String),
    po_box LowCardinality(String),
    country LowCardinality(String),
    date_created Nullable(Date),
    date_refreshed Nullable(Date),
    date_closed Nullable(Date),
    tel String,
    website String,
    email String,
    facebook_id String,
    instagram String,
    twitter String,
    fsq_category_ids Array(String),
    fsq_category_labels Array(String),
    placemaker_url String,
    geom String,
    bbox Tuple(
        xmin Nullable(Float64),
        ymin Nullable(Float64),
        xmax Nullable(Float64),
        ymax Nullable(Float64)
    ),
    category LowCardinality(String) ALIAS fsq_category_labels[1],
    mercator_x UInt32 MATERIALIZED 0xFFFFFFFF * ((longitude + 180) / 360),
    mercator_y UInt32 MATERIALIZED 0xFFFFFFFF * ((1 / 2) - ((log(tan(((latitude + 90) / 360) * pi())) / 2) / pi())),
    INDEX idx_x mercator_x TYPE minmax,
    INDEX idx_y mercator_y TYPE minmax
)
ORDER BY mortonEncode(mercator_x, mercator_y)
```

Take note of the use of the [`LowCardinality`](/sql-reference/data-types/lowcardinality)
data type for several columns, which changes the internal representation of those
columns to be dictionary-encoded. Operating with dictionary-encoded data significantly
increases the performance of `SELECT` queries for many applications.
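
For example (a sketch added here for illustration, which assumes the data has already been loaded as shown further down), grouping by the dictionary-encoded `category` alias column is exactly the kind of query that benefits:

```sql title="Query"
SELECT category, count() AS places
FROM foursquare_mercator
GROUP BY category
ORDER BY places DESC
LIMIT 10
```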

Additionally, two `UInt32` `MATERIALIZED` columns, `mercator_x` and `mercator_y`, are created
that map the lat/lon coordinates to the [Web Mercator projection](https://en.wikipedia.org/wiki/Web_Mercator_projection)
for easier segmentation of the map into tiles:

```sql
mercator_x UInt32 MATERIALIZED 0xFFFFFFFF * ((longitude + 180) / 360),
mercator_y UInt32 MATERIALIZED 0xFFFFFFFF * ((1 / 2) - ((log(tan(((latitude + 90) / 360) * pi())) / 2) / pi())),
```

Let's break down what is happening above for each column (a short worked example follows the breakdown).

**mercator_x**

This column converts a longitude value into an X coordinate in the Mercator projection:

- `longitude + 180` shifts the longitude range from [-180, 180] to [0, 360]
- Dividing by 360 normalizes this to a value between 0 and 1
- Multiplying by `0xFFFFFFFF` (hex for maximum 32-bit unsigned integer) scales this normalized value to the full range of a 32-bit integer

**mercator_y**

This column converts a latitude value into a Y coordinate in the Mercator projection:

- `latitude + 90` shifts latitude from [-90, 90] to [0, 180]
- Dividing by 360 and multiplying by pi() converts to radians for the trigonometric functions
- The `log(tan(...))` part is the core of the Mercator projection formula
- Multiplying by `0xFFFFFFFF` scales to the full 32-bit integer range
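
To make the arithmetic concrete, here is a small worked example (an illustrative addition, not part of the original table definition) that applies both expressions to the coordinates of the first row we inspected earlier:

```sql title="Query"
SELECT
    -122.39003793803701 AS longitude,
    37.62120111687914 AS latitude,
    -- the mercator_x expression, with an explicit cast to UInt32
    toUInt32(0xFFFFFFFF * ((longitude + 180) / 360)) AS mercator_x,
    -- the mercator_y expression, with an explicit cast to UInt32
    toUInt32(0xFFFFFFFF * ((1 / 2) - ((log(tan(((latitude + 90) / 360) * pi())) / 2) / pi()))) AS mercator_y
```

Both results land comfortably inside the `UInt32` range, which is the whole point of scaling by `0xFFFFFFFF`.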

Specifying `MATERIALIZED` makes sure that ClickHouse calculates the values for these
columns when we `INSERT` the data, without having to specify these columns (which are not
part of the original data schema) in the `INSERT` statement.

The table is ordered by `mortonEncode(mercator_x, mercator_y)` which produces a
Z-order space-filling curve of `mercator_x`, `mercator_y` in order to significantly
improve geospatial query performance. This Z-order curve ordering ensures data is
physically organized by spatial proximity:

```sql
ORDER BY mortonEncode(mercator_x, mercator_y)
```

Two `minmax` indices are also created for faster search:

```sql
INDEX idx_x mercator_x TYPE minmax,
INDEX idx_y mercator_y TYPE minmax
```
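
To illustrate how this layout is meant to be used (a sketch with arbitrary, made-up tile bounds rather than a benchmark), a tile-style query filters on rectangular ranges of `mercator_x` and `mercator_y`:

```sql title="Query"
SELECT count() AS places_in_tile
FROM foursquare_mercator
-- hypothetical tile bounds chosen only for illustration
WHERE mercator_x BETWEEN 690000000 AND 700000000
  AND mercator_y BETWEEN 1650000000 AND 1660000000
```

Because the Z-order key keeps spatially close points in nearby granules, the `minmax` skip indexes can discard most of the table for a small tile like this.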

As you can see, ClickHouse has absolutely everything you need for real-time
mapping applications!

Run the following query to load the data:

```sql
INSERT INTO foursquare_mercator
SELECT * FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
```
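
Loading 100 million+ rows from S3 takes a while. Once the `INSERT` finishes, a quick sanity check (a small addition here; the exact count depends on the snapshot you loaded) confirms the table is populated:

```sql title="Query"
SELECT formatReadableQuantity(count()) AS total_places
FROM foursquare_mercator
```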

## Visualizing the data {#data-visualization}

To see what's possible with this dataset, check out [adsb.exposed](https://adsb.exposed/?dataset=Places&zoom=5&lat=52.3488&lng=4.9219).
adsb.exposed was originally built by co-founder and CTO Alexey Milovidov to visualize ADS-B (Automatic Dependent Surveillance-Broadcast)
flight data, which is roughly 1000 times larger. During a company hackathon, Alexey added the Foursquare data to the tool.

Some of our favourite visualizations are shown below for you to enjoy.

<Image img={visualization_1} size="md" alt="Density map of points of interest in Europe"/>

<Image img={visualization_2} size="md" alt="Sake bars in Japan"/>

<Image img={visualization_3} size="md" alt="ATMs"/>

<Image img={visualization_4} size="md" alt="Map of Europe with points of interest categorised by country"/>
