docs/chdb/guides/jupysql.md (9 additions, 7 deletions)
@@ -6,6 +6,8 @@ description: How to install chDB for Bun
 keywords: [chdb, JupySQL]
 ---
 
+import PlayersPerRank from '@site/static/images/chdb/guides/players_per_rank.png';
+
 [JupySQL](https://jupysql.ploomber.io/en/latest/quick-start.html) is a Python library that lets you run SQL in Jupyter notebooks and the IPython shell.
 In this guide, we're going to learn how to query data using chDB and JupySQL.
 
@@ -71,7 +73,7 @@ Next, let's import the `dbapi` module for chDB:
 from chdb import dbapi
 ```
 
-And we'll create a chDB connection.
+And we'll create a chDB connection.
 Any data that we persist will be saved to the `atp.chdb` directory:
 
 ```python
@@ -93,7 +95,7 @@ Next, we'll display the display limit so that results of queries won't be trunca
 
 ## Querying data in CSV files {#querying-data-in-csv-files}
 
-We've downloaded a bunch of files with the `atp_rankings` prefix.
+We've downloaded a bunch of files with the `atp_rankings` prefix.
 Let's use the `DESCRIBE` clause to understand the schema:
 
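The two hunks above cover the guide's connection and CSV-inspection steps. A minimal sketch of those steps outside the notebook, assuming chDB is installed, the `atp_rankings*.csv` files are in the working directory, and that `dbapi.connect` accepts the `path` keyword used earlier in the guide:

```python
# Sketch only (not part of the diff): connect through chDB's DB-API interface
# and inspect the downloaded CSV files.
from chdb import dbapi

# Persist any created databases/tables under ./atp.chdb, as in the guide.
# The `path` keyword is an assumption based on the guide's earlier snippet.
conn = dbapi.connect(path="atp.chdb")
cur = conn.cursor()

# DESCRIBE the CSV files via ClickHouse's file() table function.
cur.execute("DESCRIBE file('atp_rankings*.csv')")
for row in cur.fetchall():
    print(row)  # (name, type, ...) for each inferred column

cur.close()
conn.close()
```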
@@ -273,7 +275,7 @@ We're going to write a query that finds the maximum points accumulate by each pl
 
 ```python
 %%sql
-SELECT name_first, name_last,
+SELECT name_first, name_last,
 max(points) as maxPoints,
 argMax(rank, points) as rank,
 argMax(ranking_date, points) as date
@@ -305,12 +307,12 @@ It's quite interesting that some of the players in this list accumulated a lot o
 
 ## Saving queries {#saving-queries}
 
-We can save queries using the `--save` parameter on the same line as the `%%sql` magic.
+We can save queries using the `--save` parameter on the same line as the `%%sql` magic.
 The `--no-execute` parameter means that query execution will be skipped.
 
 ```python
 %%sql --save best_points --no-execute
-SELECT name_first, name_last,
+SELECT name_first, name_last,
 max(points) as maxPoints,
 argMax(rank, points) as rank,
 argMax(ranking_date, points) as date
@@ -357,7 +359,7 @@ Parameters are just normal variables:
 rank = 10
 ```
 
-And then we can use the `{{variable}}` syntax in our query.
+And then we can use the `{{variable}}` syntax in our query.
 The following query finds the players who had the least number of days between when they first had a ranking in the top 10 and last had a ranking in the top 10:
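The last two hunks touch the guide's saved-query and `{{variable}}` sections. Outside JupySQL, the same parameterization can be sketched with plain string substitution (illustrative only; the file glob and `rank` column follow the guide's CSV files):

```python
# Rough stand-in for JupySQL's {{rank}} templating, outside the notebook.
from chdb import dbapi

rank = 10  # same notebook variable as in the guide

conn = dbapi.connect(path="atp.chdb")
cur = conn.cursor()
# f-string substitution plays the role of {{rank}} here.
cur.execute(f"SELECT count() FROM file('atp_rankings*.csv') WHERE rank <= {rank}")
print(cur.fetchall())
```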
docs/deployment-guides/horizontal-scaling.md (11 additions, 11 deletions)
@@ -6,7 +6,7 @@ title: Scaling out
 ---
 import ReplicationShardingTerminology from '@site/docs/_snippets/_replication-sharding-terminology.md';
 import ConfigFileNote from '@site/docs/_snippets/_config-files.md';
-
+import scalingOut1 from '@site/static/images/deployment-guides/scaling-out-1.png';
 
 ## Description {#description}
 This example architecture is designed to provide scalability. It includes three nodes: two combined ClickHouse plus coordination (ClickHouse Keeper) servers, and a third server with only ClickHouse Keeper to finish the quorum of three. With this example, we'll create a database, table, and a distributed table that will be able to query the data on both of the nodes.
@@ -17,7 +17,8 @@ This example architecture is designed to provide scalability. It includes three
 
 ## Environment {#environment}
 ### Architecture Diagram {#architecture-diagram}
-
+
+<img src={scalingOut1} alt="Architecture diagram for 2 shards and 1 replica" />
 
 |Node|Description|
 |----|-----------|
@@ -31,7 +32,7 @@ In production environments we strongly recommend that ClickHouse Keeper runs on
 
 ## Install {#install}
 
-Install Clickhouse on three servers following the [instructions for your archive type](/getting-started/install.md/#available-installation-options) (.deb, .rpm, .tar.gz, etc.). For this example, you will follow the installation instructions for ClickHouse Server and Client on all three machines.
+Install Clickhouse on three servers following the [instructions for your archive type](/getting-started/install.md/#available-installation-options) (.deb, .rpm, .tar.gz, etc.). For this example, you will follow the installation instructions for ClickHouse Server and Client on all three machines.
@@ -45,7 +46,7 @@ For `chnode1`, there are five configuration files. You may choose to combine th
 
 These values can be customized as you wish. This example configuration gives you a debug log that will roll over at 1000M three times. ClickHouse will listen on the IPv4 network on ports 8123 and 9000, and will use port 9009 for interserver communication.
 
-```xml title="network-and-logging.xml on chnode1"
+```xml title="network-and-logging.xml on chnode1"
 <clickhouse>
 <logger>
 <level>debug</level>
@@ -110,8 +111,8 @@ If for any reason a Keeper node is replaced or rebuilt, do not reuse an existing
 
 ### Macros configuration {#macros-configuration}
 
-The macros `shard` and `replica` reduce the complexity of distributed DDL. The values configured are automatically substituted in your DDL queries, which simplifies your DDL. The macros for this configuration specify the shard and replica number for each node.
-In this 2 shard 1 replica example, the replica macro is `replica_1` on both chnode1 and chnode2 as there is only one replica. The shard macro is `1` on chnode1 and `2` on chnode2.
+The macros `shard` and `replica` reduce the complexity of distributed DDL. The values configured are automatically substituted in your DDL queries, which simplifies your DDL. The macros for this configuration specify the shard and replica number for each node.
+In this 2 shard 1 replica example, the replica macro is `replica_1` on both chnode1 and chnode2 as there is only one replica. The shard macro is `1` on chnode1 and `2` on chnode2.
 
 ```xml title="macros.xml on chnode1"
 <clickhouse>
@@ -126,7 +127,7 @@ In this 2 shard 1 replica example, the replica macro is `replica_1` on both chno
 ### Replication and sharding configuration {#replication-and-sharding-configuration}
 
 Starting from the top:
-- The `remote_servers` section of the XML specifies each of the clusters in the environment. The attribute `replace=true` replaces the sample `remote_servers` in the default ClickHouse configuration with the `remote_servers` configuration specified in this file. Without this attribute, the remote servers in this file would be appended to the list of samples in the default.
+- The `remote_servers` section of the XML specifies each of the clusters in the environment. The attribute `replace=true` replaces the sample `remote_servers` in the default ClickHouse configuration with the `remote_servers` configuration specified in this file. Without this attribute, the remote servers in this file would be appended to the list of samples in the default.
 - In this example, there is one cluster named `cluster_2S_1R`.
 - A secret is created for the cluster named `cluster_2S_1R` with the value `mysecretphrase`. The secret is shared across all of the remote servers in the environment to ensure that the correct servers are joined together.
 - The cluster `cluster_2S_1R` has two shards, and each of those shards has one replica. Take a look at the architecture diagram toward the beginning of this document, and compare it with the two `shard` definitions in the XML below. In each of the shard definitions there is one replica. The replica is for that specific shard. The host and port for that replica is specified. The replica for the first shard in the configuration is stored on `chnode1`, and the replica for the second shard in the configuration is stored on `chnode2`.
@@ -158,7 +159,7 @@ Starting from the top:
 
 ### Configuring the use of Keeper {#configuring-the-use-of-keeper}
 
-Up above a few files ClickHouse Keeper was configured. This configuration file `use-keeper.xml` is configuring ClickHouse Server to use ClickHouse Keeper for the coordination of replication and distributed DDL. This file specifies that ClickHouse Server should use Keeper on nodes chnode1 - 3 on port 9181, and the file is the same on `chnode1` and `chnode2`.
+Up above a few files ClickHouse Keeper was configured. This configuration file `use-keeper.xml` is configuring ClickHouse Server to use ClickHouse Keeper for the coordination of replication and distributed DDL. This file specifies that ClickHouse Server should use Keeper on nodes chnode1 - 3 on port 9181, and the file is the same on `chnode1` and `chnode2`.
 
 ```xml title="use-keeper.xml on chnode1"
 <clickhouse>
@@ -185,7 +186,7 @@ As the configuration is very similar on `chnode1` and `chnode2`, only the differ
 
 ### Network and logging configuration {#network-and-logging-configuration-1}
 
-```xml title="network-and-logging.xml on chnode2"
+```xml title="network-and-logging.xml on chnode2"
 <clickhouse>
 <logger>
 <level>debug</level>
@@ -311,7 +312,7 @@ As `chnode3` is not storing data and is only used for ClickHouse Keeper to provi
 
 ### Network and logging configuration {#network-and-logging-configuration-2}
 
-```xml title="network-and-logging.xml on chnode3"
+```xml title="network-and-logging.xml on chnode3"
 <clickhouse>
 <logger>
 <level>debug</level>
@@ -480,4 +481,3 @@ SELECT * FROM db1.table1_dist;
 
 
 - The [Distributed Table Engine](/engines/table-engines/special/distributed.md)
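The final hunk's context line shows the payoff of this guide: querying `db1.table1_dist` fans the query out across both shards. A hedged sketch of that query from Python, assuming the `clickhouse-connect` client is installed, `chnode1` is reachable on port 8123, and default credentials apply (none of this is part of the diff):

```python
# Sketch: query the guide's distributed table through clickhouse-connect.
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="chnode1", port=8123, username="default", password=""
)

# db1.table1_dist forwards the query to the shard-local tables on
# chnode1 and chnode2, as described in the guide.
result = client.query("SELECT * FROM db1.table1_dist")
print(result.result_rows)
```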
docs/deployment-guides/replicated.md (11 additions, 9 deletions)
@@ -4,10 +4,11 @@ sidebar_label: Replication for fault tolerance
 sidebar_position: 10
 title: Replication for fault tolerance
 ---
+
 import ReplicationShardingTerminology from '@site/docs/_snippets/_replication-sharding-terminology.md';
 import ConfigFileNote from '@site/docs/_snippets/_config-files.md';
 import KeeperConfigFileNote from '@site/docs/_snippets/_keeper-config-files.md';
-
+import ReplicationArchitecture from '@site/static/images/deployment-guides/architecture_1s_2r_3_nodes.png';
 
 ## Description {#description}
 In this architecture, there are five servers configured. Two are used to host copies of the data. The other three servers are used to coordinate the replication of data. With this example, we'll create a database and table that will be replicated across both data nodes using the ReplicatedMergeTree table engine.
@@ -18,7 +19,8 @@ In this architecture, there are five servers configured. Two are used to host co
 
 ## Environment {#environment}
 ### Architecture Diagram {#architecture-diagram}
-
+
+<img src={ReplicationArchitecture} alt="Architecture diagram for 1 shard and 2 replicas with ReplicatedMergeTree" />
 
 |Node|Description|
 |----|-----------|
@@ -34,7 +36,7 @@ In production environments, we strongly recommend using *dedicated* hosts for Cl
 
 ## Install {#install}
 
-Install ClickHouse server and client on the two servers `clickhouse-01` and `clickhouse-02` following the [instructions for your archive type](/getting-started/install.md/#available-installation-options) (.deb, .rpm, .tar.gz, etc.).
+Install ClickHouse server and client on the two servers `clickhouse-01` and `clickhouse-02` following the [instructions for your archive type](/getting-started/install.md/#available-installation-options) (.deb, .rpm, .tar.gz, etc.).
 
 Install ClickHouse Keeper on the three servers `clickhouse-keeper-01`, `clickhouse-keeper-02` and `clickhouse-keeper-03` following the [instructions for your archive type](/getting-started/install.md/#install-standalone-clickhouse-keeper) (.deb, .rpm, .tar.gz, etc.).
 
@@ -53,7 +55,7 @@ These values can be customized as you wish. This example configuration gives yo
 - the name displayed when you connect with `clickhouse-client` is `cluster_1S_2R node 1`
 - ClickHouse will listen on the IPV4 network on ports 8123 and 9000.
 
-```xml title="/etc/clickhouse-server/config.d/network-and-logging.xml on clickhouse-01"
+```xml title="/etc/clickhouse-server/config.d/network-and-logging.xml on clickhouse-01"
 <clickhouse>
 <logger>
 <level>debug</level>
@@ -71,8 +73,8 @@ These values can be customized as you wish. This example configuration gives yo
 
 ### Macros configuration {#macros-configuration}
 
-The macros `shard` and `replica` reduce the complexity of distributed DDL. The values configured are automatically substituted in your DDL queries, which simplifies your DDL. The macros for this configuration specify the shard and replica number for each node.
-In this 1 shard 2 replica example, the replica macro is `replica_1` on clickhouse-01 and `replica_2` on clickhouse-02. The shard macro is `1` on both clickhouse-01 and clickhouse-02 as there is only one shard.
+The macros `shard` and `replica` reduce the complexity of distributed DDL. The values configured are automatically substituted in your DDL queries, which simplifies your DDL. The macros for this configuration specify the shard and replica number for each node.
+In this 1 shard 2 replica example, the replica macro is `replica_1` on clickhouse-01 and `replica_2` on clickhouse-02. The shard macro is `1` on both clickhouse-01 and clickhouse-02 as there is only one shard.
 
 ```xml title="/etc/clickhouse-server/config.d/macros.xml on clickhouse-01"
 <clickhouse>
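To illustrate what the macros hunk above describes, here is a hypothetical `ON CLUSTER` statement in which ClickHouse substitutes `{shard}` and `{replica}` per node. The table name and Keeper path are illustrative, not taken from the guide, and the sketch assumes the `clickhouse-connect` client is installed and `clickhouse-01` is reachable with default credentials:

```python
# Sketch of macro substitution in distributed DDL (hypothetical names).
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="clickhouse-01", port=8123, username="default", password=""
)

client.command("CREATE DATABASE IF NOT EXISTS db1 ON CLUSTER cluster_1S_2R")
client.command("""
    CREATE TABLE db1.table1 ON CLUSTER cluster_1S_2R
    (
        id UInt64,
        column1 String
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/db1/table1', '{replica}')
    ORDER BY id
""")
# On clickhouse-01 the path expands with shard=1 and replica=replica_1;
# on clickhouse-02 it expands with shard=1 and replica=replica_2.
```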
@@ -88,7 +90,7 @@ In this 1 shard 2 replica example, the replica macro is `replica_1` on clickhous
 ### Replication and sharding configuration {#replication-and-sharding-configuration}
 
 Starting from the top:
-- The remote_servers section of the XML specifies each of the clusters in the environment. The attribute `replace=true` replaces the sample remote_servers in the default ClickHouse configuration with the remote_server configuration specified in this file. Without this attribute the remote servers in this file would be appended to the list of samples in the default.
+- The remote_servers section of the XML specifies each of the clusters in the environment. The attribute `replace=true` replaces the sample remote_servers in the default ClickHouse configuration with the remote_server configuration specified in this file. Without this attribute the remote servers in this file would be appended to the list of samples in the default.
 - In this example, there is one cluster named `cluster_1S_2R`.
 - A secret is created for the cluster named `cluster_1S_2R` with the value `mysecretphrase`. The secret is shared across all of the remote servers in the environment to ensure that the correct servers are joined together.
 - The cluster `cluster_1S_2R` has one shard, and two replicas. Take a look at the architecture diagram toward the beginning of this document, and compare it with the `shard` definition in the XML below. The shard definition contains two replicas. The host and port for each replica is specified. One replica is stored on `clickhouse-01`, and the other replica is stored on `clickhouse-02`.
@@ -117,7 +119,7 @@ Starting from the top:
 
 ### Configuring the use of Keeper {#configuring-the-use-of-keeper}
 
-This configuration file `use-keeper.xml` is configuring ClickHouse Server to use ClickHouse Keeper for the coordination of replication and distributed DDL. This file specifies that ClickHouse Server should use Keeper on nodes clickhouse-keeper-01 - 03 on port 9181, and the file is the same on `clickhouse-01` and `clickhouse-02`.
+This configuration file `use-keeper.xml` is configuring ClickHouse Server to use ClickHouse Keeper for the coordination of replication and distributed DDL. This file specifies that ClickHouse Server should use Keeper on nodes clickhouse-keeper-01 - 03 on port 9181, and the file is the same on `clickhouse-01` and `clickhouse-02`.
 
 ```xml title="/etc/clickhouse-server/config.d/use-keeper.xml on clickhouse-01"
 <clickhouse>
@@ -147,7 +149,7 @@ As the configuration is very similar on clickhouse-01 and clickhouse-02 only the
 
 This file is the same on both clickhouse-01 and clickhouse-02, with the exception of `display_name`.
 
-```xml title="/etc/clickhouse-server/config.d/network-and-logging.xml on clickhouse-02"
+```xml title="/etc/clickhouse-server/config.d/network-and-logging.xml on clickhouse-02"
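Continuing the hypothetical table from the earlier sketch, a quick fault-tolerance check would write through one replica and read through the other (again an assumption-laden sketch: both servers reachable with default credentials, and replication has caught up before the read):

```python
# Hypothetical replication check: write via clickhouse-01, read via clickhouse-02.
import clickhouse_connect

writer = clickhouse_connect.get_client(
    host="clickhouse-01", port=8123, username="default", password=""
)
reader = clickhouse_connect.get_client(
    host="clickhouse-02", port=8123, username="default", password=""
)

writer.command("INSERT INTO db1.table1 (id, column1) VALUES (1, 'abc')")
print(reader.query("SELECT * FROM db1.table1").result_rows)  # expect [(1, 'abc')]
```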
docs/faq/general/columnar-database.md (5 additions, 2 deletions)
@@ -5,6 +5,9 @@ toc_hidden: true
 toc_priority: 101
 ---
 
+import RowOriented from '@site/static/images/row-oriented.gif';
+import ColumnOriented from '@site/static/images/column-oriented.gif';
+
 # What Is a Columnar Database? {#what-is-a-columnar-database}
 
 A columnar database stores the data of each column independently. This allows reading data from disk only for those columns that are used in any given query. The cost is that operations that affect whole rows become proportionally more expensive. The synonym for a columnar database is a column-oriented database management system. ClickHouse is a typical example of such a system.
 A columnar database is the preferred choice for analytical applications because it allows having many columns in a table just in case, but to not pay the cost for unused columns on read query execution time (a traditional OLTP database reads all of the data during queries as the data is stored in rows and not columns). Column-oriented databases are designed for big data processing and data warehousing, they often natively scale using distributed clusters of low-cost hardware to increase throughput. ClickHouse does it with combination of [distributed](../../engines/table-engines/special/distributed.md) and [replicated](../../engines/table-engines/mergetree-family/replication.md) tables.
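The row-versus-column trade-off described in this answer can be sketched in a few lines of Python (illustrative only): the column layout lets an aggregate touch just the one column it needs.

```python
# Illustrative sketch of the two storage layouts discussed above.
# Row-oriented: every row carries all columns, so scanning one column
# still walks every full row.
rows = [
    {"id": 1, "name": "alice", "visits": 10},
    {"id": 2, "name": "bob", "visits": 7},
]
total_row_oriented = sum(row["visits"] for row in rows)

# Column-oriented: each column is stored independently, so an aggregate
# over one column reads only that column's values.
columns = {
    "id": [1, 2],
    "name": ["alice", "bob"],
    "visits": [10, 7],
}
total_column_oriented = sum(columns["visits"])

assert total_row_oriented == total_column_oriented == 17
```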