@@ -8,22 +8,22 @@ description: 'Page describing an example architecture with five servers configur

import Image from '@theme/IdealImage';
import ReplicationShardingTerminology from '@site/docs/_snippets/_replication-sharding-terminology.md';
+ import ReplicationArchitecture from '@site/static/images/deployment-guides/replication-sharding-examples/replication.png';
import ConfigFileNote from '@site/docs/_snippets/_config-files.md';
import KeeperConfigFileNote from '@site/docs/_snippets/_keeper-config-files.md';
- import ReplicationArchitecture from '@site/static/images/deployment-guides/replication-sharding-examples/replication.png';
import ConfigExplanation from '@site/docs/deployment-guides/replication-sharding-examples/_snippets/_config_explanation.mdx';
import ListenHost from '@site/docs/deployment-guides/replication-sharding-examples/_snippets/_listen_host.mdx';
import ServerParameterTable from '@site/docs/deployment-guides/replication-sharding-examples/_snippets/_server_parameter_table.mdx';
import KeeperConfig from '@site/docs/deployment-guides/replication-sharding-examples/_snippets/_keeper_config.mdx';
import KeeperConfigExplanation from '@site/docs/deployment-guides/replication-sharding-examples/_snippets/_keeper_explanation.mdx';
import VerifyKeeperStatus from '@site/docs/deployment-guides/replication-sharding-examples/_snippets/_verify_keeper_using_mntr.mdx';
import DedicatedKeeperServers from '@site/docs/deployment-guides/replication-sharding-examples/_snippets/_dedicated_keeper_servers.mdx';
+ import ExampleFiles from '@site/docs/deployment-guides/replication-sharding-examples/_snippets/_working_example.mdx';

> In this example, you'll learn how to set up a simple ClickHouse cluster which
replicates the data. There are five servers configured. Two are used to host
copies of the data. The other three servers are used to coordinate the replication
- of data. With this example, we'll create a database and table that will be
- replicated across both data nodes using the `ReplicatedMergeTree` table engine.
+ of data.

The architecture of the cluster you will be setting up is shown below:

@@ -41,6 +41,8 @@ The architecture of the cluster you will be setting up is shown below:

## Set up directory structure and test environment {#set-up}

+ <ExampleFiles />
+
In this tutorial, you will use [Docker compose](https://docs.docker.com/compose/) to
set up the ClickHouse cluster. This setup could be modified to work
for separate local machines, virtual machines or cloud instances as well.
@@ -533,11 +535,11 @@ SHOW DATABASES;

## Create a table on the cluster {#creating-a-table}

- Now that the database has been created, create a distributed table on the cluster.
+ Now that the database has been created, create a table on the cluster.
Run the following query from any of the host clients:

```sql
- CREATE TABLE IF NOT EXISTS uk.uk_price_paid
+ CREATE TABLE IF NOT EXISTS uk.uk_price_paid_local
-- highlight-next-line
ON CLUSTER cluster_1S_2R
(
@@ -587,10 +589,13 @@ SHOW TABLES IN uk;

## Insert data {#inserting-data}

- Now insert data from `clickhouse-01`:
+ As the data set is large and takes a few minutes to completely ingest, we will
+ insert only a small subset to begin with.
+
+ Insert a smaller subset of the data using the query below from `clickhouse-01`:

```sql
- INSERT INTO uk.uk_price_paid
+ INSERT INTO uk.uk_price_paid_local
SELECT
    toUInt32(price_string) AS price,
    parseDateTimeBestEffortUS(time) AS date,
@@ -626,18 +631,27 @@ FROM url(
    d String,
    e String'
- ) SETTINGS max_http_get_redirects=10;
+ ) LIMIT 10000
+ SETTINGS max_http_get_redirects=10;
```

- Query the table from `clickhouse-02` or `clickhouse-01`:
+ Notice that the data is completely replicated on each host:

- ```sql title="Query"
- SELECT count(*) FROM uk.uk_price_paid;
- ```
+ ```sql
+ -- clickhouse-01
+ SELECT count(*)
+ FROM uk.uk_price_paid_local;

- ```response title="Response"
-    ┌──count()─┐
- 1. │ 30212555 │ -- 30.21 million
-    └──────────┘
+ --   ┌─count()─┐
+ -- 1.│   10000 │
+ --   └─────────┘
+
+ -- clickhouse-02
+ SELECT count(*)
+ FROM uk.uk_price_paid_local;
+
+ --   ┌─count()─┐
+ -- 1.│   10000 │
+ --   └─────────┘
```

To demonstrate what happens when one of the hosts fails, create a simple test database
@@ -715,11 +729,66 @@ SELECT * FROM test.test_table
└────┴────────────────────┘
```

+ If at this stage you would like to ingest the full UK property price dataset
+ to play around with, you can run the following queries to do so:
+
+ ```sql
+ TRUNCATE TABLE uk.uk_price_paid_local ON CLUSTER cluster_1S_2R;
+ INSERT INTO uk.uk_price_paid_local
+ SELECT
+     toUInt32(price_string) AS price,
+     parseDateTimeBestEffortUS(time) AS date,
+     splitByChar(' ', postcode)[1] AS postcode1,
+     splitByChar(' ', postcode)[2] AS postcode2,
+     transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
+     b = 'Y' AS is_new,
+     transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
+     addr1,
+     addr2,
+     street,
+     locality,
+     town,
+     district,
+     county
+ FROM url(
+     'http://prod1.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv',
+     'CSV',
+     'uuid_string String,
+     price_string String,
+     time String,
+     postcode String,
+     a String,
+     b String,
+     c String,
+     addr1 String,
+     addr2 String,
+     street String,
+     locality String,
+     town String,
+     district String,
+     county String,
+     d String,
+     e String'
+ ) SETTINGS max_http_get_redirects=10;
+ ```
+
+ Query the table from `clickhouse-02` or `clickhouse-01`:
+
+ ```sql title="Query"
+ SELECT count(*) FROM uk.uk_price_paid_local;
+ ```
+
+ ```response title="Response"
+    ┌──count()─┐
+ 1. │ 30212555 │ -- 30.21 million
+    └──────────┘
+ ```
+
</VerticalStepper>

## Conclusion {#conclusion}

- As you saw, the advantage of this cluster topology is that with two replicas,
+ The advantage of this cluster topology is that with two replicas,
your data exists on two separate hosts. If one host fails, the other replica
continues serving data without any loss. This eliminates single points of
failure at the storage level.