content/blog/clickhouse-benchmarking.md: 18 additions & 12 deletions
@@ -1,4 +1,11 @@
-# ClickHouse Deployment and Performance Benchmarking on ECS
+---
+title: "ClickHouse Deployment and Performance Benchmarking on ECS"
+authorId: "rohit"
+date: 2024-10-21
+draft: false
+featured: true
+weight: 1
+---
 
 "Imagine being a Formula One driver, racing at breakneck speeds, but without any telemetry data to guide you. It’s a thrilling ride, but one wrong turn or overheating engine could lead to disaster. Just like a pit crew relies on performance metrics to optimize the car's speed and handling, we utilize observability in ClickHouse to monitor our data system's health. These metrics provide crucial insights, allowing us to identify bottlenecks, prevent outages, and fine-tune performance, ensuring our data engine runs as smoothly and efficiently as a championship-winning race car."
 
@@ -12,17 +19,17 @@ In this blog, we'll dive into the process of deploying ClickHouse on AWS Elastic
 
 In this architecture, we utilize five servers to ensure data availability and reliability. Two of these servers are dedicated to hosting copies of the data, while the remaining three serve to coordinate the replication process. We will create a database and a table using the **ReplicatedMergeTree** engine, which allows for seamless data replication across the two data nodes.
 
-####Key Terms:
+### Key Terms
 
 -**Replica:** In ClickHouse, a replica refers to a copy of your data. There is always at least one copy (the original), and adding a second replica enhances fault tolerance. This ensures that if one copy fails, the other remains accessible.
 
 -**Shard:** A shard is a subset of your data. If you do not split the data across multiple servers, all data resides in a single shard. Sharding helps distribute the load when a single server's capacity is exceeded. The destination server for the data is determined by a sharding key, which can be random or derived from a hash function. In our examples, we will use a random key for simplicity.
 
 This architecture not only protects your data but also allows for better handling of increased loads, making it a robust solution for data management in ClickHouse. For more detailed information, refer to the official documentation on [ClickHouse Replication Architecture](https://clickhouse.com/docs/en/architecture/replication).
 
-###Configuration Changes for ClickHouse Deployment
+## Configuration Changes for ClickHouse Deployment
 
-**Node Descriptions**
+### Node Descriptions
 
 -**clickhouse-01**: Data node for storing data.
 -**clickhouse-02**: Another data node for data storage.
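Editor's note: the architecture section above mentions creating a database and a table with the **ReplicatedMergeTree** engine and routing data with a random sharding key. As an illustration only, here is a minimal sketch of what that DDL typically looks like. The database, table, and column names are hypothetical, the cluster name `cluster_1S_2R` is inferred from the node display names used later in the configuration, and the exact statements in the post may differ.

```sql
-- Hypothetical schema; {shard} and {replica} come from the macros
-- configured on each node (see the macros configuration below).
CREATE DATABASE IF NOT EXISTS db1 ON CLUSTER cluster_1S_2R;

CREATE TABLE db1.events ON CLUSTER cluster_1S_2R
(
    event_time DateTime,
    user_id    UInt64,
    message    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (user_id, event_time);

-- A Distributed table that routes inserts with a random sharding key,
-- as described above. With a single shard it simply forwards to that shard.
CREATE TABLE db1.events_distributed ON CLUSTER cluster_1S_2R
AS db1.events
ENGINE = Distributed(cluster_1S_2R, db1, events, rand());
```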
@@ -32,7 +39,7 @@ This architecture not only protects your data but also allows for better handlin
 
 ### Installation Steps
 
--**ClickHouse Server**: We deployed ClickHouse Server and Client on the data nodes, clickhouse-01 and clickhouse-02, using Docker images, specifically `clickhouse/clickhouse-server` for installation.
+-**ClickHouse Server**: We deployed ClickHouse Server and Client on the data nodes, clickhouse-01 and clickhouse-02, using Docker images, specifically `clickhouse/clickhouse-server` for installation.
 
 -**ClickHouse Keeper**: Installed on the three servers (clickhouse-keeper-01, clickhouse-keeper-02, and clickhouse-keeper-03) using Docker image `clickhouse/clickhouse-keeper`.
 
@@ -48,7 +55,7 @@ This architecture not only protects your data but also allows for better handlin
 
 The configuration for clickhouse-01 includes five files for clarity, although they can be combined if desired. Here are key elements:
 
--**Network and Logging Configuration**:
+-**Network and Logging Configuration**:
 - Sets the display name to "cluster_1S_2R node 1."
 - Configures ports for HTTP (8123) and TCP (9000).
 
@@ -68,7 +75,7 @@ The configuration for clickhouse-01 includes five files for clarity, although th
 </clickhouse>
 ```
 
--**Macros Configuration**:
+-**Macros Configuration**:
 - Simplifies DDL by using macros for shard and replica numbers.
 
 ```xml
@@ -131,17 +138,17 @@ The configuration for clickhouse-01 includes five files for clarity, although th
 
 The configuration is mostly similar to clickhouse-01, with key differences noted:
 
--**Network and Logging Configuration**:
+-**Network and Logging Configuration**:
 - Similar to clickhouse-01 but with a different display name.
 
--**Macros Configuration**:
+-**Macros Configuration**:
 - The replica is set to 02 on this node.
 
 #### clickhouse-keeper Configuration
 
 For ClickHouse Keeper, each node configuration includes:
 
--**General Configuration**:
+-**General Configuration**:
 - Ensure the server ID is unique across all instances.
 
 ```xml
@@ -173,7 +180,7 @@ All configuration changes were integrated into the Docker image using the base C
 
 ## ECS Cluster Setup and ClickHouse Deployment
 
-####ClickHouse Deployment Overview
+### ClickHouse Deployment Overview
 
 -**AWS Partner Solution:** We leveraged the AWS Partner Solution Deployment Guide for ClickHouse to ensure a structured setup.
 
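Editor's note: once the ECS services for the two data nodes and the three Keeper nodes are running, a quick sanity check is to confirm that ClickHouse sees the expected one-shard, two-replica topology. A minimal sketch, assuming the cluster defined in `remote_servers` is named `cluster_1S_2R` (matching the display names above):

```sql
-- Run on either data node; each replica of the cluster should appear once.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'cluster_1S_2R';
```

If a host is missing here, the usual suspects are networking between the containers or Keeper connectivity rather than the table definitions themselves.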
@@ -205,7 +212,6 @@ All configuration changes were integrated into the Docker image using the base C
 
 -**Auto Scaling Group:** An Auto Scaling Group was set up with `m5.large` instances, providing **3 GB of memory** for each container to ensure optimal performance under varying loads.
 
-
 ## Performance Benchmarking Metrics
 
 Now that the deployment architecture is established, let's move on to evaluating ClickHouse's performance through a series of benchmarking metrics.
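Editor's note: the post's benchmarking section presumably uses its own tooling, but as an illustration of where per-query telemetry comes from, a query like the following pulls duration, rows read, and peak memory for recent queries. It assumes query logging is enabled (the `log_queries` setting); column and table names are standard ClickHouse system objects.

```sql
-- Recent finished queries with duration, rows read, and peak memory usage.
SELECT
    event_time,
    query_duration_ms,
    read_rows,
    formatReadableSize(memory_usage) AS peak_memory,
    substring(query, 1, 80) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 10;
```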