1 | | -# Flink ClickHouse Connector Example |
2 | | - |
3 | | -A comprehensive example demonstrating how to use Apache Flink with ClickHouse to process and store COVID-19 epidemiological data. |
4 | | - |
5 | | -## 📋 Prerequisites |
6 | | - |
7 | | -- **Java 11 or higher** |
8 | | -- **Apache Maven 3.6+** or **Gradle 7.0+** |
9 | | -- **ClickHouse instance** (local or cloud) |
10 | | -- **Internet connection** for downloads |
11 | | -- **Operating System**: Linux, macOS, or Windows with WSL |
12 | | - |
13 | | -## 🚀 Quick Start |
14 | | - |
15 | | -### 1. Download Apache Flink |
16 | | - |
17 | | -```bash |
18 | | -# Download Flink 2.0.0 |
19 | | -wget https://dlcdn.apache.org/flink/flink-2.0.0/flink-2.0.0-bin-scala_2.12.tgz |
20 | | - |
21 | | -# Alternative with curl (shows progress) |
22 | | -curl -L -# -O https://dlcdn.apache.org/flink/flink-2.0.0/flink-2.0.0-bin-scala_2.12.tgz |
23 | | - |
24 | | -# Compute the checksum and compare it with the published .sha512 file (optional) |
25 | | -sha512sum flink-2.0.0-bin-scala_2.12.tgz |
26 | | -``` |
27 | | - |
28 | | -### 2. Extract and Start Flink |
29 | | - |
30 | | -```bash |
31 | | -# Extract Flink |
32 | | -tar -xzf flink-2.0.0-bin-scala_2.12.tgz |
33 | | -cd flink-2.0.0 |
34 | | - |
35 | | -# Start Flink cluster |
36 | | -./bin/start-cluster.sh |
37 | | - |
38 | | -# Verify Flink is running |
39 | | -./bin/flink list |
40 | | -# Or check web UI: http://localhost:8081 |
41 | | -``` |
42 | | - |
43 | | -### 3. Build the Connector and Application |
44 | | - |
45 | | -```bash |
46 | | -# Build connector (run from connector root directory) |
47 | | -./gradlew publishToMavenLocal |
48 | | - |
49 | | -# Verify connector was published |
50 | | -ls ~/.m2/repository/org/apache/flink/connector/clickhouse/ |
51 | | - |
52 | | -# Build the application (run from maven example folder) |
53 | | -cd examples/maven # adjust path as needed |
54 | | -mvn clean package -DskipTests |
55 | | - |
56 | | -# Verify JAR was created |
57 | | -ls target/covid-1.0-SNAPSHOT.jar |
58 | | -``` |
59 | | - |
60 | | -## 🗄️ ClickHouse Setup |
61 | | - |
62 | | -### Option A: Docker (Recommended for testing) |
63 | | - |
64 | | -```bash |
65 | | -# Start ClickHouse with Docker |
66 | | -docker run -d --name clickhouse-server \ |
67 | | - -p 8123:8123 -p 9000:9000 \ |
68 | | - --ulimit nofile=262144:262144 \ |
69 | | - clickhouse/clickhouse-server |
70 | | - |
71 | | -# Wait for startup |
72 | | -sleep 10 |
73 | | - |
74 | | -# Test connection |
75 | | -curl http://localhost:8123/ping |
76 | | -``` |
77 | | - |
78 | | -### Option B: ClickHouse Cloud |
79 | | - |
80 | | -1. Go to [ClickHouse Cloud](https://clickhouse.com/) |
81 | | -2. Create a new service |
82 | | -3. Note down your connection details |
83 | | - |
84 | | -### Create Database Table |
85 | | - |
86 | | -```sql |
87 | | --- Connect to ClickHouse first, using either: |
88 | | ---   clickhouse-client (for a local install) |
89 | | ---   or the web interface: http://localhost:8123/play |
90 | | - |
91 | | --- Create table |
92 | | -CREATE TABLE IF NOT EXISTS `default`.`covid` ( |
93 | | - date Date, |
94 | | - location_key LowCardinality(String), |
95 | | - new_confirmed Int32, |
96 | | - new_deceased Int32, |
97 | | - new_recovered Int32, |
98 | | - new_tested Int32, |
99 | | - cumulative_confirmed Int32, |
100 | | - cumulative_deceased Int32, |
101 | | - cumulative_recovered Int32, |
102 | | - cumulative_tested Int32 |
103 | | -) ENGINE = MergeTree |
104 | | -ORDER BY (location_key, date); |
105 | | -``` |
106 | | - |
107 | | -## 📊 Download Sample Data |
108 | | - |
109 | | -```bash |
110 | | -# Download COVID-19 epidemiological data |
111 | | -wget https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv |
112 | | - |
113 | | -# Alternative with curl |
114 | | -curl -L -# -o epidemiology.csv https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv |
115 | | - |
116 | | -# Check file size and first few lines |
117 | | -ls -lh epidemiology.csv |
118 | | -head -5 epidemiology.csv |
119 | | -``` |
120 | | - |
121 | | -## ▶️ Run the Application |
122 | | - |
123 | | -### Local ClickHouse |
124 | | - |
125 | | -```bash |
126 | | -# Navigate to Flink directory |
127 | | -cd flink-2.0.0 |
128 | | - |
129 | | -# Run the application |
130 | | -./bin/flink run \ |
131 | | - /path/to/your/covid-1.0-SNAPSHOT.jar \ |
132 | | - -input "/path/to/epidemiology.csv" \ |
133 | | - -url "http://localhost:8123/default" \ |
134 | | - -username "default" \ |
135 | | - -password "" \ |
136 | | - -database "default" \ |
137 | | - -table "covid" |
138 | | -``` |
139 | | - |
140 | | -### ClickHouse Cloud |
141 | | - |
142 | | -```bash |
143 | | -./bin/flink run \ |
144 | | - /path/to/your/covid-1.0-SNAPSHOT.jar \ |
145 | | - -input "/path/to/epidemiology.csv" \ |
146 | | - -url "jdbc:clickhouse://your-cluster.clickhouse.cloud:8443/default?ssl=true" \ |
147 | | - -username "your-username" \ |
148 | | - -password "your-password" \ |
149 | | - -database "default" \ |
150 | | - -table "covid" |
151 | | -``` |
152 | | - |
153 | | -## ✅ Verify Results |
154 | | - |
155 | | -```sql |
156 | | --- Check data was inserted |
157 | | -SELECT COUNT(*) FROM covid; |
158 | | - |
159 | | --- View sample data |
160 | | -SELECT * FROM covid LIMIT 10; |
161 | | - |
162 | | --- Count records per location key |
163 | | -SELECT location_key, COUNT(*) as records |
164 | | -FROM covid |
165 | | -GROUP BY location_key |
166 | | -ORDER BY records DESC |
167 | | -LIMIT 10; |
168 | | - |
169 | | --- Analyze data trends |
170 | | -SELECT |
171 | | - date, |
172 | | - SUM(new_confirmed) as global_new_cases, |
173 | | - SUM(cumulative_confirmed) as global_total_cases |
174 | | -FROM covid |
175 | | -WHERE date >= '2020-01-01' |
176 | | -GROUP BY date |
177 | | -ORDER BY date |
178 | | -LIMIT 20; |
179 | | -``` |
180 | | - |
181 | | -## 🔧 Configuration Options |
182 | | - |
183 | | -### Application Parameters |
184 | | - |
185 | | -| Parameter | Description | Required | Default | |
186 | | -|-----------|-------------|----------|---------| |
187 | | -| `-input` | Path to input CSV file | Yes | - | |
188 | | -| `-url` | ClickHouse URL | Yes | - | |
189 | | -| `-username` | ClickHouse username | Yes | - | |
190 | | -| `-password` | ClickHouse password | No | "" | |
191 | | -| `-database` | Target database name | Yes | - | |
192 | | -| `-table` | Target table name | Yes | - | |
193 | | - |
| 1 | +# Apache Flink Connector ClickHouse Example App |
194 | 2 |