Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,13 @@ build/
gradle-app.setting
!gradle-wrapper.jar

##############################
## Sbt
##############################
project/project/
project/target/
target/

##############################
## IntelliJ
##############################
Expand Down
6 changes: 6 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,3 +34,9 @@ Does not require a running ClickHouse server.
./gradlew test
```

if you want to run scala unit tests

```bash
cd flink-connector-clickhouse
./gradlew clean runScalaTests
```
94 changes: 94 additions & 0 deletions examples/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
<div align="center">
<p><img src="https://github.com/ClickHouse/clickhouse-js/blob/a332672bfb70d54dfd27ae1f8f5169a6ffeea780/.static/logo.svg" width="200px" align="center"></p>
<h1>ClickHouse Flink Connector</h1>
</div>

Table of Contents
* [Covid Flink Example](#covid-flink-application-example)
* [Build Application](#build-covid-application)
* [Build Connector](#build-clickhouse-flink-connector)
* [Java Application](#java-covid-app)
* [Scala Application](#scala-covid-app)
* [Running Example](#running-the-example)
* [Download Data](#download-covid-data)
* [Create table](#create-a-destination-covid-table)
* [Submit Flink](#submit-flink-job)

# Covid Flink Application example

Read covid data from a file and insert into ClickHouse

### Build Covid Application

#### Build ClickHouse Flink Connector
If you wish to build the connector locally run before building the example
```bash
./gradlew publishToMavenLocal
```

#### Java Covid App

From the project directory, run the following command, which will create a `covid-1.0-SNAPSHOT.jar` artifact that can be found in your target folder

```bash
mvn clean package -DskipTests
```

#### Scala Covid App

From project directory run this will create a `covid.jar` can be found in `target/scala-2.12` folder

Build Covid Scala App

```bash
sbt clean assembly
```

## Running the example

- Prepare ClickHouse OSS or [ClickHouse Cloud](https://clickhouse.com/)
- Flink Cluster or Standalone running
- Download covid data

### Download covid data

Download covid data set and save it in a location that is accessible to Flink

```bash
curl -L -# -o epidemiology.csv https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv
```

### Create a destination covid table

```sql

CREATE TABLE IF NOT EXISTS `default`.`covid` (
date Date,
location_key LowCardinality(String),
new_confirmed Int32,
new_deceased Int32,
new_recovered Int32,
new_tested Int32,
cumulative_confirmed Int32,
cumulative_deceased Int32,
cumulative_recovered Int32,
cumulative_tested Int32
) ENGINE = MergeTree
ORDER BY (location_key, date);
```

### Submit Flink Job

With the Java `covid-1.0-SNAPSHOT.jar` or Scala `covid.jar` built, you can now submit the job to your Flink cluster (or standalone instance)

```bash
# Run the application
./bin/flink run \
/path/to/your/generated/jar \
-input "/path/to/epidemiology.csv" \
-url "/url/clickhouse" \
-username "default" \
-password "" \
-database "default" \
-table "covid"
```
225 changes: 225 additions & 0 deletions examples/maven/covid/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,225 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>com.clickhouse.example.covid</groupId>
<artifactId>covid</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>

<name>Flink Quickstart Job</name>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>2.0.0</flink.version>
<target.java.version>11</target.java.version>
<scala.binary.version>2.12</scala.binary.version>
<maven.compiler.source>${target.java.version}</maven.compiler.source>
<maven.compiler.target>${target.java.version}</maven.compiler.target>
<log4j.version>2.24.1</log4j.version>
</properties>

<repositories>

<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>

<dependencies>
<!-- Apache Flink dependencies -->
<!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-files -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-files</artifactId>
<version>2.0.0</version>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>com.clickhouse.flink</groupId>
<artifactId>flink-connector-clickhouse-2.0.0</artifactId>
<version>0.0.1</version>
<classifier>all</classifier>
</dependency>


<!-- Add connector dependencies here. They must be in the default scope (compile). -->

<!-- Example:

<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka</artifactId>
<version>3.0.0-1.17</version>
</dependency>
-->

<!-- Add logging framework, to produce console output when running in the IDE. -->
<!-- These dependencies are excluded from the application JAR by default. -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>${log4j.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j.version}</version>
<scope>runtime</scope>
</dependency>
</dependencies>

<build>
<plugins>

<!-- Java Compiler -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${target.java.version}</source>
<target>${target.java.version}</target>
</configuration>
</plugin>

<!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
<!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.1.1</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<createDependencyReducedPom>false</createDependencyReducedPom>
<artifactSet>
<excludes>
<exclude>org.apache.flink:flink-shaded-force-shading</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>org.apache.logging.log4j:*</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<!-- Do not copy the signatures in the META-INF folder.
Otherwise, this might cause SecurityExceptions when using the JAR. -->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.clickhouse.example.covid.DataStreamJob</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>

<pluginManagement>
<plugins>

<!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<versionRange>[3.1.1,)</versionRange>
<goals>
<goal>shade</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<versionRange>[3.1,)</versionRange>
<goals>
<goal>testCompile</goal>
<goal>compile</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
Loading
Loading