title: 'Integrating Amazon Glue with ClickHouse and Spark'
---
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import notebook_connections_config from '@site/static/images/integrations/data-ingestion/aws-glue/notebook-connections-config.png';
import dependent_jars_path_option from '@site/static/images/integrations/data-ingestion/aws-glue/dependent_jars_path_option.png';

# Integrating Amazon Glue with ClickHouse and Spark

[Amazon Glue](https://aws.amazon.com/glue/) is a fully managed, serverless data integration service provided by Amazon Web Services (AWS). It simplifies the process of discovering, preparing, and transforming data for analytics, machine learning, and application development.

## Installation {#installation}

To integrate your Glue code with ClickHouse, you can use our official Spark connector in Glue via one of the following:
- Installing the ClickHouse Glue connector from the AWS Marketplace (recommended).
- Manually adding the Spark Connector's jars to your Glue job.
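If you take the manual route, Glue's `--extra-jars` job parameter (the **Dependent JARs path** field in the console) accepts complete S3 paths separated by commas. The Python sketch below only assembles that argument. `manual_jar_job_args` is a hypothetical helper, not part of any AWS SDK. The bucket and JAR file names are placeholders. It assumes you upload the connector runtime JAR together with the ClickHouse JDBC JAR it depends on, so check the Spark connector docs for the exact artifacts your Spark version needs.

```python
from typing import Dict, List


def manual_jar_job_args(jar_s3_paths: List[str]) -> Dict[str, str]:
    """Build the Glue job argument that adds manually uploaded JARs to the classpath.

    Hypothetical helper: Glue's `--extra-jars` parameter expects complete
    S3 paths separated by commas.
    """
    if not jar_s3_paths:
        raise ValueError("at least one JAR path is required")
    for path in jar_s3_paths:
        # Reject anything that is not a full S3 URI pointing at a .jar file.
        if not (path.startswith("s3://") and path.endswith(".jar")):
            raise ValueError(f"expected an s3://.../*.jar path, got {path!r}")
    return {"--extra-jars": ",".join(jar_s3_paths)}


# Placeholder bucket and file names: substitute the JARs you uploaded.
args = manual_jar_job_args([
    "s3://my-bucket/jars/clickhouse-spark-runtime.jar",
    "s3://my-bucket/jars/clickhouse-jdbc-all.jar",
])
print(args)
```

You could merge the resulting dictionary into the `DefaultArguments` of a job, for example one created with boto3's `create_job`. You can also paste the comma-separated value into the console field.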

Ensure your Glue job’s IAM role has the necessary permissions, as described in the AWS Glue [minimum privileges guide](https://docs.aws.amazon.com/glue/latest/dg/getting-started-min-privs-job.html#getting-started-min-privs-connectors).

3. <h3 id="activate-the-connector">Activate the Connector & Create a Connection</h3>
You can activate the connector and create a connection directly by clicking [this link](https://console.aws.amazon.com/gluestudio/home#/connector/add-connection?connectorName="ClickHouse%20AWS%20Glue%20Connector"&connectorType="Spark"&connectorUrl=https://709825985650.dkr.ecr.us-east-1.amazonaws.com/clickhouse/clickhouse-glue:0.1&connectorClassName="com.clickhouse.spark.ClickHouseCatalog"). The link opens the Glue connection creation page with the key fields pre-filled. Give the connection a name and click `Create`. You don't need to provide the ClickHouse connection details at this stage.
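The activation link stores the connector settings as query parameters inside its URL fragment. This Python sketch decodes the link so you can see which fields Glue Studio pre-fills. It only parses the string and makes no AWS calls. `prefilled_fields` is a hypothetical helper written for illustration.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit

# The activation link from the step above, copied verbatim.
ACTIVATION_LINK = (
    "https://console.aws.amazon.com/gluestudio/home#/connector/add-connection"
    '?connectorName="ClickHouse%20AWS%20Glue%20Connector"&connectorType="Spark"'
    "&connectorUrl=https://709825985650.dkr.ecr.us-east-1.amazonaws.com/clickhouse/clickhouse-glue:0.1"
    '&connectorClassName="com.clickhouse.spark.ClickHouseCatalog"'
)


def prefilled_fields(link: str) -> Dict[str, str]:
    """Return the connection fields encoded in a Glue Studio add-connection link."""
    # Glue Studio routes on the URL fragment, so the query string follows the '#'.
    fragment = urlsplit(link).fragment
    _, _, query = fragment.partition("?")
    # parse_qs percent-decodes each value; strip the literal double quotes around it.
    return {key: values[0].strip('"') for key, values in parse_qs(query).items()}


for name, value in prefilled_fields(ACTIVATION_LINK).items():
    print(f"{name}: {value}")
```

The output lists the connector name, the `Spark` connector type, the ECR image URL of the connector, and the `com.clickhouse.spark.ClickHouseCatalog` class name.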

4. <h3 id="use-in-glue-job">Use in Glue Job</h3>
In your Glue job, open the `Job details` tab and expand the `Advanced properties` section. Under `Connections`, select the connection you just created. The connector automatically injects the required JARs into the job runtime.
The JARs used in the Glue connector are built for `Spark 3.2`, `Scala 2`, and `Python 3`. Make sure to select these versions when configuring your Glue job.
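After the connection is attached, the job still needs to register ClickHouse as a Spark catalog. The sketch below builds those settings as a plain dictionary. It assumes the catalog key names from the ClickHouse Spark connector documentation (`spark.sql.catalog.<name>.host`, `.protocol`, `.http_port`, `.option.ssl`, and so on). It also assumes a ClickHouse Cloud service reached over HTTP with TLS on port 8443. The catalog name `clickhouse`, the host, and the credentials are placeholders, and `clickhouse_catalog_conf` is a hypothetical helper. Verify the keys against the connector version bundled with your Glue connection.

```python
from typing import Dict

# Catalog name used in Spark SQL identifiers, e.g. clickhouse.default.my_table.
# "clickhouse" is an arbitrary choice for this sketch.
CATALOG = "clickhouse"


def clickhouse_catalog_conf(host: str, user: str, password: str,
                            database: str = "default", http_port: int = 8443,
                            ssl: bool = True) -> Dict[str, str]:
    """Build Spark configuration entries that register ClickHouse as a catalog.

    Hypothetical helper; key names are assumed from the Spark ClickHouse
    connector's catalog settings.
    """
    prefix = f"spark.sql.catalog.{CATALOG}"
    conf = {
        prefix: "com.clickhouse.spark.ClickHouseCatalog",
        f"{prefix}.host": host,
        f"{prefix}.protocol": "http",
        f"{prefix}.http_port": str(http_port),
        f"{prefix}.user": user,
        f"{prefix}.password": password,
        f"{prefix}.database": database,
    }
    if ssl:
        # ClickHouse Cloud endpoints require TLS, enabled here via a connector option.
        conf[f"{prefix}.option.ssl"] = "true"
    return conf


# Placeholder host and credentials; in a real job, read them from job
# parameters or AWS Secrets Manager instead of hard-coding them.
conf = clickhouse_catalog_conf("my-service.clickhouse.cloud", "default", "<password>")
for key, value in conf.items():
    print(f"{key} = {value}")
```

Inside the job, you could apply each entry with `spark.conf.set(key, value)` on the job's `SparkSession` (for example `glueContext.spark_session`) before the first query that references the catalog. A query would then look like `spark.sql("SELECT * FROM clickhouse.default.my_table")`, where `my_table` is a hypothetical table.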

|Amazon MSK|<Amazonmsksvg style={{width: '3rem'}} alt="Amazon MSK logo"/>|Data ingestion|Integration with Amazon Managed Streaming for Apache Kafka (MSK).|[Documentation](/integrations/kafka/cloud/amazon-msk/)|
|Amazon S3|<S3svg style={{width: '3rem', height: '3rem'}} alt="Amazon S3 logo"/>|Data ingestion|Import from, export to, and transform S3 data in flight with ClickHouse built-in S3 functions.|[Documentation](/integrations/data-ingestion/s3/index.md)|
|Amazon Glue|<Image img={glue_logo} size="logo" alt="Amazon Glue logo"/>|Data ingestion|Query ClickHouse over Spark using our official Glue connector|[Documentation](/integrations/glue)|
|Apache Spark|<Sparksvg alt="Apache Spark logo" style={{width: '3rem'}}/>|Data ingestion|Spark ClickHouse Connector is a high-performance connector built on top of Spark DataSource V2.|[GitHub](https://github.com/housepower/spark-clickhouse-connector),<br/>[Documentation](/integrations/data-ingestion/apache-spark/index.md)|
|Azure Event Hubs|<Azureeventhubssvg alt="Azure Event Hubs logo" style={{width: '3rem'}}/>|Data ingestion|A data streaming platform that supports Apache Kafka's native protocol.|[Website](https://azure.microsoft.com/en-gb/products/event-hubs)|
|Azure Synapse|<Image img={azure_synapse_logo} size="logo" alt="Azure Synapse logo"/>|Data ingestion|A cloud-based analytics service for big data and data warehousing.|[Documentation](/integrations/azure-synapse)|