Commit 62ce73c

update the python code + docs

1 parent c554f01

1 file changed: +28 −17 lines

docs/integrations/data-ingestion/aws-glue/index.md
@@ -32,7 +32,7 @@ To access the connector in your account, subscribe to the ClickHouse AWS Glue Co
 Ensure your Glue job's IAM role has the necessary permissions, as described in the minimum privileges [guide](https://docs.aws.amazon.com/glue/latest/dg/getting-started-min-privs-job.html#getting-started-min-privs-connectors).

 3. <h3 id="activate-the-connector">Activate the Connector & Create a Connection</h3>
-You can activate the connector and create a connection directly by clicking [this link](https://console.aws.amazon.com/gluestudio/home#/connector/add-connection?connectorName="ClickHouse%20AWS%20Glue%20Connector"&connectorType="Spark"&connectorUrl=https://709825985650.dkr.ecr.us-east-1.amazonaws.com/clickhouse/clickhouse-glue:0.1&connectorClassName="com.clickhouse.spark.ClickHouseCatalog"), which opens the Glue connection creation page with key fields pre-filled. Give the connection a name, and press create.
+You can activate the connector and create a connection directly by clicking [this link](https://console.aws.amazon.com/gluestudio/home#/connector/add-connection?connectorName="ClickHouse%20AWS%20Glue%20Connector"&connectorType="Spark"&connectorUrl=https://709825985650.dkr.ecr.us-east-1.amazonaws.com/clickhouse/clickhouse-glue:0.1&connectorClassName="com.clickhouse.spark.ClickHouseCatalog"), which opens the Glue connection creation page with key fields pre-filled. Give the connection a name, and press create (no need to provide the ClickHouse connection details at this stage).

 4. <h3 id="use-in-glue-job">Use in Glue Job</h3>
 In your Glue job, select the `Job details` tab, and expand the `Advanced properties` window. Under the `Connections` section, select the connection you just created. The connector automatically injects the required JARs into the job runtime.
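The console flow above is the documented route; for completeness, here is a hedged boto3 sketch of attaching the same connection when creating a job programmatically. The job name, role ARN, script location, and connection name are placeholders, not values from this page:

```python
import boto3

# Minimal sketch: create a Glue job with the ClickHouse connection attached.
# The job name, role ARN, S3 path, and connection name are placeholder assumptions.
glue = boto3.client("glue", region_name="us-east-1")

glue.create_job(
    Name="clickhouse-example-job",
    Role="arn:aws:iam::123456789012:role/MyGlueJobRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/clickhouse_job.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    # Attaching the connection is what injects the connector JARs at runtime.
    Connections={"Connections": ["my-clickhouse-connection"]},
)
```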
@@ -58,7 +58,7 @@ To add the required jars manually, please follow the following:
 
 ## Example {#example}
 <Tabs>
-<TabItem value="Java" label="Java" default>
+<TabItem value="Scala" label="Scala" default>
 
 ```java
 import com.amazonaws.services.glue.GlueContext
@@ -137,6 +137,8 @@ from awsglue.utils import getResolvedOptions
 from pyspark.context import SparkContext
 from awsglue.context import GlueContext
 from awsglue.job import Job
+from pyspark.sql import Row
+
 
 ## @params: [JOB_NAME]
 args = getResolvedOptions(sys.argv, ['JOB_NAME'])
@@ -147,20 +149,29 @@ logger = glueContext.get_logger()
 spark = glueContext.spark_session
 job = Job(glueContext)
 job.init(args['JOB_NAME'], args)
-jdbc_url = "jdbc:ch://{host}:{port}/{schema}"
-query = "select * from my_table"
-# For cloud usage, please add ssl options
-df = (spark.read.format("jdbc")
-      .option("driver", 'com.clickhouse.jdbc.ClickHouseDriver')
-      .option("url", jdbc_url)
-      .option("user", 'default')
-      .option("password", '*******')
-      .option("query", query)
-      .load())
-
-logger.info("num of rows:")
-logger.info(str(df.count()))
-logger.info("Data sample:")
+
+spark.conf.set("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog")
+spark.conf.set("spark.sql.catalog.clickhouse.host", "<your-clickhouse-host>")
+spark.conf.set("spark.sql.catalog.clickhouse.protocol", "https")
+spark.conf.set("spark.sql.catalog.clickhouse.http_port", "<your-clickhouse-port>")
+spark.conf.set("spark.sql.catalog.clickhouse.user", "default")
+spark.conf.set("spark.sql.catalog.clickhouse.password", "<your-password>")
+spark.conf.set("spark.sql.catalog.clickhouse.database", "default")
+spark.conf.set("spark.clickhouse.write.format", "json")
+spark.conf.set("spark.clickhouse.read.format", "arrow")
+# for ClickHouse Cloud
+spark.conf.set("spark.sql.catalog.clickhouse.option.ssl", "true")
+spark.conf.set("spark.sql.catalog.clickhouse.option.ssl_mode", "NONE")
+
+# Create DataFrame
+data = [Row(id=11, name="John"), Row(id=12, name="Doe")]
+df = spark.createDataFrame(data)
+
+# Write DataFrame to ClickHouse
+df.writeTo("clickhouse.default.example_table").append()
+
+# Read DataFrame from ClickHouse
+df_read = spark.sql("select * from clickhouse.default.example_table")
 logger.info(str(df.take(10)))

@@ -170,6 +181,6 @@ job.commit()
 </TabItem>
 </Tabs>

-For more details, please visit our [Spark & JDBC documentation](/integrations/apache-spark/spark-jdbc#read-data).
+For more details, please visit our [Spark documentation](/integrations/apache-spark).
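To verify the round trip end to end, the rows read back through the catalog can be logged as well. A minimal sketch reusing the names from the Python example above (`logger` and `df_read` are assumed to be in scope as defined there):

```python
# Sketch: surface the rows read back from ClickHouse via the catalog.
# Assumes `logger` and `df_read` from the Glue job example are in scope.
logger.info("Rows read back from ClickHouse:")
logger.info(str(df_read.take(10)))
```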