
Commit 904ed8c

Improve clarity + typos (#3346)
1 parent 10c1210 commit 904ed8c

File tree

3 files changed, +12 -15 lines changed


docs/integrations/data-ingestion/apache-spark/index.md

Lines changed: 2 additions & 2 deletions
@@ -14,12 +14,12 @@ import TOCInline from '@theme/TOCInline';
 
 <br/>
 
-[Apache Spark](https://spark.apache.org/) Apache Spark™ is a multi-language engine for executing data engineering, data
+[Apache Spark](https://spark.apache.org/) is a multi-language engine for executing data engineering, data
 science, and machine learning on single-node machines or clusters.
 
 There are two main ways to connect Apache Spark and ClickHouse:
 
-1. [Spark Connector](./apache-spark/spark-native-connector) - the Spark connector implements the `DataSourceV2` and has its own Catalog
+1. [Spark Connector](./apache-spark/spark-native-connector) - The Spark connector implements the `DataSourceV2` and has its own Catalog
 management. As of today, this is the recommended way to integrate ClickHouse and Spark.
 2. [Spark JDBC](./apache-spark/spark-jdbc) - Integrate Spark and ClickHouse
 using a [JDBC data source](https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html).

docs/integrations/data-ingestion/apache-spark/spark-jdbc.md

Lines changed: 3 additions & 5 deletions
@@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem';
 import TOCInline from '@theme/TOCInline';
 
 # Spark JDBC
-One of the most used data sources supported by Spark is JDBC.
+JDBC is one of the most commonly used data sources in Spark.
 In this section, we will provide details on how to
 use the [ClickHouse official JDBC connector](/integrations/java/jdbc-driver) with Spark.
 
@@ -209,7 +209,6 @@ df.show()
 .option("dbtable", "example_table")
 .option("user", "default")
 .option("password", "123456")
-.option("SaveMode", "append")
 .save();
 
 
@@ -248,15 +247,15 @@ object WriteData extends App {
 )
 
 //---------------------------------------------------------------------------------------------------//---------------------------------------------------------------------------------------------------
-// Write the df to ClickHouse using the jdbc method// Write the df to ClickHouse using the jdbc method
+// Write the df to ClickHouse using the jdbc method
 //---------------------------------------------------------------------------------------------------//---------------------------------------------------------------------------------------------------
 
 df.write
 .mode(SaveMode.Append)
 .jdbc(jdbcUrl, "example_table", jdbcProperties)
 
 //---------------------------------------------------------------------------------------------------//---------------------------------------------------------------------------------------------------
-// Write the df to ClickHouse using the save method// Write the df to ClickHouse using the save method
+// Write the df to ClickHouse using the save method
 //---------------------------------------------------------------------------------------------------//---------------------------------------------------------------------------------------------------
 
 df.write
@@ -266,7 +265,6 @@ object WriteData extends App {
 .option("dbtable", "example_table")
 .option("user", "default")
 .option("password", "123456")
-.option("SaveMode", "append")
 .save()
 
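The removal of `.option("SaveMode", "append")` in the Java and Scala snippets follows from how Spark's `DataFrameWriter` works: the save mode is declared with `.mode(...)`, so a `"SaveMode"` writer option has no effect and only adds noise. Below is a minimal, self-contained Scala sketch of the corrected JDBC write path. The JDBC URL, driver class, and sample DataFrame are illustrative assumptions; the table name and credentials mirror the snippet in the diff.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object JdbcWriteSketch extends App {
  // Assumed connection details -- adjust host, port, and database for your deployment.
  val jdbcUrl = "jdbc:ch://localhost:8123/default"

  val spark = SparkSession.builder()
    .appName("clickhouse-jdbc-write-sketch")
    .master("local[*]")
    .getOrCreate()

  import spark.implicits._
  val df = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")

  // The save mode is set once via .mode(...); no "SaveMode" option is required.
  df.write
    .mode(SaveMode.Append)
    .format("jdbc")
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver") // assumed driver class from clickhouse-jdbc
    .option("url", jdbcUrl)
    .option("dbtable", "example_table")
    .option("user", "default")
    .option("password", "123456")
    .save()

  spark.stop()
}
```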

docs/integrations/data-ingestion/apache-spark/spark-native-connector.md

Lines changed: 7 additions & 8 deletions
@@ -23,7 +23,7 @@ With these external solutions, users had to register their data source tables ma
 However, since Spark 3.0 introduced the catalog concept, Spark can now automatically discover tables by registering
 catalog plugins.
 
-Spark default catalog is `spark_catalog`, and tables are identified by `{catalog name}.{database}.{table}`. With the new
+Spark's default catalog is `spark_catalog`, and tables are identified by `{catalog name}.{database}.{table}`. With the new
 catalog feature, it is now possible to add and work with multiple catalogs in a single Spark application.
 
 <TOCInline toc={toc}></TOCInline>
@@ -124,7 +124,7 @@ libraryDependencies += "com.clickhouse.spark" %% clickhouse-spark-runtime-{{ spa
 </TabItem>
 <TabItem value="Spark SQL/Shell CLI" label="Spark SQL/Shell CLI">
 
-When working with Spark's shell options (Spark SQL CLI, Spark Shell CLI, Spark Submit command), the dependencies can be
+When working with Spark's shell options (Spark SQL CLI, Spark Shell CLI, and Spark Submit command), the dependencies can be
 registered by passing the required jars:
 
 ```text
@@ -135,7 +135,7 @@ $SPARK_HOME/bin/spark-sql \
 If you want to avoid copying the JAR files to your Spark client node, you can use the following instead:
 
 ```text
---repositories https://{maven-cental-mirror or private-nexus-repo} \
+--repositories https://{maven-central-mirror or private-nexus-repo} \
 --packages com.clickhouse.spark:clickhouse-spark-runtime-{{ spark_binary_version }}_{{ scala_binary_version }}:{{ stable_version }},com.clickhouse:clickhouse-jdbc:{{ clickhouse_jdbc_version }}:all
 ```
 
@@ -161,7 +161,7 @@ and all daily build SNAPSHOT JAR files in the [Sonatype OSS Snapshots Repository
 It's essential to include the [clickhouse-jdbc JAR](https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc)
 with the "all" classifier,
 as the connector relies on [clickhouse-http](https://mvnrepository.com/artifact/com.clickhouse/clickhouse-http-client)
-and [clickhouse-client](https://mvnrepository.com/artifact/com.clickhouse/clickhouse-client) —both of which are bundled
+and [clickhouse-client](https://mvnrepository.com/artifact/com.clickhouse/clickhouse-client) both of which are bundled
 in clickhouse-jdbc:all.
 Alternatively, you can add [clickhouse-client JAR](https://mvnrepository.com/artifact/com.clickhouse/clickhouse-client)
 and [clickhouse-http](https://mvnrepository.com/artifact/com.clickhouse/clickhouse-http-client) individually if you
@@ -193,7 +193,7 @@ These settings could be set via one of the following:
 * Add the configuration when initiating your context.
 
 :::important
-When working with ClickHouse cluster, you need to set a unique catalog name for each instance.
+When working with a ClickHouse cluster, you need to set a unique catalog name for each instance.
 For example:
 
 ```text
@@ -498,13 +498,13 @@ The following are the adjustable configurations available in the connector:
 
 | Key | Default | Description | Since |
 |----------------------------------------------------|--------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
-| spark.clickhouse.ignoreUnsupportedTransform | false | ClickHouse supports using complex expressions as sharding keys or partition values, e.g. `cityHash64(col_1, col_2)`, and those can not be supported by Spark now. If `true`, ignore the unsupported expressions, otherwise fail fast w/ an exception. Note, when `spark.clickhouse.write.distributed.convertLocal` is enabled, ignore unsupported sharding keys may corrupt the data. | 0.4.0 |
+| spark.clickhouse.ignoreUnsupportedTransform | false | ClickHouse supports using complex expressions as sharding keys or partition values, e.g. `cityHash64(col_1, col_2)`, which are currently not supported by Spark. If `true`, ignore the unsupported expressions, otherwise fail fast w/ an exception. Note, when `spark.clickhouse.write.distributed.convertLocal` is enabled, ignore unsupported sharding keys may corrupt the data. | 0.4.0 |
 | spark.clickhouse.read.compression.codec | lz4 | The codec used to decompress data for reading. Supported codecs: none, lz4. | 0.5.0 |
 | spark.clickhouse.read.distributed.convertLocal | true | When reading Distributed table, read local table instead of itself. If `true`, ignore `spark.clickhouse.read.distributed.useClusterNodes`. | 0.1.0 |
 | spark.clickhouse.read.fixedStringAs | binary | Read ClickHouse FixedString type as the specified Spark data type. Supported types: binary, string | 0.8.0 |
 | spark.clickhouse.read.format | json | Serialize format for reading. Supported formats: json, binary | 0.6.0 |
 | spark.clickhouse.read.runtimeFilter.enabled | false | Enable runtime filter for reading. | 0.8.0 |
-| spark.clickhouse.read.splitByPartitionId | true | If `true`, construct input partition filter by virtual column `_partition_id`, instead of partition value. There are known bugs to assemble SQL predication by partition value. This feature requires ClickHouse Server v21.6+ | 0.4.0 |
+| spark.clickhouse.read.splitByPartitionId | true | If `true`, construct input partition filter by virtual column `_partition_id`, instead of partition value. There are known issues with assembling SQL predicates by partition value. This feature requires ClickHouse Server v21.6+ | 0.4.0 |
 | spark.clickhouse.useNullableQuerySchema | false | If `true`, mark all the fields of the query schema as nullable when executing `CREATE/REPLACE TABLE ... AS SELECT ...` on creating the table. Note, this configuration requires SPARK-43390(available in Spark 3.5), w/o this patch, it always acts as `true`. | 0.8.0 |
 | spark.clickhouse.write.batchSize | 10000 | The number of records per batch on writing to ClickHouse. | 0.1.0 |
 | spark.clickhouse.write.compression.codec | lz4 | The codec used to compress data for writing. Supported codecs: none, lz4. | 0.3.0 |
@@ -520,7 +520,6 @@ The following are the adjustable configurations available in the connector:
 | spark.clickhouse.write.retryInterval | 10s | The interval in seconds between write retry. | 0.1.0 |
 | spark.clickhouse.write.retryableErrorCodes | 241 | The retryable error codes returned by ClickHouse server when write failing. | 0.1.0 |
 
-
 ## Supported Data Types {#supported-data-types}
 
 This section outlines the mapping of data types between Spark and ClickHouse. The tables below provide quick references
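To put the catalog note and the configuration table above in context, here is a minimal Scala sketch of registering two uniquely named ClickHouse catalogs (one per cluster instance) and setting one of the listed connector options at session level. The catalog class `com.clickhouse.spark.ClickHouseCatalog` and the per-catalog option keys (`host`, `protocol`, `http_port`, `user`, `password`, `database`) are assumptions about the connector's configuration surface and should be checked against the connector version in use.

```scala
import org.apache.spark.sql.SparkSession

object CatalogSketch extends App {
  // Each ClickHouse instance gets its own uniquely named catalog, as the note requires.
  // Class name and option keys are assumptions -- verify them for your connector version.
  val spark = SparkSession.builder()
    .appName("clickhouse-catalog-sketch")
    .master("local[*]")
    .config("spark.sql.catalog.clickhouse1", "com.clickhouse.spark.ClickHouseCatalog")
    .config("spark.sql.catalog.clickhouse1.host", "clickhouse-node-1")
    .config("spark.sql.catalog.clickhouse1.protocol", "http")
    .config("spark.sql.catalog.clickhouse1.http_port", "8123")
    .config("spark.sql.catalog.clickhouse1.user", "default")
    .config("spark.sql.catalog.clickhouse1.password", "")
    .config("spark.sql.catalog.clickhouse1.database", "default")
    .config("spark.sql.catalog.clickhouse2", "com.clickhouse.spark.ClickHouseCatalog")
    .config("spark.sql.catalog.clickhouse2.host", "clickhouse-node-2")
    .config("spark.sql.catalog.clickhouse2.protocol", "http")
    .config("spark.sql.catalog.clickhouse2.http_port", "8123")
    // One of the adjustable configurations from the table above, set at session level.
    .config("spark.clickhouse.write.batchSize", "10000")
    .getOrCreate()

  // Tables are addressed as {catalog name}.{database}.{table}.
  spark.sql("SELECT * FROM clickhouse1.default.example_table LIMIT 10").show()

  spark.stop()
}
```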
