Commit 3ff0f96

docs: Add Spark 4.0, StructType, and VariantType support to connector… (#4798)
* docs: Add Spark 4.0, StructType, and VariantType support to connector documentation
  - Add Spark 4.0 support with Java 17+ requirement and Scala 2.13 only
  - Update compatibility matrix to include version 0.9.0 with Spark 4.0
  - Add StructType (Tuple) bidirectional mapping support
    - Support for named and unnamed tuples
    - Nested structs and nullable fields
  - Add VariantType (JSON) support for Spark 4.0+
    - New spark.clickhouse.read.jsonAs configuration option
    - Default to VariantType, optional StringType mode
  - Update Boolean type mapping from UInt8 to Bool (version 0.9.0)
  - Update ClickHouse JDBC version to 0.9.4 for main and 0.9.0

* docs: Add note that VariantType does not support Arrow write format

  VariantType (JSON) requires JSON write format. Arrow format is not supported for this data type.

* docs: Update VariantType to show partial support

  VariantType currently only supports objects, not primitives. Changed status to partial support (⚠️) to reflect this limitation.

* docs: Remove VariantType support from Spark connector documentation
  - Remove spark.clickhouse.read.jsonAs configuration option
  - Update JSON type to map to StringType only (consolidated with other string types)
  - Mark VariantType as not supported in write operations
  - VariantType support has been removed from the connector
1 parent 35398fe commit 3ff0f96

1 file changed, +10 -8 lines changed
docs/integrations/data-ingestion/apache-spark/spark-native-connector.md

Lines changed: 10 additions & 8 deletions
@@ -32,17 +32,17 @@ catalog feature, it is now possible to add and work with multiple catalogs in a
 
 ## Requirements {#requirements}
 
-- Java 8 or 17
-- Scala 2.12 or 2.13
-- Apache Spark 3.3 or 3.4 or 3.5
+- Java 8 or 17 (Java 17+ required for Spark 4.0)
+- Scala 2.12 or 2.13 (Spark 4.0 only supports Scala 2.13)
+- Apache Spark 3.3, 3.4, 3.5, or 4.0
 
 ## Compatibility matrix {#compatibility-matrix}
 
 | Version | Compatible Spark Versions | ClickHouse JDBC version |
 |---------|---------------------------|-------------------------|
-| main | Spark 3.3, 3.4, 3.5 | 0.6.3 |
+| main | Spark 3.3, 3.4, 3.5, 4.0 | 0.9.4 |
+| 0.9.0 | Spark 3.3, 3.4, 3.5, 4.0 | 0.9.4 |
 | 0.8.1 | Spark 3.3, 3.4, 3.5 | 0.6.3 |
-| 0.8.0 | Spark 3.3, 3.4, 3.5 | 0.6.3 |
 | 0.7.3 | Spark 3.3, 3.4 | 0.4.6 |
 | 0.6.0 | Spark 3.3 | 0.3.2-patch11 |
 | 0.5.0 | Spark 3.2, 3.3 | 0.3.2-patch11 |
@@ -544,7 +544,7 @@ for converting data types when reading from ClickHouse into Spark and when inser
 | ClickHouse Data Type | Spark Data Type | Supported | Is Primitive | Notes |
 |-------------------------------------------------------------------|--------------------------------|-----------|--------------|----------------------------------------------------|
 | `Nothing` | `NullType` || Yes | |
-| `Bool` | `BooleanType` || Yes | |
+| `Bool` | `BooleanType` || Yes | |
 | `UInt8`, `Int16` | `ShortType` || Yes | |
 | `Int8` | `ByteType` || Yes | |
 | `UInt16`,`Int32` | `IntegerType` || Yes | |
@@ -567,7 +567,7 @@ for converting data types when reading from ClickHouse into Spark and when inser
 | `IntervalDay`, `IntervalHour`, `IntervalMinute`, `IntervalSecond` | `DayTimeIntervalType` || No | Specific interval type is used |
 | `Object` | || | |
 | `Nested` | || | |
-| `Tuple` | | | | |
+| `Tuple` | `StructType` | | No | Supports both named and unnamed tuples. Named tuples map to struct fields by name, unnamed tuples use `_1`, `_2`, etc. Supports nested structs and nullable fields |
 | `Point` | || | |
 | `Polygon` | || | |
 | `MultiPolygon` | || | |
@@ -582,7 +582,7 @@ for converting data types when reading from ClickHouse into Spark and when inser
 
 | Spark Data Type | ClickHouse Data Type | Supported | Is Primitive | Notes |
 |-------------------------------------|----------------------|-----------|--------------|----------------------------------------|
-| `BooleanType` | `UInt8` || Yes | |
+| `BooleanType` | `Bool` || Yes | Mapped to `Bool` type (not `UInt8`) since version 0.9.0 |
 | `ByteType` | `Int8` || Yes | |
 | `ShortType` | `Int16` || Yes | |
 | `IntegerType` | `Int32` || Yes | |
@@ -597,6 +597,8 @@ for converting data types when reading from ClickHouse into Spark and when inser
 | `TimestampType` | `DateTime` || Yes | |
 | `ArrayType` (list, tuple, or array) | `Array` || No | Array element type is also converted |
 | `MapType` | `Map` || No | Keys are limited to `StringType` |
+| `StructType` | `Tuple` || No | Converted to named Tuple with field names. |
+| `VariantType` | `VariantType` || No | |
 | `Object` | || | |
 | `Nested` | || | |
 
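The `Tuple` row added in this commit says that named tuples map to struct fields by name, while unnamed tuples get positional names `_1`, `_2`, and so on. The naming rule can be sketched in plain Python as below. This is only an illustration of the rule as documented, not the connector's code: the helper `struct_field_names` and the `(name, type)` pair representation of tuple elements are hypothetical.

```python
from typing import List, Optional, Tuple


def struct_field_names(elements: List[Tuple[Optional[str], str]]) -> List[str]:
    """Return struct field names for the elements of a ClickHouse Tuple.

    Hypothetical helper: each element is a (name, clickhouse_type) pair,
    where name is None for an unnamed tuple element.
    """
    names = []
    for position, (name, _clickhouse_type) in enumerate(elements, start=1):
        # Named elements keep their name; unnamed ones become _1, _2, ...
        names.append(name if name is not None else f"_{position}")
    return names


# Tuple(id UInt32, label String): fields are taken by name
print(struct_field_names([("id", "UInt32"), ("label", "String")]))

# Tuple(UInt32, String): fields fall back to positional names
print(struct_field_names([(None, "UInt32"), (None, "String")]))
```

The first call yields the element names unchanged; the second yields `_1` and `_2`, matching the positional convention described in the table note.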