Commit b351e33

docs: update docs and tuning guide related to native shuffle (#2487)
1 parent 6802b60 commit b351e33

2 files changed: +14 -7 lines changed

docs/source/user-guide/latest/tuning.md

Lines changed: 4 additions & 4 deletions
@@ -208,14 +208,14 @@ back to Spark for shuffle operations.
 
 #### Native Shuffle
 
-Comet provides a fully native shuffle implementation, which generally provides the best performance. However,
-native shuffle currently only supports `HashPartitioning` and `SinglePartitioning` and has some restrictions on
-supported data types.
+Comet provides a fully native shuffle implementation, which generally provides the best performance. Native shuffle
+supports `HashPartitioning`, `RangePartitioning` and `SinglePartitioning` but currently only supports primitive type
+partitioning keys. Columns that are not partitioning keys may contain complex types like maps, structs, and arrays.
 
 #### Columnar (JVM) Shuffle
 
 Comet Columnar shuffle is JVM-based and supports `HashPartitioning`, `RoundRobinPartitioning`, `RangePartitioning`, and
-`SinglePartitioning`. This shuffle implementation supports more data types than native shuffle.
+`SinglePartitioning`. This shuffle implementation supports complex data types as partitioning keys.
 
 ### Shuffle Compression
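
Note on the hunk above: the support matrix stated in the updated tuning guide can be written down as a small lookup. The following is a minimal sketch, not Comet code. Every name in it (`ShuffleSupportSketch`, `ShuffleMode`, `supports`, and the stand-in partitioning objects) is hypothetical and only mirrors the wording of the guide. It assumes that a partitioning scheme the guide does not list for a shuffle mode is unsupported in that mode.

```scala
// Hypothetical model of the support matrix described in the tuning guide.
// None of these names are Comet or Spark APIs; they are stand-ins for illustration.
object ShuffleSupportSketch {
  sealed trait Partitioning
  case object HashPartitioning extends Partitioning
  case object RangePartitioning extends Partitioning
  case object RoundRobinPartitioning extends Partitioning
  case object SinglePartitioning extends Partitioning

  sealed trait ShuffleMode
  case object Native extends ShuffleMode
  case object ColumnarJvm extends ShuffleMode

  /**
   * Returns true when the guide lists `p` as supported for `mode`.
   * `keysArePrimitive` is true when every partitioning key has a primitive type.
   * Columns that are not partitioning keys are deliberately ignored here.
   */
  def supports(mode: ShuffleMode, p: Partitioning, keysArePrimitive: Boolean): Boolean =
    (mode, p) match {
      // SinglePartitioning has no partitioning keys, so key types do not matter.
      case (_, SinglePartitioning) => true
      // Native shuffle: hash and range partitioning, primitive keys only.
      case (Native, HashPartitioning | RangePartitioning) => keysArePrimitive
      // Assumption: not listed for native shuffle in the guide, so treated as unsupported.
      case (Native, RoundRobinPartitioning) => false
      // Columnar (JVM) shuffle: all four schemes, complex keys allowed.
      case (ColumnarJvm, _) => true
    }
}
```

Under this reading, a range partitioning on a struct-typed key is rejected for `Native` and accepted for `ColumnarJvm`, which matches the guide's statement that columnar shuffle supports complex data types as partitioning keys.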

spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala

Lines changed: 10 additions & 3 deletions
@@ -769,9 +769,9 @@ case class CometExecRule(session: SparkSession) extends Rule[SparkPlan] {
   /**
    * Determine which data types are supported as partition columns in native shuffle.
    *
-   * For Hash Partition this defines the key that determines how data should be collocated for
-   * operations like `groupByKey`, `reduceByKey` or `join`. Native code does not support hashing
-   * complex types, see hash_funcs/utils.rs
+   * For HashPartitioning this defines the key that determines how data should be collocated for
+   * operations like `groupByKey`, `reduceByKey`, or `join`. Native code does not support
+   * hashing complex types, see hash_funcs/utils.rs
    */
   def supportedHashPartitioningDataType(dt: DataType): Boolean = dt match {
     case _: BooleanType | _: ByteType | _: ShortType | _: IntegerType | _: LongType |
@@ -782,6 +782,13 @@ case class CometExecRule(session: SparkSession) extends Rule[SparkPlan] {
       false
   }
 
+  /**
+   * Determine which data types are supported as partition columns in native shuffle.
+   *
+   * For RangePartitioning this defines the key that determines how data should be collocated
+   * for operations like `orderBy`, `repartitionByRange`. Native code does not support sorting
+   * complex types.
+   */
   def supportedRangePartitioningDataType(dt: DataType): Boolean = dt match {
     case _: BooleanType | _: ByteType | _: ShortType | _: IntegerType | _: LongType |
         _: FloatType | _: DoubleType | _: StringType | _: BinaryType | _: TimestampType |
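
Note on the hunk above: the doc comments describe a per-type predicate that is applied to partition columns only. The sketch below shows how such a predicate might be applied to the key columns of a schema. It is an illustration under stated assumptions, not the code in `CometExecRule`: `PartitionKeyCheckSketch`, `visibleKeyTypeSupported`, and `keysSupported` are hypothetical names, and the match lists only the types visible in the hunk. The real match continues past the end of the hunk, so this version may reject types that the actual method accepts. It assumes `spark-sql` is on the classpath.

```scala
import org.apache.spark.sql.types._

// Hypothetical sketch; not part of Comet.
object PartitionKeyCheckSketch {

  // Accepts only the types that are visible in the diff hunk.
  // Assumption: the real method accepts further types that the hunk cuts off.
  def visibleKeyTypeSupported(dt: DataType): Boolean = dt match {
    case _: BooleanType | _: ByteType | _: ShortType | _: IntegerType | _: LongType |
        _: FloatType | _: DoubleType | _: StringType | _: BinaryType | _: TimestampType =>
      true
    case _ =>
      false
  }

  // Checks only the named partitioning key columns. Other columns in the
  // schema are not inspected, so they may hold maps, structs, or arrays.
  // Throws IllegalArgumentException if a key name is not in the schema.
  def keysSupported(schema: StructType, keyNames: Seq[String]): Boolean =
    keyNames.forall(name => visibleKeyTypeSupported(schema(name).dataType))
}
```

For a schema with a `LongType` column `id` and an `ArrayType(StringType)` column `tags`, `keysSupported(schema, Seq("id"))` holds even though `tags` is a complex type, while `keysSupported(schema, Seq("tags"))` does not.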
