Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
64c3dd7
save
andygrove Nov 21, 2025
c6fe639
save
andygrove Nov 21, 2025
2a8dc52
save
andygrove Nov 21, 2025
e4acf93
save
andygrove Nov 21, 2025
82a552d
save
andygrove Nov 21, 2025
ba93de7
save
andygrove Nov 21, 2025
96087ef
save [skip ci]
andygrove Nov 21, 2025
e6bb5cf
delete some tests
andygrove Nov 21, 2025
8c5a6be
test passes
andygrove Nov 21, 2025
42e0c79
prep for review
andygrove Nov 21, 2025
fac5c56
improve test
andygrove Nov 21, 2025
30a503f
prep for review
andygrove Nov 21, 2025
5174ba3
remove partition id from proto
andygrove Nov 21, 2025
8748d85
prep for review
andygrove Nov 21, 2025
f1b7ba8
move test
andygrove Nov 21, 2025
3ad4d6e
clippy
andygrove Nov 21, 2025
f35196b
code cleanup
andygrove Nov 21, 2025
1ea7f31
preserve column names
andygrove Nov 21, 2025
61b6690
fix assertion
andygrove Nov 21, 2025
1c00aae
test
andygrove Nov 21, 2025
e1e357e
improve test
andygrove Nov 21, 2025
9fb2b53
refactor to use operator serde framework
andygrove Nov 21, 2025
457a9b9
skip testing for native_datafusion for now
andygrove Nov 21, 2025
757ba81
remove hard-coded compression codec
andygrove Nov 21, 2025
7b9e925
lz4 level
andygrove Nov 21, 2025
913f2ba
remove partition id from proto
andygrove Nov 21, 2025
de07d45
implement getSupportLevel
andygrove Nov 22, 2025
eb4fee4
move config to testing category
andygrove Nov 22, 2025
9a6e48c
fuzz test
andygrove Nov 22, 2025
8b1cb0b
remove snappy from filename
andygrove Nov 22, 2025
58c056d
remove snappy from filename
andygrove Nov 22, 2025
bd04274
docs
andygrove Nov 22, 2025
55d5d48
fix ci
andygrove Nov 22, 2025
cda8322
Revert "fix ci"
andygrove Nov 22, 2025
6a4726b
Merge remote-tracking branch 'apache/main' into parquet-write-poc
andygrove Nov 24, 2025
1d1d430
partially address feedback
andygrove Nov 24, 2025
ece289a
Update spark/src/main/scala/org/apache/comet/serde/operator/CometData…
andygrove Nov 25, 2025
b7daa61
format
andygrove Nov 26, 2025
44991b2
address more feedback
andygrove Nov 26, 2025
8d9b41a
cargo fmt
andygrove Nov 26, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/pr_build_linux.yml
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,7 @@ jobs:
org.apache.comet.exec.DisableAQECometAsyncShuffleSuite
- name: "parquet"
value: |
org.apache.comet.parquet.CometParquetWriterSuite
org.apache.comet.parquet.ParquetReadV1Suite
org.apache.comet.parquet.ParquetReadV2Suite
org.apache.comet.parquet.ParquetReadFromFakeHadoopFsSuite
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/pr_build_macos.yml
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ jobs:
org.apache.comet.exec.DisableAQECometAsyncShuffleSuite
- name: "parquet"
value: |
org.apache.comet.parquet.CometParquetWriterSuite
org.apache.comet.parquet.ParquetReadV1Suite
org.apache.comet.parquet.ParquetReadV2Suite
org.apache.comet.parquet.ParquetReadFromFakeHadoopFsSuite
Expand Down
11 changes: 11 additions & 0 deletions common/src/main/scala/org/apache/comet/CometConf.scala
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,17 @@ object CometConf extends ShimCometConf {
.booleanConf
.createWithDefault(true)

val COMET_NATIVE_PARQUET_WRITE_ENABLED: ConfigEntry[Boolean] =
conf("spark.comet.parquet.write.enabled")
.category(CATEGORY_TESTING)
.doc(
"Whether to enable native Parquet write through Comet. When enabled, " +
"Comet will intercept Parquet write operations and execute them natively. This " +
"feature is highly experimental and only partially implemented. It should not " +
"be used in production.")
.booleanConf
.createWithDefault(false)

val SCAN_NATIVE_COMET = "native_comet"
val SCAN_NATIVE_DATAFUSION = "native_datafusion"
val SCAN_NATIVE_ICEBERG_COMPAT = "native_iceberg_compat"
Expand Down
1 change: 1 addition & 0 deletions docs/source/user-guide/latest/configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,7 @@ These settings can be used to determine which parts of the plan are accelerated
| `spark.comet.exec.onHeap.enabled` | Whether to allow Comet to run in on-heap mode. Required for running Spark SQL tests. It can be overridden by the environment variable `ENABLE_COMET_ONHEAP`. | false |
| `spark.comet.exec.onHeap.memoryPool` | The type of memory pool to be used for Comet native execution when running Spark in on-heap mode. Available pool types are `greedy`, `fair_spill`, `greedy_task_shared`, `fair_spill_task_shared`, `greedy_global`, `fair_spill_global`, and `unbounded`. | greedy_task_shared |
| `spark.comet.memoryOverhead` | The amount of additional memory to be allocated per executor process for Comet, in MiB, when running Spark in on-heap mode. | 1024 MiB |
| `spark.comet.parquet.write.enabled` | Whether to enable native Parquet write through Comet. When enabled, Comet will intercept Parquet write operations and execute them natively. This feature is highly experimental and only partially implemented. It should not be used in production. | false |
| `spark.comet.sparkToColumnar.enabled` | Whether to enable Spark to Arrow columnar conversion. When this is turned on, Comet will convert operators in `spark.comet.sparkToColumnar.supportedOperatorList` into Arrow columnar format before processing. This is an experimental feature and has known issues with non-UTC timezones. | false |
| `spark.comet.sparkToColumnar.supportedOperatorList` | A comma-separated list of operators that will be converted to Arrow columnar format when `spark.comet.sparkToColumnar.enabled` is true. | Range,InMemoryTableScan,RDDScan |
| `spark.comet.testing.strict` | Experimental option to enable strict testing, which will fail tests that could be more comprehensive, such as checking for a specific fallback reason. It can be overridden by the environment variable `ENABLE_COMET_STRICT_TESTING`. | false |
Expand Down
41 changes: 21 additions & 20 deletions docs/source/user-guide/latest/operators.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,25 +22,26 @@
The following Spark operators are currently replaced with native versions. Query stages that contain any operators
not supported by Comet will fall back to regular Spark execution.

| Operator | Spark-Compatible? | Compatibility Notes |
| ----------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------ |
| BatchScanExec | Yes | Supports Parquet files and Apache Iceberg Parquet scans. See the [Comet Compatibility Guide] for more information. |
| BroadcastExchangeExec | Yes | |
| BroadcastHashJoinExec | Yes | |
| ExpandExec | Yes | |
| FileSourceScanExec | Yes | Supports Parquet files. See the [Comet Compatibility Guide] for more information. |
| FilterExec | Yes | |
| GlobalLimitExec | Yes | |
| HashAggregateExec | Yes | |
| LocalLimitExec | Yes | |
| LocalTableScanExec | No | Experimental and disabled by default. |
| ObjectHashAggregateExec | Yes | Supports a limited number of aggregates, such as `bloom_filter_agg`. |
| ProjectExec | Yes | |
| ShuffleExchangeExec | Yes | |
| ShuffledHashJoinExec | Yes | |
| SortExec | Yes | |
| SortMergeJoinExec | Yes | |
| UnionExec | Yes | |
| WindowExec | No | Disabled by default due to known correctness issues. |
| Operator | Spark-Compatible? | Compatibility Notes |
| --------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------ |
| BatchScanExec | Yes | Supports Parquet files and Apache Iceberg Parquet scans. See the [Comet Compatibility Guide] for more information. |
| BroadcastExchangeExec | Yes | |
| BroadcastHashJoinExec | Yes | |
| ExpandExec | Yes | |
| FileSourceScanExec | Yes | Supports Parquet files. See the [Comet Compatibility Guide] for more information. |
| FilterExec | Yes | |
| GlobalLimitExec | Yes | |
| HashAggregateExec | Yes | |
| InsertIntoHadoopFsRelationCommand | No | Experimental support for native Parquet writes. Disabled by default. |
| LocalLimitExec | Yes | |
| LocalTableScanExec | No | Experimental and disabled by default. |
| ObjectHashAggregateExec | Yes | Supports a limited number of aggregates, such as `bloom_filter_agg`. |
| ProjectExec | Yes | |
| ShuffleExchangeExec | Yes | |
| ShuffledHashJoinExec | Yes | |
| SortExec | Yes | |
| SortMergeJoinExec | Yes | |
| UnionExec | Yes | |
| WindowExec | No | Disabled by default due to known correctness issues. |

[Comet Compatibility Guide]: compatibility.md
2 changes: 2 additions & 0 deletions native/core/src/execution/operators/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,8 @@ mod copy;
mod expand;
pub use expand::ExpandExec;
mod iceberg_scan;
mod parquet_writer;
pub use parquet_writer::ParquetWriterExec;
mod scan;

/// Error returned during executing operators.
Expand Down
Loading
Loading