0.11.0
Pre-release
DataFusion Comet 0.11.0 Changelog
This release consists of 131 commits from 15 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
- fix: temporarily ignore test for hdfs file systems #2359 (parthchandra)
- fix: Check reused broadcast plan in non-AQE and make setNumPartitions thread safe #2398 (wForget)
- fix: correct `missingInput` for `CometHashAggregateExec` #2409 (comphead)
- fix: clippy errors rust 1.9.0 update #2419 (coderfender)
- fix: Avoid spark plan execution cache preventing CometBatchRDD numPartitions change #2420 (wForget)
- fix: regressions in `CometToPrettyStringSuite` #2384 (hsiang-c)
- fix: Byte array Literals failed on cast #2432 (comphead)
- fix: Do not push down subquery filters on native_datafusion scan #2438 (wForget)
- fix: Improve error handling when resolving S3 bucket region #2440 (andygrove)
- fix: [iceberg] additional parquet independent api for iceberg integration #2442 (parthchandra)
- fix: Specify reqwest crate features #2446 (andygrove)
- fix: distributed RangePartitioning bounds calculation with native shuffle #2258 (mbutrovich)
- fix: fix regression in tpcbench.py #2512 (andygrove)
- fix: [iceberg] Close reader instance in ReadConf #2510 (hsiang-c)
- fix: Enable plan stability tests for `auto` scan #2516 (andygrove)
- fix: Capture unexpected output when retrieving JVM 17 args in Makefile #2566 (zuston)
Performance related:
- perf: New Configuration from shared conf to avoid high costs #2402 (wForget)
- perf: Use DataFusion's `count_udaf` instead of `SUM(IF(expr IS NOT NULL, 1, 0))` #2407 (andygrove)
- perf: Improve BroadcastExchangeExec conversion #2417 (wForget)
Implemented enhancements:
- feat: Add dynamic `enabled` and `allowIncompat` configs for all supported expressions #2329 (andygrove)
- feat: feature specific tests #2372 (parthchandra)
- feat: Support more date part expressions #2316 (wForget)
- feat: rpad support column for second arg instead of just literal #2099 (coderfender)
- feat: Support comet native log level conf #2379 (wForget)
- feat: Enable WeekDay function #2411 (wForget)
- feat: Add nested Array literal support #2181 (comphead)
- feat: add_additional_char_support_rpad #2436 (coderfender)
- feat: do not fallback to Spark for `COUNT(distinct)` #2429 (comphead)
- feat: implement_ansi_eval_mode_arithmetic #2136 (coderfender)
- feat: Add plan conversion statistics to extended explain info #2412 (andygrove)
- feat: implement_comet_native_lpad_expr #2102 (coderfender)
- feat: Add `backtrace` feature to simplify enabling native backtraces in `CometNativeException` #2515 (andygrove)
- feat: Support reverse function with ArrayType input #2481 (cfmcgrady)
- feat: Change default off-heap memory pool from `greedy_unified` to `fair_unified` #2526 (andygrove)
- feat: Make DiskManager `max_temp_directory_size` configurable #2479 (manuzhang)
- feat: Parquet Modular Encryption with Spark KMS for native readers #2447 (mbutrovich)
- feat: Add support for Spark-compatible cast from integral to decimal #2472 (coderfender)
- feat: Support ANSI mode integral divide #2421 (coderfender)
- feat: Add config to enable running Comet in onheap mode #2554 (andygrove)
- feat: support ansi mode rounding function #2542 (coderfender)
- feat: support ansi mode remainder function #2556 (coderfender)
- feat: Implement array-to-string cast support #2425 (cfmcgrady)
- feat: Various improvements to memory pool configuration, logging, and documentation #2538 (andygrove)
- feat: Enable complex types for columnar shuffle #2573 (mbutrovich)
- feat: support_decimal_types_bool_cast_native_impl #2490 (coderfender)
- feat: Use buf write to reduce system call on index write #2579 (zuston)
Documentation updates:
- doc: Document usage IcebergCometBatchReader.java #2347 (comphead)
- docs: Add changelog for 0.10.0 release #2361 (andygrove)
- docs: Fix error in docs #2373 (andygrove)
- docs: Fix more comet versions in docs #2374 (andygrove)
- docs: Publish 0.10.0 user guide #2394 (andygrove)
- doc: macos benches doc clarifications #2418 (comphead)
- docs: update configs.md after #2422 #2428 (mbutrovich)
- docs: update docs and tuning guide related to native shuffle #2487 (mbutrovich)
- docs: Improve EC2 benchmarking guide #2474 (andygrove)
- docs: docs_update_ansi_support #2496 (coderfender)
- docs: support lpad expression documentation update #2517 (coderfender)
- docs: doc changes to support ANSI mode integral divide #2570 (coderfender)
- docs: Split configuration guide into different sections (scan, exec, shuffle, etc) #2568 (andygrove)
- docs: doc update to support ANSI mode remainder function #2576 (coderfender)
- docs: Documentation updates #2581 (andygrove)
Other:
- chore(deps): bump uuid from 1.18.0 to 1.18.1 in /native #2336 (dependabot[bot])
- build: Check that all Scala test suites run in PR builds #2304 (andygrove)
- chore: Start 0.11.0 development #2365 (andygrove)
- chore: Split expression serde hash map into separate categories #2322 (andygrove)
- chore: exclude Iceberg diffs from rat checks #2376 (hsiang-c)
- chore: Refactor UnaryMinus serde #2378 (andygrove)
- chore: Revert "chore: [1941-Part1]: Introduce `map_sort` scalar function (#2… #2381 (comphead)
- chore: Refactor Literal serde #2377 (andygrove)
- chore: Output `BaseAggregateExec` accurate unsupported names #2383 (comphead)
- chore: Improve Initcap test and docs #2387 (andygrove)
- build: fix build of 'hdfs-opendal' feature for MacOS #2392 (parthchandra)
- chore(deps): bump cc from 1.2.36 to 1.2.37 in /native #2399 (dependabot[bot])
- chore: [iceberg] support Iceberg 1.9.1 #2386 (hsiang-c)
- minor: Add deprecation notice to `datafusion-comet-spark-expr` crate #2405 (andygrove)
- minor: Update benchmarking scripts to specify scan implementation #2403 (andygrove)
- refactor: Scala hygiene - remove `scala.collection.JavaConverters` #2393 (hsiang-c)
- chore: Improve test coverage for `count` aggregates #2406 (andygrove)
- chore: upgrade to DataFusion 50.0.0, Arrow 56.1.0, Parquet 56.0.0 among others #2286 (mbutrovich)
- chore: Support Spark 4.0.1 instead of 4.0.0 #2414 (andygrove)
- chore: Respect native features env for cargo commands #2296 (wForget)
- minor: Update TPC-DS microbenchmarks to remove "scan only" and "exec only" runs #2396 (andygrove)
- minor: Add RDDScan to default value of sparkToColumnar.supportedOperatorList #2422 (wForget)
- chore: new TPC-DS golden plans #2426 (mbutrovich)
- chore: fix `pr_build*.yml` #2434 (comphead)
- chore: Remove unused class #2437 (wForget)
- chore(deps): bump cc from 1.2.37 to 1.2.38 in /native #2439 (dependabot[bot])
- chore: add validate_workflows.yml #2441 (comphead)
- test: potential native broadcast failure in scenarios with ReusedExchange #2167 (akupchinskiy)
- chore: Improvements of fallback info #2450 (wForget)
- chore: Upgrade Apache Release Audit Tool (RAT) to 0.16.1 #2451 (andygrove)
- minor: Remove reference to SortExec deadlock issue that is now resolved #2464 (andygrove)
- chore: Use checked operations when growing or shrinking unified memory pool #2455 (andygrove)
- minor: Improve the log message of `CometTestBase#checkCometOperators` #2458 (cfmcgrady)
- minor: Skip calculating per-task memory limit when in off-heap mode #2462 (andygrove)
- Chore: Used DataFusion impl of bit_get function #2466 (kazantsev-maksim)
- chore(deps): bump regex from 1.11.2 to 1.11.3 in /native #2483 (dependabot[bot])
- chore: update TPC-DS plans after #2429 #2486 (mbutrovich)
- chore(deps): bump thiserror from 2.0.16 to 2.0.17 in /native #2485 (dependabot[bot])
- chore(deps): bump cc from 1.2.38 to 1.2.39 in /native #2484 (dependabot[bot])
- chore: Support running specific benchmark query #2491 (comphead)
- chore: Make CometColumnarToRowExec extends CometPlan #2460 (wForget)
- chore: Update artifacts to 0.10.0 #2500 (comphead)
- build: Stop caching libcomet in CI #2498 (andygrove)
- chore: Upgrade Maven plugins #2494 (andygrove)
- Chore: Used DataFusion impl of date_add and date_sub functions #2473 (kazantsev-maksim)
- minor: include taskAttemptId in log messages #2467 (andygrove)
- chore: Improve test assertions in plan stability suite #2505 (andygrove)
- build: Add Spark 4.0 to release build script #2514 (parthchandra)
- chore: Enable plan stability tests for `native_iceberg_compat` #2519 (andygrove)
- chore(deps): bump parking_lot from 0.12.4 to 0.12.5 in /native #2530 (dependabot[bot])
- chore(deps): bump cc from 1.2.39 to 1.2.40 in /native #2529 (dependabot[bot])
- chore: Refactor serde for `ArrayCompact` and `ArrayFilter` #2536 (andygrove)
- Chore: Fix Scala code warnings - common module #2527 (andy-hf-kwok)
- chore: Refactor serde for `CheckOverflow` #2537 (andygrove)
- build: Run scala tests against release build of native code #2541 (andygrove)
- chore: Pass Comet configs to native `createPlan` #2543 (andygrove)
- chore: Refactor serde for Length #2547 (andygrove)
- chore: Include spark shim sources for spotless plugin and reformat #2557 (wForget)
- chore(deps): bump opendal from 0.54.0 to 0.54.1 in /native #2559 (dependabot[bot])
- chore: Finish moving Cast serde out of QueryPlanSerde #2550 (andygrove)
- chore: Use cargo-nextest in CI #2546 (andygrove)
- chore: Delete unused code #2565 (zuston)
- chore: Improve plan comet transformation log #2564 (wForget)
- chore(deps): bump cc from 1.2.40 to 1.2.41 in /native #2560 (dependabot[bot])
- chore(deps): bump aws-credential-types from 1.2.6 to 1.2.7 in /native #2563 (dependabot[bot])
- chore: Refactor serde for RegExpReplace #2548 (andygrove)
- chore: use polymorphic map builders in shuffle. #2571 (ashdnazg)
- chore: Move ToPrettyString serde into shim layer #2549 (andygrove)
- chore(deps): bump DataFusion dependencies to 50.2.0, refresh Cargo.lock #2575 (mbutrovich)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
47 Andy Grove
15 Zhen Wang
14 B Vadlamani
12 Oleks V
11 dependabot[bot]
10 Matt Butrovich
5 Parth Chandra
5 hsiang-c
3 Fu Chen
3 Junfan Zhang
2 Kazantsev Maksim
1 Artem Kupchinskiy
1 Eshed Schacham
1 Manu Zhang
1 andy-hf-kwok
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.