0.11.0

Pre-release

@andygrove released this 19 Oct 18:00

DataFusion Comet 0.11.0 Changelog

This release consists of 131 commits from 15 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: temporarily ignore test for hdfs file systems #2359 (parthchandra)
  • fix: Check reused broadcast plan in non-AQE and make setNumPartitions thread safe #2398 (wForget)
  • fix: correct missingInput for CometHashAggregateExec #2409 (comphead)
  • fix: clippy errors for Rust 1.90.0 update #2419 (coderfender)
  • fix: Avoid spark plan execution cache preventing CometBatchRDD numPartitions change #2420 (wForget)
  • fix: regressions in CometToPrettyStringSuite #2384 (hsiang-c)
  • fix: Byte array Literals failed on cast #2432 (comphead)
  • fix: Do not push down subquery filters on native_datafusion scan #2438 (wForget)
  • fix: Improve error handling when resolving S3 bucket region #2440 (andygrove)
  • fix: [iceberg] additional parquet independent api for iceberg integration #2442 (parthchandra)
  • fix: Specify reqwest crate features #2446 (andygrove)
  • fix: distributed RangePartitioning bounds calculation with native shuffle #2258 (mbutrovich)
  • fix: fix regression in tpcbench.py #2512 (andygrove)
  • fix: [iceberg] Close reader instance in ReadConf #2510 (hsiang-c)
  • fix: Enable plan stability tests for auto scan #2516 (andygrove)
  • fix: Capture unexpected output when retrieving JVM 17 args in Makefile #2566 (zuston)

Performance related:

  • perf: New Configuration from shared conf to avoid high costs #2402 (wForget)
  • perf: Use DataFusion's count_udaf instead of SUM(IF(expr IS NOT NULL, 1, 0)) #2407 (andygrove)
  • perf: Improve BroadcastExchangeExec conversion #2417 (wForget)

Implemented enhancements:

  • feat: Add dynamic enabled and allowIncompat configs for all supported expressions #2329 (andygrove)
  • feat: feature specific tests #2372 (parthchandra)
  • feat: Support more date part expressions #2316 (wForget)
  • feat: rpad support column for second arg instead of just literal #2099 (coderfender)
  • feat: Support comet native log level conf #2379 (wForget)
  • feat: Enable WeekDay function #2411 (wForget)
  • feat: Add nested Array literal support #2181 (comphead)
  • feat: Add additional char support for rpad #2436 (coderfender)
  • feat: do not fallback to Spark for COUNT(distinct) #2429 (comphead)
  • feat: Implement ANSI eval mode arithmetic #2136 (coderfender)
  • feat: Add plan conversion statistics to extended explain info #2412 (andygrove)
  • feat: Implement Comet native lpad expression #2102 (coderfender)
  • feat: Add backtrace feature to simplify enabling native backtraces in CometNativeException #2515 (andygrove)
  • feat: Support reverse function with ArrayType input #2481 (cfmcgrady)
  • feat: Change default off-heap memory pool from greedy_unified to fair_unified #2526 (andygrove)
  • feat: Make DiskManager max_temp_directory_size configurable #2479 (manuzhang)
  • feat: Parquet Modular Encryption with Spark KMS for native readers #2447 (mbutrovich)
  • feat: Add support for Spark-compatible cast from integral to decimal #2472 (coderfender)
  • feat: Support ANSI mode integral divide #2421 (coderfender)
  • feat: Add config to enable running Comet in onheap mode #2554 (andygrove)
  • feat: Support ANSI mode rounding function #2542 (coderfender)
  • feat: Support ANSI mode remainder function #2556 (coderfender)
  • feat: Implement array-to-string cast support #2425 (cfmcgrady)
  • feat: Various improvements to memory pool configuration, logging, and documentation #2538 (andygrove)
  • feat: Enable complex types for columnar shuffle #2573 (mbutrovich)
  • feat: Support decimal types in native bool cast implementation #2490 (coderfender)
  • feat: Use buffered writes to reduce system calls on index write #2579 (zuston)

Documentation updates:

  • doc: Document usage of IcebergCometBatchReader.java #2347 (comphead)
  • docs: Add changelog for 0.10.0 release #2361 (andygrove)
  • docs: Fix error in docs #2373 (andygrove)
  • docs: Fix more comet versions in docs #2374 (andygrove)
  • docs: Publish 0.10.0 user guide #2394 (andygrove)
  • doc: macOS benchmark doc clarifications #2418 (comphead)
  • docs: update configs.md after #2422 #2428 (mbutrovich)
  • docs: update docs and tuning guide related to native shuffle #2487 (mbutrovich)
  • docs: Improve EC2 benchmarking guide #2474 (andygrove)
  • docs: Update docs for ANSI support #2496 (coderfender)
  • docs: Update documentation for lpad expression support #2517 (coderfender)
  • docs: doc changes to support ANSI mode integral divide #2570 (coderfender)
  • docs: Split configuration guide into different sections (scan, exec, shuffle, etc) #2568 (andygrove)
  • docs: doc update to support ANSI mode remainder function #2576 (coderfender)
  • docs: Documentation updates #2581 (andygrove)

Other:

  • chore(deps): bump uuid from 1.18.0 to 1.18.1 in /native #2336 (dependabot[bot])
  • build: Check that all Scala test suites run in PR builds #2304 (andygrove)
  • chore: Start 0.11.0 development #2365 (andygrove)
  • chore: Split expression serde hash map into separate categories #2322 (andygrove)
  • chore: exclude Iceberg diffs from rat checks #2376 (hsiang-c)
  • chore: Refactor UnaryMinus serde #2378 (andygrove)
  • chore: Revert "chore: [1941-Part1]: Introduce map_sort scalar function (#2#2381 (comphead)
  • chore: Refactor Literal serde #2377 (andygrove)
  • chore: Output BaseAggregateExec accurate unsupported names #2383 (comphead)
  • chore: Improve Initcap test and docs #2387 (andygrove)
  • build: fix build of 'hdfs-opendal' feature for macOS #2392 (parthchandra)
  • chore(deps): bump cc from 1.2.36 to 1.2.37 in /native #2399 (dependabot[bot])
  • chore: [iceberg] support Iceberg 1.9.1 #2386 (hsiang-c)
  • minor: Add deprecation notice to datafusion-comet-spark-expr crate #2405 (andygrove)
  • minor: Update benchmarking scripts to specify scan implementation #2403 (andygrove)
  • refactor: Scala hygiene - remove scala.collection.JavaConverters #2393 (hsiang-c)
  • chore: Improve test coverage for count aggregates #2406 (andygrove)
  • chore: upgrade to DataFusion 50.0.0, Arrow 56.1.0, Parquet 56.0.0 among others #2286 (mbutrovich)
  • chore: Support Spark 4.0.1 instead of 4.0.0 #2414 (andygrove)
  • chore: Respect native features env for cargo commands #2296 (wForget)
  • minor: Update TPC-DS microbenchmarks to remove "scan only" and "exec only" runs #2396 (andygrove)
  • minor: Add RDDScan to default value of sparkToColumnar.supportedOperatorList #2422 (wForget)
  • chore: new TPC-DS golden plans #2426 (mbutrovich)
  • chore: fix pr_build*.yml #2434 (comphead)
  • chore: Remove unused class #2437 (wForget)
  • chore(deps): bump cc from 1.2.37 to 1.2.38 in /native #2439 (dependabot[bot])
  • chore: add validate_workflows.yml #2441 (comphead)
  • test: potential native broadcast failure in scenarios with ReusedExchange #2167 (akupchinskiy)
  • chore: Improvements of fallback info #2450 (wForget)
  • chore: Upgrade Apache Release Audit Tool (RAT) to 0.16.1 #2451 (andygrove)
  • minor: Remove reference to SortExec deadlock issue that is now resolved #2464 (andygrove)
  • chore: Use checked operations when growing or shrinking unified memory pool #2455 (andygrove)
  • minor: Improve the log message of CometTestBase#checkCometOperators #2458 (cfmcgrady)
  • minor: Skip calculating per-task memory limit when in off-heap mode #2462 (andygrove)
  • chore: Use DataFusion impl of bit_get function #2466 (kazantsev-maksim)
  • chore(deps): bump regex from 1.11.2 to 1.11.3 in /native #2483 (dependabot[bot])
  • chore: update TPC-DS plans after #2429 #2486 (mbutrovich)
  • chore(deps): bump thiserror from 2.0.16 to 2.0.17 in /native #2485 (dependabot[bot])
  • chore(deps): bump cc from 1.2.38 to 1.2.39 in /native #2484 (dependabot[bot])
  • chore: Support running specific benchmark query #2491 (comphead)
  • chore: Make CometColumnarToRowExec extend CometPlan #2460 (wForget)
  • chore: Update artifacts to 0.10.0 #2500 (comphead)
  • build: Stop caching libcomet in CI #2498 (andygrove)
  • chore: Upgrade Maven plugins #2494 (andygrove)
  • chore: Use DataFusion impl of date_add and date_sub functions #2473 (kazantsev-maksim)
  • minor: include taskAttemptId in log messages #2467 (andygrove)
  • chore: Improve test assertions in plan stability suite #2505 (andygrove)
  • build: Add Spark 4.0 to release build script #2514 (parthchandra)
  • chore: Enable plan stability tests for native_iceberg_compat #2519 (andygrove)
  • chore(deps): bump parking_lot from 0.12.4 to 0.12.5 in /native #2530 (dependabot[bot])
  • chore(deps): bump cc from 1.2.39 to 1.2.40 in /native #2529 (dependabot[bot])
  • chore: Refactor serde for ArrayCompact and ArrayFilter #2536 (andygrove)
  • chore: Fix Scala code warnings - common module #2527 (andy-hf-kwok)
  • chore: Refactor serde for CheckOverflow #2537 (andygrove)
  • build: Run scala tests against release build of native code #2541 (andygrove)
  • chore: Pass Comet configs to native createPlan #2543 (andygrove)
  • chore: Refactor serde for Length #2547 (andygrove)
  • chore: Include spark shim sources for spotless plugin and reformat #2557 (wForget)
  • chore(deps): bump opendal from 0.54.0 to 0.54.1 in /native #2559 (dependabot[bot])
  • chore: Finish moving Cast serde out of QueryPlanSerde #2550 (andygrove)
  • chore: Use cargo-nextest in CI #2546 (andygrove)
  • chore: Delete unused code #2565 (zuston)
  • chore: Improve plan comet transformation log #2564 (wForget)
  • chore(deps): bump cc from 1.2.40 to 1.2.41 in /native #2560 (dependabot[bot])
  • chore(deps): bump aws-credential-types from 1.2.6 to 1.2.7 in /native #2563 (dependabot[bot])
  • chore: Refactor serde for RegExpReplace #2548 (andygrove)
  • chore: use polymorphic map builders in shuffle. #2571 (ashdnazg)
  • chore: Move ToPrettyString serde into shim layer #2549 (andygrove)
  • chore(deps): bump DataFusion dependencies to 50.2.0, refresh Cargo.lock #2575 (mbutrovich)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    47	Andy Grove
    15	Zhen Wang
    14	B Vadlamani
    12	Oleks V
    11	dependabot[bot]
    10	Matt Butrovich
     5	Parth Chandra
     5	hsiang-c
     3	Fu Chen
     3	Junfan Zhang
     2	Kazantsev Maksim
     1	Artem Kupchinskiy
     1	Eshed Schacham
     1	Manu Zhang
     1	andy-hf-kwok

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.