v6.0.0-incubating

@richox richox released this 17 Oct 13:45
· 251 commits to master since this release

New Features

  • New Configurations: Introduced settings for decimal operations, JSON parsing fallback, Parquet reader, native logging, and compression.
  • Memory Management: Improved memory management using Linux RSS (resident set size).
  • Operators: Added support for operator fusion in Sort -> SortMergeJoin execution, reducing the cost of join key serialization.
  • Enhanced Compatibility: Added support for JDK 17 and Scala 2.13.
  • New Functions: Added support for trim in casts and extended hashing function coverage.

Improvements

  • Stability: Improved handling of stage retry on shuffle failures and memory spilling.
  • Modularity: Restructured codebase by extracting Celeborn, Uniffle, and Paimon into separate 3rdparty modules.
  • Observability: Improved logging with Thread IDs and enhanced Spark UI metrics for skew detection.
  • Uniffle Integration: Improved support and documentation for Uniffle shuffle manager.
  • Minor Performance Improvements: Optimized batch serde, array interleaving, and coalescing.
  • Build & CI: Enhanced build scripts, added ARM support, and streamlined the CI process.

Bug Fixes

  • Data Correctness: Fixed critical issues in join logic, value comparisons, and hash calculations.
  • Memory Leaks & Crashes: Resolved memory management issues and NPEs.
  • Execution Engine: Fixed errors in outer generate, UDTF execution, and Parquet sink tasks.
  • Integration: Corrected issues with third-party systems such as Celeborn and Uniffle.

NOTE: This release includes a significant number of performance optimizations, memory management improvements, bug fixes, and new features, with notable enhancements in shuffle management, execution engine optimization, and third-party integration. Some minor changes are not included in the list above; see the commit list for details.

Full Changelog: v5.0.0...v6.0.0