New Features
- New Configurations: Introduced settings for native decimal arithmetic, fallback on JSON parsing errors, the Parquet reader, native log level, and compression (zstd level and shuffle buffer size).
- Memory Management: Improved memory management by enforcing a memory limit based on the process's Linux RSS (resident set size).
- Operators: Added operator fusion for Sort -> SortMergeJoin execution, reducing the cost of converting join keys between row and column formats.
- Enhanced Compatibility: Added support for JDK 17 and Scala 2.13.
- New Functions: Added support for trim in casts and extended hashing function coverage.
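The RSS-based memory limit (#1018) bounds the memory the process actually holds, not just what an allocator tracks. These notes do not say how Auron samples RSS. The Rust sketch below assumes Linux's /proc/self/status `VmRSS` field. The functions `current_rss_bytes` and `should_spill`, the 4 GiB limit, and the 0.9 threshold are all hypothetical and are not Auron's implementation.

```rust
use std::fs;

/// Reads the process resident set size in bytes from the `VmRSS:` line of
/// /proc/self/status (Linux only). Returns None if the file or field is missing.
fn current_rss_bytes() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    for line in status.lines() {
        if let Some(rest) = line.strip_prefix("VmRSS:") {
            // The kernel reports this value in kB, e.g. "VmRSS:     12345 kB".
            let kb: u64 = rest.split_whitespace().next()?.parse().ok()?;
            return Some(kb * 1024);
        }
    }
    None
}

/// Hypothetical policy: ask operators to spill once RSS exceeds
/// `fraction` of the configured process limit.
fn should_spill(rss_bytes: u64, limit_bytes: u64, fraction: f64) -> bool {
    (rss_bytes as f64) > (limit_bytes as f64) * fraction
}

fn main() {
    let limit = 4u64 << 30; // hypothetical 4 GiB process memory limit
    match current_rss_bytes() {
        Some(rss) => println!("rss = {} bytes, spill = {}", rss, should_spill(rss, limit, 0.9)),
        None => println!("VmRSS is not available on this platform"),
    }
}
```

The key design point is that RSS also covers memory that no operator accounts for, such as allocator fragmentation and JNI buffers. A limit on RSS can therefore catch pressure that per-operator counters miss.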
Improvements
- Stability: Improved handling of stage retry on shuffle failures and memory spilling.
- Modularity: Restructured the codebase by extracting the Celeborn, Uniffle, and Paimon integrations into separate third-party modules.
- Observability: Improved logging with Thread IDs and enhanced Spark UI metrics for skew detection.
- Uniffle Integration: Improved support and documentation for Uniffle shuffle manager.
- Minor Performance Improvements: Optimized batch serde, array interleaving, and batch coalescing.
- Build & CI: Enhanced build scripts, added ARM support, and streamlined the CI process.
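#1081 improves the performance of CoalesceStream, which combines small batches into larger ones. The Rust sketch below shows the general coalescing idea, with each batch modeled as a plain `Vec<i64>` column. The function name and the row-count target are hypothetical. The real operator works on Arrow record batches and uses its own sizing rules.

```rust
/// Concatenates consecutive small batches until each emitted batch holds at
/// least `target_rows` rows. Empty batches are skipped. The final batch may be
/// smaller than the target.
fn coalesce_batches(batches: Vec<Vec<i64>>, target_rows: usize) -> Vec<Vec<i64>> {
    let mut out = Vec::new();
    let mut staging: Vec<i64> = Vec::new();
    for batch in batches {
        if batch.is_empty() {
            continue;
        }
        staging.extend(batch);
        if staging.len() >= target_rows {
            // Emit the staged rows and start a fresh staging buffer.
            out.push(std::mem::take(&mut staging));
        }
    }
    if !staging.is_empty() {
        out.push(staging); // flush the remainder at end of stream
    }
    out
}

fn main() {
    let batches = vec![vec![1, 2], vec![3], vec![4, 5, 6], vec![7]];
    for b in coalesce_batches(batches, 3) {
        println!("{:?}", b);
    }
}
```

Fewer, larger batches amortize per-batch overhead such as dispatch, serialization, and allocation in downstream operators. The tradeoff is that the operator buffers rows until the target is reached.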
Bug Fixes
- Data Correctness: Fixed critical issues in join logic, value comparisons, and hash calculations.
- Memory Leaks & Crashes: Resolved memory management issues and NPEs.
- Execution Engine: Fixed errors in outer generate, UDTF execution, and Parquet sink tasks.
- Integration: Corrected issues with third-party systems such as Celeborn and Uniffle.
NOTE: This release includes a significant number of performance optimizations, memory management improvements, bug fixes, and new features, with notable enhancements in shuffle management, execution engine optimization, and third-party integration. Some minor changes are not listed above; see the full commit list below for details.
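Among the correctness fixes, #1016 adds support for normalizing NaN and zero. Spark normalizes floating-point join and grouping keys so that all NaNs are treated as equal and -0.0 matches 0.0. Without this step, hashing raw bit patterns splits logically equal keys into separate groups. The Rust sketch below illustrates the effect. `normalize` follows the rule applied by Spark's NormalizeFloatingNumbers, while `count_groups` is a hypothetical helper and not Auron code.

```rust
use std::collections::HashMap;

/// Maps every NaN to a single canonical NaN and -0.0 to +0.0, matching the
/// rule Spark applies to floating-point keys before joins and aggregations.
fn normalize(v: f64) -> f64 {
    if v.is_nan() {
        f64::NAN
    } else if v == 0.0 {
        0.0 // true for both +0.0 and -0.0, so -0.0 becomes +0.0
    } else {
        v
    }
}

/// Groups values by their raw bit pattern, as a hash-based operator would,
/// optionally normalizing each key first. Returns the number of groups.
fn count_groups(values: &[f64], normalize_keys: bool) -> usize {
    let mut groups: HashMap<u64, usize> = HashMap::new();
    for &v in values {
        let key = if normalize_keys { normalize(v) } else { v };
        *groups.entry(key.to_bits()).or_insert(0) += 1;
    }
    groups.len()
}

fn main() {
    // Two zeros with different signs and two NaNs with different payloads.
    let vals = [
        0.0,
        -0.0,
        f64::from_bits(0x7ff8_0000_0000_0001),
        f64::from_bits(0x7ff8_0000_0000_0002),
    ];
    println!("raw groups: {}", count_groups(&vals, false));
    println!("normalized groups: {}", count_groups(&vals, true));
}
```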
What's Changed
- [BLAZE-975] Fix duplicated shuffle data fetch under AQE Rebalance when using Uniffle in Blaze by @merrily01 in #976
- [DOCS] Add Uniffle integration guide to README.md by @merrily01 in #987
- improve OnHeapSpillManager concurrency by @richox in #990
- refactor UDTF: fix sliced binary array + ffi error by @richox in #989
- fix outer generate by @richox in #991
- Fix ScalarValue::List comparison by @richox in #994
- refactor AccGenericColumn for better data type inference and memory usage statistics. by @richox in #997
- fix incorrect join: rewrite join keys to fit spark's LongHashedRelation by @richox in #1002
- default SUGGEST_BATCH_MEM_SIZE to 8MB by @richox in #1005
- refactor SortExec: improve performance and memory statistics by @richox in #1006
- batch_serde improvements by @richox in #1007
- OnHeapSpillManager improvement by @richox in #1008
- disable netty off-heap memory usage in BlazeShuffleManager by @richox in #1009
- update datafusion dep: cherry pick apache/arrow-rs#7422: improve take_bytes performance and reduce oom by @richox in #1000
- [BLAZE-1010] Bump Paimon from 1.0.1 to 1.1.1 by @SteNicholas in #1011
- add TID to logging by @richox in #1014
- optimize array interleaving by @richox in #1019
- add memory limit to process resident usage by @richox in #1018
- fix SpillBuf bug when getting disk size of a closed file by @richox in #1020
- support normalize nan and zero by @Flyangz in #1016
- fix GetIndexedField nullable logic by @richox in #1022
- Script Mode Modification to Reduce CI Time by @lihao712 in #1025
- fix AccScalarValueColumn with null values by @diliulou in #1023
- Disable native decimal binary operation by @harveyyue in #1030
- disable parquet predicate pruning for decimal types by @richox in #1033
- fix incorrect FirstIgnoresNull logic by @richox in #1034
- supports stage retry when shuffle read failed by @richox in #1035
- fix incorrect First & FirstIgnoresNull logic by @Flyangz in #1037
- fix NPE when initializing non-deterministic UDF wrapper in driver side by @richox in #1039
- do not convert unsupported aggregate functions to UDAF wrapper by @richox in #1040
- add spark.blaze.decimal.arithOp.enabled, defaults to false by @richox in #1042
- fix blaze decimal opt not fallback when arithOp set false as default by @xm0830 in #1045
- fix NPE in UDAFWrapper.resize by @richox in #1043
- fix HashJoin rewriteKeyExpr by @richox in #1044
- update rust edition to 2024 by @diliulou in #1051
- fix all warnings by @diliulou in #1052
- format Arc to PhysicalExprRef by @diliulou in #1053
- move all dependency versions to root cargo.toml by @diliulou in #1055
- supports spark.blaze.parseJsonError.fallback by @diliulou in #1050
- [BLAZE-1056] Bump Celeborn version from 0.5.4 to 0.6.0 by @SteNicholas in #1057
- remove unused cargo deps by @diliulou in #1058
- java.lang.NoClassDefFoundError on Spark local mode using --conf spark.jars by @XorSum in #1047
- Exclude log4j-slf4j-impl introduced from rss-client-spark3-shaded by @wForget in #1064
- Use foldLeft instead of map+sum in SparkUDAFWrapperContext by @cxzl25 in #1074
- [BLAZE-1071][FOLLOWUP] Fix incorrect mapStatus to prevent Uniffle failures in Blaze by @merrily01 in #1072
- [MINOR] Fix typo in bloom_filter_might_contain.rs by @merrily01 in #1076
- Blaze Sort->MergeJoin reduces key row and column conversion by @eden123456789 in #1078
- Spark UI SMJ skew by @cxzl25 in #1079
- improve performance of CoalesceStream by @richox in #1081
- scala-compiler/scala-reflect scope provided by @cxzl25 in #1082
- [BLAZE-1085] Make native log level configurable by @wForget in #1086
- Add spotless check and apply in reformat by @cxzl25 in #1083
- Improve rust log format by @cxzl25 in #1084
- Spark UI SHJ skew by @cxzl25 in #1088
- ByteBuddy use contextClassLoader by @XorSum in #1087
- Use foldLeft instead of map+sum by @XorSum in #1091
- Fix modules relative path by @turboFei in #1090
- Avoid traversing Project list by @cxzl25 in #1094
- SortExec: use merge sort for long keys to reduce number of comparison by @richox in #1096
- improve DS scan unsupported message by @cxzl25 in #1093
- CI batch by @cxzl25 in #1102
- [NIT] Fix some typos by @turboFei in #1099
- [BLAZE-1100] Fallback shuffle exchange when RoundRobinPartitioning with unsupported MapType by @merrily01 in #1101
- tokio thread name with tid by @cxzl25 in #1095
- Support to build project with fixed maven version by @turboFei in #1089
- Fix and refine build-native.sh by @turboFei in #1097
- CI protoc token by @cxzl25 in #1103
- Fix NativeShuffledHash not implemented error by @turboFei in #1098
- Skip build native for dev/reformat by @turboFei in #1107
- [BLAZE-1104] Make Parquet maxOverReadSize and metadataCacheSize configurable by @merrily01 in #1105
- Remove unused dev/.scalafmt.conf by @turboFei in #1110
- fix execution error in non-native parquet sink tasks by @richox in #1123
- Move build-native.sh into mvn-build-helper folder by @turboFei in #1106
- Support to enable/disable blaze for different SparkPlan types during runtime by @turboFei in #1109
- Make io compression zstd level configurable by @turboFei in #1111
- Remove log cast key value by @cxzl25 in #1118
- Shade more packages by @turboFei in #1121
- Reuse code for ensure jni bridge inited by @turboFei in #1122
- [BLAZE-1114] Support to build on JDK17 by @turboFei in #1115
- Make shuffle compression target buf size configurable by @turboFei in #1113
- Redirect build-native.sh output to stdout to fix mvn log level by @turboFei in #1116
- use rust-toolchain nightly-2025-05-09 by @cxzl25 in #1126
- [BLAZE-1107][FOLLOWUP] Reformat rust code by @turboFei in #1112
- [BLAZE-1109][FOLLOWUP] Support to dynamic adjust config for native converter by @turboFei in #1130
- Remove unused code and skip scalafix with // scalafix:off by @turboFei in #1125
- [BLAZE-1089][FOLLOWUP] Using project maven version for reformat and release docker by @turboFei in #1134
- [MINOR] Extract and unify plan conversion debug logs via BlazeLogUtils.scala by @merrily01 in #1120
- Fix incorrect Celeborn mapStatus by @cxzl25 in #1133
- [MINOR] Change executing native plan log level by @wForget in #1155
- fix execute_projected_with_key_rows_output() by @richox in #1156
- Support to show os.detected.classifier for blaze jar by @turboFei in #1129
- Run cargo fix when formatting rust code by @turboFei in #1160
- update arrow/datafusion dependencies to v55.2.0/v49.0.0 by @richox in #1154
- reduce NotImplementedError log length by @cxzl25 in #1152
- [BLAZE-1137] Extracting celeborn/uniffle/paimon code to separate modules by @turboFei in #1136
- [BLAZE-1127] Support to build blaze on scala-2.13 by @turboFei in #1128
- [BLAZE-1162] Improve error handling by propagating detailed error messages by @merrily01 in #1163
- Update pull request template for patch testing by @turboFei in #1171
- [BLAZE-1114][FOLLOWUP] Reduce GA number for cross JDK versions testing by @turboFei in #1146
- [BLAZE-1169] Add GA to build blaze on ubuntu arm runners by @turboFei in #1170
- [BLAZE-1165] Enable scalatest to run the blaze UT by @turboFei in #1161
- WIP: rename Blaze to Auron by @richox in #1174
- update tpc-ds benchmark with auron-6.0.0-preview by @richox in #1177
- close stale PRs by @cxzl25 in #1182
- Update auron banner by @richox in #1181
- Update apache license header by @turboFei in #1180
- remove TruncDate function conversion for inconsistent behavior by @richox in #1183
- [FOLLOWUP] Fix workflow errors caused by incomplete project renaming from Blaze to Auron by @merrily01 in #1184
- Remove duplicate celeborn module code by @turboFei in #1187
- [MINOR] Fix typo in auron-build.sh by @merrily01 in #1186
- fix build script bug by @richox in #1185
- [MINOR] Minor improvements for auron-build.sh by @merrily01 in #1190
- Auron project parent to ASF by @turboFei in #1188
- [BLAZE-1149] Fix shuffle file permission issue when using BlazeShuffleManager by @turboFei in #1148
- fix incorrect hash for array types by @richox in #1192
- fix incorrect pushdown filtering configuration by @richox in #1194
- Enable scalafmt rewrite imports with groups order by @turboFei in #1193
- Remove unused profile and activations by @turboFei in #1197
- Reformat the thirdparty code by @turboFei in #1196
- Remove testenv files by @turboFei in #1200
- Introduce apache-rat-plugin to check license by @turboFei in #1198
- Support to build auron with extra maven options by @turboFei in #1209
- Using maven.multiModuleProjectDirectory to fix IDEA build issue by @turboFei in #1205
- bump hadoop-client-api 3.4.1 by @cxzl25 in #1210
- Introduce datafusion-spark crate to support some math functions by @wForget in #1215
- Add license for remaining project files by @turboFei in #1204
- Fix ARM CI by @cxzl25 in #1195
- cache positionedReadable by @cxzl25 in #1211
- Remove tpcds benchmark tool kit by @turboFei in #1199
- Refine the auron build for IDEA developer friendly by @turboFei in #1208
- update README by @diliulou in #1217
- Separate --skiptests option to streamline builds by @merrily01 in #1207
- move auxiliary repos from github.com/blaze-init to github.com/auron-project by @richox in #1218
- Fix typos in README by @turboFei in #1220
- Setup celeborn integration testing GA and fix celeborn-0.5 integration issue by @turboFei in #1221
- Reformat all code including third-party code first to save time by @turboFei in #1223
- Add branches condition for GA by @turboFei in #1224
- [AURON-1226] Use git submodules for setup-rust-toolchain by @turboFei in #1228
- remove issue template header by @cxzl25 in #1230
- Clean up .gitignore by removing project-specific shim module by @merrily01 in #1240
- remove unsupported action by @cxzl25 in #1232
- bump protobuf-java to 3.25.5 by @XorSum in #1234
- typo: change blaze to auron by @XorSum in #1233
- fix ORC timestamp timezone by @cxzl25 in #1229
- Update benchmark document link to auron.apache.org by @merrily01 in #1237
- add .asf.yaml by @richox in #1239
- [MINOR][DOCS] Update links from kwai/auron to apache/auron by @merrily01 in #1236
- auron-build.sh add usage example of -DskipBuildNative by @XorSum in #1235
- change spark archive url by @cxzl25 in #1249
- Add missing frontmatter back to feature_request.md issue template by @merrily01 in #1250
- Fix missed reference: update kwai/auron to apache/auron by @merrily01 in #1251
- [AURON-1139] Setup integration testing with Uniffle by @turboFei in #1222
- remove PR template header by @cxzl25 in #1248
- Apache Auron (incubating) by @turboFei in #1254
- [AURON-1245][FOLLOWUP] Restore commit hash in artifact name using git rev-parse by @merrily01 in #1246
- chore: using dependencyManagement for dependencies by @turboFei in #1244
- Deprecate RuntimeConfig, update code to use new builder style by @XorSum in #1255
- fix some rust style by @cxzl25 in #1257
- build release when change by @cxzl25 in #1247
- Replace hardcode and unify build artifact names by @merrily01 in #1256
- [AURON-1212] Consolidate build entry: integrate release-docker.sh into auron-build.sh (with Docker support) by @merrily01 in #1213
- [AURON-1261][INFRA] Add GitHub PR auto-labeler with module-based rules by @merrily01 in #1262
- [AURON-1258] Bump Paimon from 1.1.1 to 1.2.0 by @merrily01 in #1259
- [AURON-1265] Improve PR template with title/description guidelines by @merrily01 in #1266
- remove maven module duplicate group id by @cxzl25 in #1270
- [typo] fix typo in cast.rs by @XorSum in #1273
- [AURON-1277] Improvement for deprecated API thread_rng/gen_range to rng/random_range by @xuzifu666 in #1278
- remove duplicate isDebugEnabled by @cxzl25 in #1280
- fix paimon package name by @cxzl25 in #1279
- [AURON-1283] Make issue template cleaner by commenting out placeholder text by @merrily01 in #1284
- [AURON-1272] Support HDFS CallerContext by @cxzl25 in #1260
- [AURON-1289] fix ORC delta overflow by @cxzl25 in #1291
- run CI after push by @cxzl25 in #1293
- [AURON-1285] Clean up outdated copy-source comments in the codebase by @merrily01 in #1286
- [AURON-1299] Optimize the comments for the get_json_object UDF by @Tartarus0zm in #1300
- [AURON-1297] Add license and notice for source package release by @FMX in #1298
- [AURON-1281][INFRA] Make workflow flexible with dynamic Spark version by @merrily01 in #1282
- [AURON-1307] Add docs comments to the method in filter_exec.rs and ff… by @Tartarus0zm in #1308
- [AURON-1305] Refine and strengthen Maven bootstrap script by @merrily01 in #1306
- [AURON-1312] Add scalar coverage for spark_sha2 hashing by @hhhizzz in #1313
- Add merge_auron_pr.py by @turboFei in #1276
- [AURON-1302] Add asf release scripts by @FMX in #1311
- [AURON #985] Expect to convert DataWritingCommandExec to NativeParquetSinkExec by @turboFei in #1274
- [AURON-1316] Support trim in cast expression by @hhhizzz in #1317
- fix incorrect common_prefix_len() by @richox in #1320
- Update PR title guideline to use [AURON #XXXX] format by @merrily01 in #1321
- Build: Bump hadoop from 3.4.1 to 3.4.2. by @slfan1989 in #1326
- [AURON #1334] Fix hardcoded bash shebang by @merrily01 in #1335
- Doc: Fix typo in release file by @slfan1989 in #1338
- [AURON #1309][FOLLOWUP] Make workflow flexible with dynamic dependency versions by @merrily01 in #1310
- [RELEASE] Bump version 6.0.0-incubating by @richox in #1325
- Bump sonic-rs from 0.5.0 to 0.5.1 by @dependabot[bot] in #974
- Bump poem from 3.1.9 to 3.1.10 by @dependabot[bot] in #980
- Bump tonic-build from 0.13.0 to 0.13.1 by @dependabot[bot] in #981
- Bump tempfile from 3.19.1 to 3.20.0 by @dependabot[bot] in #984
- Bump tokio from 1.44.2 to 1.45.0 by @dependabot[bot] in #982
- Bump tokio from 1.45.0 to 1.45.1 by @dependabot[bot] in #998
- Bump parking_lot from 0.12.3 to 0.12.4 by @dependabot[bot] in #1004
- Bump poem from 3.1.10 to 3.1.11 by @dependabot[bot] in #1017
- Bump uuid from 1.16.0 to 1.17.0 by @dependabot[bot] in #1012
- Bump prost from 0.13.5 to 0.14.1 by @dependabot[bot] in #1029
- Bump pprof from 0.14.0 to 0.15.0 by @dependabot[bot] in #1003
- Bump sonic-rs from 0.5.1 to 0.5.2 by @dependabot[bot] in #1041
- Bump lz4_flex from 0.11.3 to 0.11.5 by @dependabot[bot] in #1031
- Bump tokio from 1.45.1 to 1.46.1 by @dependabot[bot] in #1054
- Bump jemalloc_pprof from 0.7.0 to 0.8.0 by @dependabot[bot] in #1059
- Bump tokio from 1.46.1 to 1.47.0 by @dependabot[bot] in #1073
- Bump rand from 0.9.1 to 0.9.2 by @dependabot[bot] in #1060
- Bump poem from 3.1.11 to 3.1.12 by @dependabot[bot] in #1075
- Bump sonic-rs from 0.5.2 to 0.5.3 by @dependabot[bot] in #1061
- Bump tokio from 1.47.0 to 1.47.1 by @dependabot[bot] in #1092
- Bump spark-3.5 version to 3.5.6 by @turboFei in #1131
- Bump async-trait from 0.1.88 to 0.1.89 by @dependabot[bot] in #1172
- Bump tempfile from 3.20.0 to 3.21.0 by @dependabot[bot] in #1191
- Bump sonic-rs from 0.5.3 to 0.5.4 by @dependabot[bot] in #1206
- Bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in #1271
- Bump chrono from 0.4.41 to 0.4.42 by @dependabot[bot] in #1288
- Bump celeborn 0.6.1 by @turboFei in #1294
- Bump tempfile from 3.21.0 to 3.22.0 by @dependabot[bot] in #1290
- Bump serde from 1.0.219 to 1.0.223 by @dependabot[bot] in #1296
- Bump bytesize from 2.0.1 to 2.1.0 by @dependabot[bot] in #1304
- Bump serde from 1.0.223 to 1.0.225 by @dependabot[bot] in #1303
- Bump serde from 1.0.225 to 1.0.226 by @dependabot[bot] in #1322
- Bump object_store from 0.12.3 to 0.12.4 by @dependabot[bot] in #1336
- Bump tempfile from 3.22.0 to 3.23.0 by @dependabot[bot] in #1337
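Two of the fixes above (#1034, #1037) correct the First and FirstIgnoresNull aggregates. In Spark, first(x) returns the first row's value even when it is null, while first(x, ignoreNulls) returns the first non-null value. Because a null is a legitimate result for first(x), the aggregate needs a separate "value set" flag, and merging partial results must consult that flag rather than the value itself. The Rust sketch below models that state after Spark's First aggregate. It is an illustration only, not Auron's native implementation, and all names in it are hypothetical.

```rust
/// Partial aggregation state: the chosen value plus a flag recording whether
/// a value (possibly null) has already been chosen.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FirstState {
    value: Option<i64>,
    seen: bool,
}

impl FirstState {
    const EMPTY: FirstState = FirstState { value: None, seen: false };
}

/// Update with one input row. With `ignore_nulls`, null rows never set the state.
fn update(state: FirstState, v: Option<i64>, ignore_nulls: bool) -> FirstState {
    if state.seen || (ignore_nulls && v.is_none()) {
        state
    } else {
        FirstState { value: v, seen: true }
    }
}

/// Merge two partial states: keep the left one only if it actually chose a value.
fn merge(left: FirstState, right: FirstState) -> FirstState {
    if left.seen { left } else { right }
}

/// Aggregate one partition of rows.
fn aggregate(values: &[Option<i64>], ignore_nulls: bool) -> FirstState {
    values.iter().fold(FirstState::EMPTY, |s, &v| update(s, v, ignore_nulls))
}

fn main() {
    let p1: [Option<i64>; 2] = [None, Some(2)];
    let p2: [Option<i64>; 1] = [Some(3)];
    let first = merge(aggregate(&p1, false), aggregate(&p2, false));
    let first_ignores_null = merge(aggregate(&p1, true), aggregate(&p2, true));
    println!("first = {:?}, first ignoring nulls = {:?}", first.value, first_ignores_null.value);
}
```

If `merge` tested `left.value.is_some()` instead of the flag, first(x) would wrongly skip a leading null. FirstIgnoresNull, by contrast, must fall through to the right partition when the left saw only nulls.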
New Contributors
- @diliulou made their first contribution in #1023
- @eden123456789 made their first contribution in #1078
- @turboFei made their first contribution in #1090
- @xuzifu666 made their first contribution in #1278
- @FMX made their first contribution in #1298
- @hhhizzz made their first contribution in #1313
Full Changelog: v5.0.0...v6.0.0