Releases: apache/incubator-gluten
v1.6.0-rc0
What's Changed
- [TEST] Disable a gluten test temporarily: cast string to timestamp by @PHILO-HE in #10518
- [CORE] Bump version to 1.6.0-SNAPSHOT by @PHILO-HE in #10517
- [MINOR] Refactor a string concatenation by following scala style by @beliefer in #10520
- [VL][INFRA] Fix docker build error on Centos-7 by @PHILO-HE in #10522
- [GLUTEN-8953][VL] Support Iceberg overwrite table by @Zouxxyy in #10514
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_08_26) by @GlutenPerfBot in #10528
- [GLUTEN-10521][VL] Fall back `to_json` function for uppercase struct field name by @zml1206 in #10523
- [VL] Gluten-it: Simplify CollectionConverter.scala by @zhztheplayer in #10533
- [VL] Fix missing path `package/**` from the Velox backend PR CI path trigger by @zhztheplayer in #10538
- [GLUTEN-10529] Remove unnecessary create for Runtimes by @beliefer in #10530
- [TEST][VL] Reinclude "cast string to timestamp" test by @PHILO-HE in #10532
- [VL] Extend gluten-it to support more data source types by @zhztheplayer in #10554
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_08_27) by @GlutenPerfBot in #10549
- [GLUTEN-10555] Remove unnecessary parameter leafTransformers for WholeStageTransformer by @beliefer in #10556
- [VL] Gluten-it: Clean up Maven dependency relationships by @zhztheplayer in #10563
- [GLUTEN-10552][VL] Fix openEuler compiling issue by @zhouyuan in #10564
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_08_28) by @GlutenPerfBot in #10571
- [FLINK] Add `java-17` profile for Flink build and update project version in flink doc by @zjuwangg in #10561
- [GLUTEN-9671][VL] Fix broadcast exchange stackoverflow due to Kryo serialization by @felixloesing in #10541
- [VL] Separate filesystem configuration initialization by @marin-ma in #10540
- [MINOR] Remove unnecessary fields by @beliefer in #10560
- [VL] Support independent Gluten CPP build by @kerwin-zk in #10575
- [GLUTEN-10107][CH] Decouple Celeborn-related code from CH backend module by @zjuwangg in #10537
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_08_29) by @GlutenPerfBot in #10580
- [GLUTEN-10578] Remove unnecessary numaBindingInfo by @beliefer in #10579
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_08_30) by @GlutenPerfBot in #10588
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_08_31) by @GlutenPerfBot in #10589
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_09_01) by @GlutenPerfBot in #10592
- [GLUEN-10107][INFRA]Deprecate isUseUniffleShuffleManager from glutenConfig by @zjuwangg in #10558
- [VL] Gluten-it: Support using Delta tables in TPC-H and TPC-DS benchmarks by @zhztheplayer in #10562
- [GLUTEN-10582][VL] Add Cudf memory resource mode and percent parameters by @jinchengchenghh in #10583
- [GLUTEN-8852][VL] Update package script for spark-400 by @zhouyuan in #10584
- [GLUTEN-8821][VL] Weekly Update Velox function support docs (2025_09_01) by @GlutenPerfBot in #10590
- [GLUTEN-10387][VL] Set ANSI mode for Velox according to Spark's configuration by @PHILO-HE in #10385
- [VL] Gluten-it: Update Delta versions, and other minors by @zhztheplayer in #10594
- [VL] Update Velox branch by @rui-mo in #10597
- [GLUTEN-10524] Remove unnecessary `outputAttributes` from `BasicScanExecTransformer` by @beliefer in #10525
- [GLUTEN-10595][VL] Separate cpp test utils from the utils directory by @marin-ma in #10596
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_09_02) by @GlutenPerfBot in #10601
- [DOC][FLINK] Update flink build command to skip gpg and spotless check by @zjuwangg in #10604
- [Minor] Refactor test utility to let users compare the query result by @jinchengchenghh in #10565
- [VL][Minor] Remove unused code for shuffle compression mode by @marin-ma in #10609
- [GLUTEN-10599][VL] Fix Centos dev docker image build by @zhouyuan in #10600
- [GLUTEN-10599][VL] Followup to enable git in CI scripts by @zhouyuan in #10610
- [Minor] Add enhanced features runtime config by @jinchengchenghh in #10608
- [VL] Update class duplication list in Maven enforcer by @zhztheplayer in #10536
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_09_03) by @GlutenPerfBot in #10612
- [GLUTEN-9335][VL] Support iceberg partition write by @jinchengchenghh in #10497
- [GLUTEN-10607][MINOR] Fix: Use `setSafe` in DateWriter to avoid overflow by @jiangjiangtian in #10581
- [GLUTEN-10577][CELEBORN] Refactor `CelebornShuffleManager` to load factory in a better way by @zjuwangg in #10591
- [CORE] Merge SubstraitUtil classes by @kevinwilfong in #10587
- [GLUTEN-10546][FLINK] Support all flink operators for nexmark by @shuai-xu in #10548
- [GLUTEN-10566][VL] Add Spark unix_timestamp support with timestamp and format arguments by @nimesh1601 in #10567
- [Minor] Fix the velox target duplicated include VELOX_BUILD_PATH by @jinchengchenghh in #10615
- [GLUTEN-10570][FLINK] Add `--add-opens` options to MAVEN_OPTS for Java 17 compatibility by @KevinyhZou in #10572
- [GLUTEN-10605][VL] Rewrite unbounded window to an equivalent aggregate join by @zml1206 in #10606
- [GLUTEN-10013][FLINK] Support function reinterpret by @KevinyhZou in #10022
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_09_04) by @GlutenPerfBot in #10626
- [VL] Refactor gluten-it to pass structured query information to runner by @zhztheplayer in #10623
- [GLUTEN-10210][VL] Enable tpcds tests for Spark-400 in CI by @zhouyuan in #10633
- [VL] Fix arrow url typo by @liujiayi771 in #10641
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_09_06) by @GlutenPerfBot in #10643
- [GLUTEN-10214][VL] Merge inputstream for shuffle reader by @marin-ma in #10499
- [MINOR] Add `.java-version` to `.gitignore` by @Zouxxyy in #10642
- [GLUTEN-10618][VL] Update input iterator metrics name to include more details by @marin-ma in #10619
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_09_08) by @GlutenPerfBot in #10653
- [GLUTEN-10635][VL] bugfix: file INSTALL cannot set permissions by @beliefer in #10638
- [GLUTEN-10361][FLINK] Fix UT failure between the conversion of `BinaryRowData` and `StatefulRecord` by @KevinyhZou in #10362
- [GLUTEN-8889][VL] Fix Spark-355 download in GHA by @zhouyuan in #10655
- [GLUTEN-10450][VL] Reclassify internal/public configs and remove internal configs from doc by @zjuwangg in #10603
- [GLUTEN-9366][VL] Support Iceberg functions by @jinchengchenghh in #10285
- [GLUTEN-10544] Remove unnecessary method separateScanRDD by @beliefer in #10545
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_09_09) by @GlutenPerfBot in #10656
...
v1.5.0
What's Changed
- [GLUTEN-8846][CH] [Part 3] Add benchmark for Icerberg Delete by @baibaichen in #9192
- [GLUTEN-9020][CH] Support delta DV BitmapAggregator by @loneylee in #9138
- [GLUTEN-9197][CH] Simplify sum aggregate expression by @taiyang-li in #9198
- [VL] Enable more ut in VeloxTestSettings by @WangGuangxin in #9080
- [GLUTEN-9199][VL] Fix error when creating shuffle file: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments by @zhztheplayer in #9200
- [CORE] Fix duplicate setting for config LEGACY_TIME_PARSER_POLICY by @jinchengchenghh in #9201
- [GLUTEN-9176][CH] Rewrite aggregate if to aggregate with filter clause by @taiyang-li in #9185
- [GLUTEN-8557][CH] Flatten nested `And/Or` for performance optimization by @KevinyhZou in #8558
- Revert "[GLUTEN-9164][CH]Enable row group level bloom filter push down" by @taiyang-li in #9214
- [GLUTEN-9182][VL] Support new s3 configuration in Gluten by @dcoliversun in #9183
- [VL] Celeborn shuffle reader OOM with many empty input stream by @marin-ma in #9221
- [GLUTEN-8821][VL] Update aggregate/generator/window support doc and script by @marin-ma in #8971
- [VL] Change to use Velox's wget_and_untar in setup-centos7.sh by @yaooqinn in #9207
- [GLUTEN-9196][CH] Use wide-table aggregation to eliminate multi-table joins by @lgbo-ustc in #9155
- [GLUTEN-9149][CORE] Remove Spark-specific code from JniLibLoader & JniWorkspace by @shuai-xu in #9150
- [VL][CI] Change to use JDK-17 for Spark 3.3/3.4/3.5 tests by @PHILO-HE in #9209
- [CORE][VL] Hide child nodes from implementations of `OffloadSingleNode` by @zhztheplayer in #9220
- [GLUTEN-9008][VL] Support json_object_keys function by @dcoliversun in #9009
- [GLUTEN-9239][CH] Support JDK17 for the CH backend by @zzcclp in #9242
- [GLUTEN-9152][CORE] Avoid unnecessary serialization of hadoop conf by @zml1206 in #9153
- [GLUTEN-9240][VL] Write NULL value into relation in gluten unit tests by @dcoliversun in #9241
- [VL][CI] bump to use ubuntu-22.04 runner by @zhouyuan in #9262
- [GLUTEN-9177][CH]Fix diff on parse host of url and refactor `SparkParseURL` by @KevinyhZou in #9179
- [CORE] Decrease offheap memory size in resource profile for whole stage fallback case by @PHILO-HE in #8911
- [GLUTEN-9205][CH] Support deletion vector native write by @loneylee in #9248
- [VL] Delete global reference to a class object in JNI unload by @PHILO-HE in #9268
- [GLUTEN-9245][VL] Fix partial project expression contains subquery by @jinchengchenghh in #9259
- [GLUTEN-9244][CORE] Change the way of passing default timezone to native config by @zml1206 in #9249
- [GLUTEN-8497][VL] Fix columnar batch type mismatch in table cache by @zhztheplayer in #9230
- [VL] Support Spark legacy statistical aggregation function behavior by @NEUpanning in #9181
- [CORE] Remove library unloading API from JniLibLoader as unused by @zhztheplayer in #9277
- [GLUTEN-9237][CH] Fix the nullability missmatch issue for the Nothing type by @lgbo-ustc in #9238
- [VL] Disable FlushableHashAggreagte when aggregates contains sum/avg for floating type by @kecookier in #8986
- [CORE] Refine the test with specified spark version by @yikf in #9274
- [CH] Add a comment to explain why the endpoint uses a single thread by @dcoliversun in #9257
- [GLUTEN-8891][VL] Refine local ssd cache feature by @zhouyuan in #9228
- [GLUTEN-9267][CH] Fix a bug in `EliminateDeduplicateAggregateWithAnyJoin` by @lgbo-ustc in #9293
- [VL] Remove param original of ColumnarPartialProjectExec by @zml1206 in #9290
- [GLUTEN-9178][CH] Fix cse in aggregate operator not working by @loneylee in #9301
- [CORE] Post events until both spark ui and gluten ui are enable by @yikf in #9272
- [CORE] Correctly handle driver configurations when `spark.sql.extensions` is explicitly set for GlutenSessionExtensions by @zhztheplayer in #9312
- [GLUTEN-8851][VL] feat: Support cudf by @jinchengchenghh in #9229
- [GLUTEN-9288][VL] Enable array_prepend function for spark 3.5+ by @dcoliversun in #9305
- [GLUTEN-9317][CH]Fix: duplicated column names in shuffle read by @lgbo-ustc in #9318
- [Gluten-9254][CH] Support RDDScanExec by @loneylee in #9270
- [VL] Count total JVM memory as the on-heap portion for the off-heap sizing feature by @zhztheplayer in #9321
- [GLUTEN-9300][DOC] Support replacement expression in gen-function-support-docs by @dcoliversun in #9331
- [GLUTEN-9239][CH] [PART-1] Support Java-17 Remove `JNI_OnUnload` by @baibaichen in #9275
- [GLUTEN-7652][VL] Support binary as string by @wForget in #9325
- [Gluten-9334][CH] Support delta metadata column `file_path` and `row_index` for mergetree by @loneylee in #9340
- [GLUTEN-6867][CH] Fix Bug that can't read file on minio by @baibaichen in #9332
- [VL] Provide a configuration option to completely turn off off-heap memory tracking with Spark memory manager by @zhztheplayer in #9341
- [GLUTEN-9313][VL] ColumnarPartialProject supports built-in but blacklisted function by @WangGuangxin in #9315
- [GLUTEN-8772][CORE] refactor: Refactoring the usage of SubstraitContext#functionMap by @wypb in #8775
- [VL] Move pre-configuration code of dynamic off-heap sizing to its own place by @zhztheplayer in #9336
- [GLUTEN-9163][VL] Use stream de/compressor in sort-based shuffle by @marin-ma in #9278
- [GLUTEN-9287][VL] Enable array_compact function for Spark 3.4+ by @dcoliversun in #9349
- [GLUTEN-9095][UT] Remove Vanilla Spark InternalRow based checkEvaluation by @ArnavBalyan in #9096
- [CORE] Make max broadcast table size configurable by @yaooqinn in #9359
- [CH] Fix build error by @exmy in #9363
- [GLUTEN-9243][VL] Fix cuda docker image by @zhouyuan in #9333
- [GLUTEN-8912][VL] Add Offset support for CollectLimitExec by @ArnavBalyan in #8914
- [GLUTEN-7589][VL] Support date_trunc function by @zml1206 in #7611
- [GLUTEN-9279] Not pulling out expression from PartialMerge aggregate function to avoid invalid reference binding in ProjectExecTransformer by @Z1Wu in #9280
- [Gluten-8792][CH] Support delta project incrementMetric expr by @loneylee in #9353
- [GLUTEN-9034][VL] Add VeloxResizeBatchesExec for Shuffle by @WangGuangxin in #9035
- Fix ColumnarToRowRemovalGuard not able to be copied by @yaooqinn in #9384
- [GLUTEN-8846][CH] [Part 4] Add full-chain UT by @jlfsdtc in #9256
- [VL] Follow up on #9384 to avoid swallowing exceptions in UT by @zhztheplayer in #9393
- [GLUTEN-9163][VL] Separate compression buffer and disk write buffer configuration by @marin-ma in #9356
- [VL][INFRA] Improve build bundle package workflow by @wForget in #9404
- [VL] Refactor WholeStageTransformer to remove some duplicate code by @wyp...
v1.5.0-rc1
What's Changed
- [GLUTEN-8846][CH] [Part 3] Add benchmark for Icerberg Delete by @baibaichen in #9192
- [GLUTEN-9020][CH] Support delta DV BitmapAggregator by @loneylee in #9138
- [GLUTEN-9197][CH] Simplify sum aggregate expression by @taiyang-li in #9198
- [VL] Enable more ut in VeloxTestSettings by @WangGuangxin in #9080
- [GLUTEN-9199][VL] Fix error when creating shuffle file: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments by @zhztheplayer in #9200
- [CORE] Fix duplicate setting for config LEGACY_TIME_PARSER_POLICY by @jinchengchenghh in #9201
- [GLUTEN-9176][CH] Rewrite aggregate if to aggregate with filter clause by @taiyang-li in #9185
- [GLUTEN-8557][CH] Flatten nested `And/Or` for performance optimization by @KevinyhZou in #8558
- Revert "[GLUTEN-9164][CH]Enable row group level bloom filter push down" by @taiyang-li in #9214
- [GLUTEN-9182][VL] Support new s3 configuration in Gluten by @dcoliversun in #9183
- [VL] Celeborn shuffle reader OOM with many empty input stream by @marin-ma in #9221
- [GLUTEN-8821][VL] Update aggregate/generator/window support doc and script by @marin-ma in #8971
- [VL] Change to use Velox's wget_and_untar in setup-centos7.sh by @yaooqinn in #9207
- [GLUTEN-9196][CH] Use wide-table aggregation to eliminate multi-table joins by @lgbo-ustc in #9155
- [GLUTEN-9149][CORE] Remove Spark-specific code from JniLibLoader & JniWorkspace by @shuai-xu in #9150
- [VL][CI] Change to use JDK-17 for Spark 3.3/3.4/3.5 tests by @PHILO-HE in #9209
- [CORE][VL] Hide child nodes from implementations of `OffloadSingleNode` by @zhztheplayer in #9220
- [GLUTEN-9008][VL] Support json_object_keys function by @dcoliversun in #9009
- [GLUTEN-9239][CH] Support JDK17 for the CH backend by @zzcclp in #9242
- [GLUTEN-9152][CORE] Avoid unnecessary serialization of hadoop conf by @zml1206 in #9153
- [GLUTEN-9240][VL] Write NULL value into relation in gluten unit tests by @dcoliversun in #9241
- [VL][CI] bump to use ubuntu-22.04 runner by @zhouyuan in #9262
- [GLUTEN-9177][CH]Fix diff on parse host of url and refactor `SparkParseURL` by @KevinyhZou in #9179
- [CORE] Decrease offheap memory size in resource profile for whole stage fallback case by @PHILO-HE in #8911
- [GLUTEN-9205][CH] Support deletion vector native write by @loneylee in #9248
- [VL] Delete global reference to a class object in JNI unload by @PHILO-HE in #9268
- [GLUTEN-9245][VL] Fix partial project expression contains subquery by @jinchengchenghh in #9259
- [GLUTEN-9244][CORE] Change the way of passing default timezone to native config by @zml1206 in #9249
- [GLUTEN-8497][VL] Fix columnar batch type mismatch in table cache by @zhztheplayer in #9230
- [VL] Support Spark legacy statistical aggregation function behavior by @NEUpanning in #9181
- [CORE] Remove library unloading API from JniLibLoader as unused by @zhztheplayer in #9277
- [GLUTEN-9237][CH] Fix the nullability missmatch issue for the Nothing type by @lgbo-ustc in #9238
- [VL] Disable FlushableHashAggreagte when aggregates contains sum/avg for floating type by @kecookier in #8986
- [CORE] Refine the test with specified spark version by @yikf in #9274
- [CH] Add a comment to explain why the endpoint uses a single thread by @dcoliversun in #9257
- [GLUTEN-8891][VL] Refine local ssd cache feature by @zhouyuan in #9228
- [GLUTEN-9267][CH] Fix a bug in `EliminateDeduplicateAggregateWithAnyJoin` by @lgbo-ustc in #9293
- [VL] Remove param original of ColumnarPartialProjectExec by @zml1206 in #9290
- [GLUTEN-9178][CH] Fix cse in aggregate operator not working by @loneylee in #9301
- [CORE] Post events until both spark ui and gluten ui are enable by @yikf in #9272
- [CORE] Correctly handle driver configurations when `spark.sql.extensions` is explicitly set for GlutenSessionExtensions by @zhztheplayer in #9312
- [GLUTEN-8851][VL] feat: Support cudf by @jinchengchenghh in #9229
- [GLUTEN-9288][VL] Enable array_prepend function for spark 3.5+ by @dcoliversun in #9305
- [GLUTEN-9317][CH]Fix: duplicated column names in shuffle read by @lgbo-ustc in #9318
- [Gluten-9254][CH] Support RDDScanExec by @loneylee in #9270
- [VL] Count total JVM memory as the on-heap portion for the off-heap sizing feature by @zhztheplayer in #9321
- [GLUTEN-9300][DOC] Support replacement expression in gen-function-support-docs by @dcoliversun in #9331
- [GLUTEN-9239][CH] [PART-1] Support Java-17 Remove `JNI_OnUnload` by @baibaichen in #9275
- [GLUTEN-7652][VL] Support binary as string by @wForget in #9325
- [Gluten-9334][CH] Support delta metadata column `file_path` and `row_index` for mergetree by @loneylee in #9340
- [GLUTEN-6867][CH] Fix Bug that can't read file on minio by @baibaichen in #9332
- [VL] Provide a configuration option to completely turn off off-heap memory tracking with Spark memory manager by @zhztheplayer in #9341
- [GLUTEN-9313][VL] ColumnarPartialProject supports built-in but blacklisted function by @WangGuangxin in #9315
- [GLUTEN-8772][CORE] refactor: Refactoring the usage of SubstraitContext#functionMap by @wypb in #8775
- [VL] Move pre-configuration code of dynamic off-heap sizing to its own place by @zhztheplayer in #9336
- [GLUTEN-9163][VL] Use stream de/compressor in sort-based shuffle by @marin-ma in #9278
- [GLUTEN-9287][VL] Enable array_compact function for Spark 3.4+ by @dcoliversun in #9349
- [GLUTEN-9095][UT] Remove Vanilla Spark InternalRow based checkEvaluation by @ArnavBalyan in #9096
- [CORE] Make max broadcast table size configurable by @yaooqinn in #9359
- [CH] Fix build error by @exmy in #9363
- [GLUTEN-9243][VL] Fix cuda docker image by @zhouyuan in #9333
- [GLUTEN-8912][VL] Add Offset support for CollectLimitExec by @ArnavBalyan in #8914
- [GLUTEN-7589][VL] Support date_trunc function by @zml1206 in #7611
- [GLUTEN-9279] Not pulling out expression from PartialMerge aggregate function to avoid invalid reference binding in ProjectExecTransformer by @Z1Wu in #9280
- [Gluten-8792][CH] Support delta project incrementMetric expr by @loneylee in #9353
- [GLUTEN-9034][VL] Add VeloxResizeBatchesExec for Shuffle by @WangGuangxin in #9035
- Fix ColumnarToRowRemovalGuard not able to be copied by @yaooqinn in #9384
- [GLUTEN-8846][CH] [Part 4] Add full-chain UT by @jlfsdtc in #9256
- [VL] Follow up on #9384 to avoid swallowing exceptions in UT by @zhztheplayer in #9393
- [GLUTEN-9163][VL] Separate compression buffer and disk write buffer configuration by @marin-ma in #9356
- [VL][INFRA] Improve build bundle package workflow by @wForget in #9404
- [VL] Refactor WholeStageTransformer to remove some duplicate code by @wyp...
v1.5.0-rc0
What's Changed
- [GLUTEN-8846][CH] [Part 3] Add benchmark for Icerberg Delete by @baibaichen in #9192
- [GLUTEN-9020][CH] Support delta DV BitmapAggregator by @loneylee in #9138
- [GLUTEN-9197][CH] Simplify sum aggregate expression by @taiyang-li in #9198
- [VL] Enable more ut in VeloxTestSettings by @WangGuangxin in #9080
- [GLUTEN-9199][VL] Fix error when creating shuffle file: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments by @zhztheplayer in #9200
- [CORE] Fix duplicate setting for config LEGACY_TIME_PARSER_POLICY by @jinchengchenghh in #9201
- [GLUTEN-9176][CH] Rewrite aggregate if to aggregate with filter clause by @taiyang-li in #9185
- [GLUTEN-8557][CH] Flatten nested `And/Or` for performance optimization by @KevinyhZou in #8558
- Revert "[GLUTEN-9164][CH]Enable row group level bloom filter push down" by @taiyang-li in #9214
- [GLUTEN-9182][VL] Support new s3 configuration in Gluten by @dcoliversun in #9183
- [VL] Celeborn shuffle reader OOM with many empty input stream by @marin-ma in #9221
- [GLUTEN-8821][VL] Update aggregate/generator/window support doc and script by @marin-ma in #8971
- [VL] Change to use Velox's wget_and_untar in setup-centos7.sh by @yaooqinn in #9207
- [GLUTEN-9196][CH] Use wide-table aggregation to eliminate multi-table joins by @lgbo-ustc in #9155
- [GLUTEN-9149][CORE] Remove Spark-specific code from JniLibLoader & JniWorkspace by @shuai-xu in #9150
- [VL][CI] Change to use JDK-17 for Spark 3.3/3.4/3.5 tests by @PHILO-HE in #9209
- [CORE][VL] Hide child nodes from implementations of `OffloadSingleNode` by @zhztheplayer in #9220
- [GLUTEN-9008][VL] Support json_object_keys function by @dcoliversun in #9009
- [GLUTEN-9239][CH] Support JDK17 for the CH backend by @zzcclp in #9242
- [GLUTEN-9152][CORE] Avoid unnecessary serialization of hadoop conf by @zml1206 in #9153
- [GLUTEN-9240][VL] Write NULL value into relation in gluten unit tests by @dcoliversun in #9241
- [VL][CI] bump to use ubuntu-22.04 runner by @zhouyuan in #9262
- [GLUTEN-9177][CH]Fix diff on parse host of url and refactor `SparkParseURL` by @KevinyhZou in #9179
- [CORE] Decrease offheap memory size in resource profile for whole stage fallback case by @PHILO-HE in #8911
- [GLUTEN-9205][CH] Support deletion vector native write by @loneylee in #9248
- [VL] Delete global reference to a class object in JNI unload by @PHILO-HE in #9268
- [GLUTEN-9245][VL] Fix partial project expression contains subquery by @jinchengchenghh in #9259
- [GLUTEN-9244][CORE] Change the way of passing default timezone to native config by @zml1206 in #9249
- [GLUTEN-8497][VL] Fix columnar batch type mismatch in table cache by @zhztheplayer in #9230
- [VL] Support Spark legacy statistical aggregation function behavior by @NEUpanning in #9181
- [CORE] Remove library unloading API from JniLibLoader as unused by @zhztheplayer in #9277
- [GLUTEN-9237][CH] Fix the nullability missmatch issue for the Nothing type by @lgbo-ustc in #9238
- [VL] Disable FlushableHashAggreagte when aggregates contains sum/avg for floating type by @kecookier in #8986
- [CORE] Refine the test with specified spark version by @yikf in #9274
- [CH] Add a comment to explain why the endpoint uses a single thread by @dcoliversun in #9257
- [GLUTEN-8891][VL] Refine local ssd cache feature by @zhouyuan in #9228
- [GLUTEN-9267][CH] Fix a bug in `EliminateDeduplicateAggregateWithAnyJoin` by @lgbo-ustc in #9293
- [VL] Remove param original of ColumnarPartialProjectExec by @zml1206 in #9290
- [GLUTEN-9178][CH] Fix cse in aggregate operator not working by @loneylee in #9301
- [CORE] Post events until both spark ui and gluten ui are enable by @yikf in #9272
- [CORE] Correctly handle driver configurations when `spark.sql.extensions` is explicitly set for GlutenSessionExtensions by @zhztheplayer in #9312
- [GLUTEN-8851][VL] feat: Support cudf by @jinchengchenghh in #9229
- [GLUTEN-9288][VL] Enable array_prepend function for spark 3.5+ by @dcoliversun in #9305
- [GLUTEN-9317][CH]Fix: duplicated column names in shuffle read by @lgbo-ustc in #9318
- [Gluten-9254][CH] Support RDDScanExec by @loneylee in #9270
- [VL] Count total JVM memory as the on-heap portion for the off-heap sizing feature by @zhztheplayer in #9321
- [GLUTEN-9300][DOC] Support replacement expression in gen-function-support-docs by @dcoliversun in #9331
- [GLUTEN-9239][CH] [PART-1] Support Java-17 Remove `JNI_OnUnload` by @baibaichen in #9275
- [GLUTEN-7652][VL] Support binary as string by @wForget in #9325
- [Gluten-9334][CH] Support delta metadata column `file_path` and `row_index` for mergetree by @loneylee in #9340
- [GLUTEN-6867][CH] Fix Bug that can't read file on minio by @baibaichen in #9332
- [VL] Provide a configuration option to completely turn off off-heap memory tracking with Spark memory manager by @zhztheplayer in #9341
- [GLUTEN-9313][VL] ColumnarPartialProject supports built-in but blacklisted function by @WangGuangxin in #9315
- [GLUTEN-8772][CORE] refactor: Refactoring the usage of SubstraitContext#functionMap by @wypb in #8775
- [VL] Move pre-configuration code of dynamic off-heap sizing to its own place by @zhztheplayer in #9336
- [GLUTEN-9163][VL] Use stream de/compressor in sort-based shuffle by @marin-ma in #9278
- [GLUTEN-9287][VL] Enable array_compact function for Spark 3.4+ by @dcoliversun in #9349
- [GLUTEN-9095][UT] Remove Vanilla Spark InternalRow based checkEvaluation by @ArnavBalyan in #9096
- [CORE] Make max broadcast table size configurable by @yaooqinn in #9359
- [CH] Fix build error by @exmy in #9363
- [GLUTEN-9243][VL] Fix cuda docker image by @zhouyuan in #9333
- [GLUTEN-8912][VL] Add Offset support for CollectLimitExec by @ArnavBalyan in #8914
- [GLUTEN-7589][VL] Support date_trunc function by @zml1206 in #7611
- [GLUTEN-9279] Not pulling out expression from PartialMerge aggregate function to avoid invalid reference binding in ProjectExecTransformer by @Z1Wu in #9280
- [Gluten-8792][CH] Support delta project incrementMetric expr by @loneylee in #9353
- [GLUTEN-9034][VL] Add VeloxResizeBatchesExec for Shuffle by @WangGuangxin in #9035
- Fix ColumnarToRowRemovalGuard not able to be copied by @yaooqinn in #9384
- [GLUTEN-8846][CH] [Part 4] Add full-chain UT by @jlfsdtc in #9256
- [VL] Follow up on #9384 to avoid swallowing exceptions in UT by @zhztheplayer in #9393
- [GLUTEN-9163][VL] Separate compression buffer and disk write buffer configuration by @marin-ma in #9356
- [VL][INFRA] Improve build bundle package workflow by @wForget in #9404
- [VL] Refactor WholeStageTransformer to remove some duplicate code by @wyp...
v1.4.0
Release Notes - Gluten version 1.4.0
Highlights
- Spark 3.2.2/3.3.1/3.4.4(upgraded)/3.5.2
- Add support for more Spark functions, including date_format, make_date, map_filter, map_concat, from_json, btrim, array_append, and more
- Add support for more Spark operators, including Range, CollectLimit, and more
- Update OAP's Velox codebase to 2025/05/12
- Join optimizations: BNLJ full outer join
- Shuffle optimizations: RSS ShuffleReader optimization and bug fixes
- RSS: Celeborn 0.5.4(upgraded)/Uniffle 0.9.2(upgraded)
- Query Plan: RAS cost model optimizations and refactor
- Datalake: Add Iceberg/Hudi in test
- CI: Docker image and JDK version update
- Support dynamic adjustment of the Stage Resource Profile
- Support Query Trace
- Add Qualification Tool
- Fix OOM issues for some untracked memory
What's Changed
- [GLUTEN-8327][CORE][Part-3] Introduce the `ConfigEntry` to gluten config by @yikf in #8431
- [VL] Fix wrong warning of "Memory overhead is set to ..." under default Spark config settings by @zhztheplayer in #8448
- [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by @yikf in #8386
- Revert "[CH] Disable gluten arm ci" by @lwz9103 in #8460
- [GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer by @ArnavBalyan in #8454
- [CH] Add tools to dump ActionsDAG into tree graph by @taiyang-li in #8461
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_08) by @GlutenPerfBot in #8457
- [VL] Update document of build gluten in Docker by @FelixYBW in #8459
- [GLUTEN-8462][CORE] Raise a meaningful error when no component is found from classpath by @zhztheplayer in #8468
- [GLUTEN-8453][VL] Follow-up to #8454 to add a `ensureVeloxBatch` API for limited use cases by @zhztheplayer in #8463
- [VL] Refactor Velox.md by @FelixYBW in #8478
- [GLUTEN-8465] [VL] Bump Celeborn to 0.5.3 by @SteNicholas in #8467
- [GLUTEN-8455][VL] Fallback Scan for Encrypted Parquet Files by @ArnavBalyan in #8456
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_09) by @GlutenPerfBot in #8472
- [CORE] Refactor columnar noop write rule by @jackylee-ch in #8422
- [GLUTEN-8462][CH] Fixed the loading of Components and Backend by @gleonSun in #8464
- [GLUTEN-8414][VL] Override doCanonicalize in ColumnarPartialProjectEx… by @lifulong in #8415
- [GLUTEN-8397][CH][Part-2] Fix statica_cast failed on macos by @yxheartipp in #8485
- [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by @KevinyhZou in #8351
- [GLUTEN-8481][VL] Clean up shuffle reader cpp code by @marin-ma in #8482
- [Core] Bump version to 1.4.0-SNAPSHOT by @weiting-chen in #8452
- [GLUTEN-8483][CORE] A stable and universal way to find component files by @zhztheplayer in #8486
- [DOC][VL] Fix typo in microbenchmark.md by @marin-ma in #8495
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250110) by @kyligence-git in #8490
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_10) by @GlutenPerfBot in #8489
- [GLUTEN-8476][VL] Fix allocate and free memory by @jkhaliqi in #8477
- [GLUTEN-8503][VL] Fix macro parenthesis CVE by @jkhaliqi in #8504
- [GLUTEN-8471][VL] Fix usage of uninitialized variables by @jkhaliqi in #8470
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_11) by @GlutenPerfBot in #8507
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_12) by @GlutenPerfBot in #8508
- [GLUTEN-8497][VL] A bad test case that fails columnar table cache query by @zhztheplayer in #8498
- [DOC] Update README.md by @PHILO-HE in #8444
- [GLUTEN-8319][VL] Support date_format Spark function by @PHILO-HE in #8323
- [GLUTEN-8487][VL] adding JDK11 based Centos8 image by @zhouyuan in #8513
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_14) by @GlutenPerfBot in #8522
- [GLUTEN-8020][VL] Remove the libhdfs3 installation script required for static linking by @JkSelf in #8013
- [GLUTEN-8532][VL] Fix parenthesis within macro by @jkhaliqi in #8533
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_15) by @GlutenPerfBot in #8536
- [CORE] Use RAS's cost model for legacy transition planner to evaluate cost of transitions by @zhztheplayer in #8527
- [GLUTEN-8487][VL] adding JDK17 based Centos8 image (#8513) by @zhouyuan in #8539
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250115) by @kyligence-git in #8537
- [GLUTEN-8479][CORE][Part-1] Remove unnecessary config by @yikf in #8480
- [GLUTEN-8520][VL] Fix bitwise operators by @jkhaliqi in #8521
- [GLUTEN-8524][VL] Fix input output errors by @jkhaliqi in #8525
- [GLUTEN-6876][VL] update spark 3.5.2 in doc by @FelixYBW in #8543
- [GLUTEN-8455][VL] Port encrypted file checks to shim layer by @ArnavBalyan in #8501
- [CORE][VL] Cost model code refactors by @zhztheplayer in #8541
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_16) by @GlutenPerfBot in #8546
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250116) by @kyligence-git in #8544
- [GLUTEN-8432][CH]Remove duplicate output attributes of aggregate's child by @lgbo-ustc in #8450
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_17) by @GlutenPerfBot in #8553
- [GLUTEN-8497][CORE] A unified CallInfo API to replace AdaptiveContext by @zhztheplayer in #8551
- [GLUTEN-8529][CH]Fix get_json_object when path has asterisk by @KevinyhZou in #8540
- [MINOR] Fix comment of function VeloxAggregateFunctionsBuilder.create by @zml1206 in #8549
- [CORE] Optimize duplicated code for create rel node by @zml1206 in #8548
- [GLUTEN-7706][CORE] Support Spark-344 + JDK17 by @zhouyuan in #7789
- [GLUTEN-8475][VL] Fix C-style casts to C++-style by @jkhaliqi in #8474
- [GLUTEN-8534][VL] Fix allowing loops to iterate beyond end of array by @jkhaliqi in #8535
- [GLUTEN-8538][VL] Fix incorrect calculation of buffer size by @jkhaliqi in #8542
- [CORE][CH] Support MicroBatchScanExec with KafkaScan in batch mode by @loneylee in #8321
- [CORE][MIRROR] Change config.defaultValue.get.toString to config.defaultValueString by @jackylee-ch in #8572
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_18) by @GlutenPerfBot in #8561
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_19) by @GlutenPerfBot in #8563
- [GLUTEN-8406][CH] Replace `from_json(s, 'Map<String, String>')[k]` with `get_json_object(s, '$.k')` by @lgbo-ustc in #8409
- [GLUTEN-8479][CORE][Part-2] All configurations should be defined through ConfigEntry by @yikf in #8559
- [VL] CMake configuration cleanup to remove variable VELOX_COMPONENTS_PATH by @zhztheplayer in #8579
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250121) by @kyligence-git in #8577
- [DOC] Fix outdated operators in documentation by @ArnavBalyan in #8582
- [GLUTEN-8379][VL] Support query trace by @jinchengchenghh in https://gi...
v1.4.0-rc2
What's Changed
- [GLUTEN-8327][CORE][Part-3] Introduce the `ConfigEntry` to gluten config by @yikf in #8431
- [VL] Fix wrong warning of "Memory overhead is set to ..." under default Spark config settings by @zhztheplayer in #8448
- [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by @yikf in #8386
- Revert "[CH] Disable gluten arm ci" by @lwz9103 in #8460
- [GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer by @ArnavBalyan in #8454
- [CH] Add tools to dump ActionsDAG into tree graph by @taiyang-li in #8461
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_08) by @GlutenPerfBot in #8457
- [VL] Update document of build gluten in Docker by @FelixYBW in #8459
- [GLUTEN-8462][CORE] Raise a meaningful error when no component is found from classpath by @zhztheplayer in #8468
- [GLUTEN-8453][VL] Follow-up to #8454 to add a `ensureVeloxBatch` API for limited use cases by @zhztheplayer in #8463
- [VL] Refactor Velox.md by @FelixYBW in #8478
- [GLUTEN-8465] [VL] Bump Celeborn to 0.5.3 by @SteNicholas in #8467
- [GLUTEN-8455][VL] Fallback Scan for Encrypted Parquet Files by @ArnavBalyan in #8456
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_09) by @GlutenPerfBot in #8472
- [CORE] Refactor columnar noop write rule by @jackylee-ch in #8422
- [GLUTEN-8462][CH] Fixed the loading of Components and Backend by @gleonSun in #8464
- [GLUTEN-8414][VL] Override doCanonicalize in ColumnarPartialProjectEx… by @lifulong in #8415
- [GLUTEN-8397][CH][Part-2] Fix statica_cast failed on macos by @yxheartipp in #8485
- [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by @KevinyhZou in #8351
- [GLUTEN-8481][VL] Clean up shuffle reader cpp code by @marin-ma in #8482
- [Core] Bump version to 1.4.0-SNAPSHOT by @weiting-chen in #8452
- [GLUTEN-8483][CORE] A stable and universal way to find component files by @zhztheplayer in #8486
- [DOC][VL] Fix typo in microbenchmark.md by @marin-ma in #8495
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250110) by @kyligence-git in #8490
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_10) by @GlutenPerfBot in #8489
- [GLUTEN-8476][VL] Fix allocate and free memory by @jkhaliqi in #8477
- [GLUTEN-8503][VL] Fix macro parenthesis CVE by @jkhaliqi in #8504
- [GLUTEN-8471][VL] Fix usage of uninitialized variables by @jkhaliqi in #8470
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_11) by @GlutenPerfBot in #8507
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_12) by @GlutenPerfBot in #8508
- [GLUTEN-8497][VL] A bad test case that fails columnar table cache query by @zhztheplayer in #8498
- [DOC] Update README.md by @PHILO-HE in #8444
- [GLUTEN-8319][VL] Support date_format Spark function by @PHILO-HE in #8323
- [GLUTEN-8487][VL] adding JDK11 based Centos8 image by @zhouyuan in #8513
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_14) by @GlutenPerfBot in #8522
- [GLUTEN-8020][VL] Remove the libhdfs3 installation script required for static linking by @JkSelf in #8013
- [GLUTEN-8532][VL] Fix parenthesis within macro by @jkhaliqi in #8533
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_15) by @GlutenPerfBot in #8536
- [CORE] Use RAS's cost model for legacy transition planner to evaluate cost of transitions by @zhztheplayer in #8527
- [GLUTEN-8487][VL] adding JDK17 based Centos8 image (#8513) by @zhouyuan in #8539
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250115) by @kyligence-git in #8537
- [GLUTEN-8479][CORE][Part-1] Remove unnecessary config by @yikf in #8480
- [GLUTEN-8520][VL] Fix bitwise operators by @jkhaliqi in #8521
- [GLUTEN-8524][VL] Fix input output errors by @jkhaliqi in #8525
- [GLUTEN-6876][VL] update spark 3.5.2 in doc by @FelixYBW in #8543
- [GLUTEN-8455][VL] Port encrypted file checks to shim layer by @ArnavBalyan in #8501
- [CORE][VL] Cost model code refactors by @zhztheplayer in #8541
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_16) by @GlutenPerfBot in #8546
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250116) by @kyligence-git in #8544
- [GLUTEN-8432][CH]Remove duplicate output attributes of aggregate's child by @lgbo-ustc in #8450
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_17) by @GlutenPerfBot in #8553
- [GLUTEN-8497][CORE] A unified CallInfo API to replace AdaptiveContext by @zhztheplayer in #8551
- [GLUTEN-8529][CH]Fix get_json_object when path has asterisk by @KevinyhZou in #8540
- [MINOR] Fix comment of function VeloxAggregateFunctionsBuilder.create by @zml1206 in #8549
- [CORE] Optimize duplicated code for create rel node by @zml1206 in #8548
- [GLUTEN-7706][CORE] Support Spark-344 + JDK17 by @zhouyuan in #7789
- [GLUTEN-8475][VL] Fix C-style casts to C++-style by @jkhaliqi in #8474
- [GLUTEN-8534][VL] Fix allowing loops to iterate beyond end of array by @jkhaliqi in #8535
- [GLUTEN-8538][VL] Fix incorrect calculation of buffer size by @jkhaliqi in #8542
- [CORE][CH] Support MicroBatchScanExec with KafkaScan in batch mode by @loneylee in #8321
- [CORE][MIRROR] Change config.defaultValue.get.toString to config.defaultValueString by @jackylee-ch in #8572
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_18) by @GlutenPerfBot in #8561
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_19) by @GlutenPerfBot in #8563
- [GLUTEN-8406][CH] Replace `from_json(s, 'Map<String, String>')[k]` with `get_json_object(s, '$.k')` by @lgbo-ustc in #8409
- [GLUTEN-8479][CORE][Part-2] All configurations should be defined through ConfigEntry by @yikf in #8559
- [VL] CMake configuration cleanup to remove variable VELOX_COMPONENTS_PATH by @zhztheplayer in #8579
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250121) by @kyligence-git in #8577
- [DOC] Fix outdated operators in documentation by @ArnavBalyan in #8582
- [GLUTEN-8379][VL] Support query trace by @jinchengchenghh in #8380
- [GLUTEN-8266][VL][CI] Pre-install uniffle in docker image by @zhouyuan in #8578
- [VL] Update the Scaladoc of Component API by @zhztheplayer in #8589
- [GLUTEN-8455][VL] Support encrypted parquet fallback for 3.5 by @ArnavBalyan in #8560
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_22) by @GlutenPerfBot in #8587
- [GLUTEN-8580][CORE][Part-1] Clean up unnecessary code related to input file expression by @zml1206 in #8584
- [GLUTEN-8379][VL] Fix typo in query trace document by @jinchengchenghh in https://github.com...
v1.4.0-rc1
What's Changed
- [GLUTEN-8327][CORE][Part-3] Introduce the `ConfigEntry` to gluten config by @yikf in #8431
- [VL] Fix wrong warning of "Memory overhead is set to ..." under default Spark config settings by @zhztheplayer in #8448
- [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by @yikf in #8386
- Revert "[CH] Disable gluten arm ci" by @lwz9103 in #8460
- [GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer by @ArnavBalyan in #8454
- [CH] Add tools to dump ActionsDAG into tree graph by @taiyang-li in #8461
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_08) by @GlutenPerfBot in #8457
- [VL] Update document of build gluten in Docker by @FelixYBW in #8459
- [GLUTEN-8462][CORE] Raise a meaningful error when no component is found from classpath by @zhztheplayer in #8468
- [GLUTEN-8453][VL] Follow-up to #8454 to add a `ensureVeloxBatch` API for limited use cases by @zhztheplayer in #8463
- [VL] Refactor Velox.md by @FelixYBW in #8478
- [GLUTEN-8465] [VL] Bump Celeborn to 0.5.3 by @SteNicholas in #8467
- [GLUTEN-8455][VL] Fallback Scan for Encrypted Parquet Files by @ArnavBalyan in #8456
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_09) by @GlutenPerfBot in #8472
- [CORE] Refactor columnar noop write rule by @jackylee-ch in #8422
- [GLUTEN-8462][CH] Fixed the loading of Components and Backend by @gleonSun in #8464
- [GLUTEN-8414][VL] Override doCanonicalize in ColumnarPartialProjectEx… by @lifulong in #8415
- [GLUTEN-8397][CH][Part-2] Fix statica_cast failed on macos by @yxheartipp in #8485
- [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by @KevinyhZou in #8351
- [GLUTEN-8481][VL] Clean up shuffle reader cpp code by @marin-ma in #8482
- [Core] Bump version to 1.4.0-SNAPSHOT by @weiting-chen in #8452
- [GLUTEN-8483][CORE] A stable and universal way to find component files by @zhztheplayer in #8486
- [DOC][VL] Fix typo in microbenchmark.md by @marin-ma in #8495
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250110) by @kyligence-git in #8490
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_10) by @GlutenPerfBot in #8489
- [GLUTEN-8476][VL] Fix allocate and free memory by @jkhaliqi in #8477
- [GLUTEN-8503][VL] Fix macro parenthesis CVE by @jkhaliqi in #8504
- [GLUTEN-8471][VL] Fix usage of uninitialized variables by @jkhaliqi in #8470
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_11) by @GlutenPerfBot in #8507
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_12) by @GlutenPerfBot in #8508
- [GLUTEN-8497][VL] A bad test case that fails columnar table cache query by @zhztheplayer in #8498
- [DOC] Update README.md by @PHILO-HE in #8444
- [GLUTEN-8319][VL] Support date_format Spark function by @PHILO-HE in #8323
- [GLUTEN-8487][VL] adding JDK11 based Centos8 image by @zhouyuan in #8513
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_14) by @GlutenPerfBot in #8522
- [GLUTEN-8020][VL] Remove the libhdfs3 installation script required for static linking by @JkSelf in #8013
- [GLUTEN-8532][VL] Fix parenthesis within macro by @jkhaliqi in #8533
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_15) by @GlutenPerfBot in #8536
- [CORE] Use RAS's cost model for legacy transition planner to evaluate cost of transitions by @zhztheplayer in #8527
- [GLUTEN-8487][VL] adding JDK17 based Centos8 image (#8513) by @zhouyuan in #8539
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250115) by @kyligence-git in #8537
- [GLUTEN-8479][CORE][Part-1] Remove unnecessary config by @yikf in #8480
- [GLUTEN-8520][VL] Fix bitwise operators by @jkhaliqi in #8521
- [GLUTEN-8524][VL] Fix input output errors by @jkhaliqi in #8525
- [GLUTEN-6876][VL] update spark 3.5.2 in doc by @FelixYBW in #8543
- [GLUTEN-8455][VL] Port encrypted file checks to shim layer by @ArnavBalyan in #8501
- [CORE][VL] Cost model code refactors by @zhztheplayer in #8541
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_16) by @GlutenPerfBot in #8546
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250116) by @kyligence-git in #8544
- [GLUTEN-8432][CH]Remove duplicate output attributes of aggregate's child by @lgbo-ustc in #8450
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_17) by @GlutenPerfBot in #8553
- [GLUTEN-8497][CORE] A unified CallInfo API to replace AdaptiveContext by @zhztheplayer in #8551
- [GLUTEN-8529][CH]Fix get_json_object when path has asterisk by @KevinyhZou in #8540
- [MINOR] Fix comment of function VeloxAggregateFunctionsBuilder.create by @zml1206 in #8549
- [CORE] Optimize duplicated code for create rel node by @zml1206 in #8548
- [GLUTEN-7706][CORE] Support Spark-344 + JDK17 by @zhouyuan in #7789
- [GLUTEN-8475][VL] Fix C-style casts to C++-style by @jkhaliqi in #8474
- [GLUTEN-8534][VL] Fix allowing loops to iterate beyond end of array by @jkhaliqi in #8535
- [GLUTEN-8538][VL] Fix incorrect calculation of buffer size by @jkhaliqi in #8542
- [CORE][CH] Support MicroBatchScanExec with KafkaScan in batch mode by @loneylee in #8321
- [CORE][MIRROR] Change config.defaultValue.get.toString to config.defaultValueString by @jackylee-ch in #8572
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_18) by @GlutenPerfBot in #8561
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_19) by @GlutenPerfBot in #8563
- [GLUTEN-8406][CH] Replace `from_json(s, 'Map<String, String>')[k]` with `get_json_object(s, '$.k')` by @lgbo-ustc in #8409
- [GLUTEN-8479][CORE][Part-2] All configurations should be defined through ConfigEntry by @yikf in #8559
- [VL] CMake configuration cleanup to remove variable VELOX_COMPONENTS_PATH by @zhztheplayer in #8579
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250121) by @kyligence-git in #8577
- [DOC] Fix outdated operators in documentation by @ArnavBalyan in #8582
- [GLUTEN-8379][VL] Support query trace by @jinchengchenghh in #8380
- [GLUTEN-8266][VL][CI] Pre-install uniffle in docker image by @zhouyuan in #8578
- [VL] Update the Scaladoc of Component API by @zhztheplayer in #8589
- [GLUTEN-8455][VL] Support encrypted parquet fallback for 3.5 by @ArnavBalyan in #8560
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_22) by @GlutenPerfBot in #8587
- [GLUTEN-8580][CORE][Part-1] Clean up unnecessary code related to input file expression by @zml1206 in #8584
- [GLUTEN-8379][VL] Fix typo in query trace document by @jinchengchenghh in https://github.com...
v1.4.0-rc0
What's Changed
- [GLUTEN-8327][CORE][Part-3] Introduce the `ConfigEntry` to gluten config by @yikf in #8431
- [VL] Fix wrong warning of "Memory overhead is set to ..." under default Spark config settings by @zhztheplayer in #8448
- [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by @yikf in #8386
- Revert "[CH] Disable gluten arm ci" by @lwz9103 in #8460
- [GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer by @ArnavBalyan in #8454
- [CH] Add tools to dump ActionsDAG into tree graph by @taiyang-li in #8461
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_08) by @GlutenPerfBot in #8457
- [VL] Update document of build gluten in Docker by @FelixYBW in #8459
- [GLUTEN-8462][CORE] Raise a meaningful error when no component is found from classpath by @zhztheplayer in #8468
- [GLUTEN-8453][VL] Follow-up to #8454 to add a `ensureVeloxBatch` API for limited use cases by @zhztheplayer in #8463
- [VL] Refactor Velox.md by @FelixYBW in #8478
- [GLUTEN-8465] [VL] Bump Celeborn to 0.5.3 by @SteNicholas in #8467
- [GLUTEN-8455][VL] Fallback Scan for Encrypted Parquet Files by @ArnavBalyan in #8456
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_09) by @GlutenPerfBot in #8472
- [CORE] Refactor columnar noop write rule by @jackylee-ch in #8422
- [GLUTEN-8462][CH] Fixed the loading of Components and Backend by @gleonSun in #8464
- [GLUTEN-8414][VL] Override doCanonicalize in ColumnarPartialProjectEx… by @lifulong in #8415
- [GLUTEN-8397][CH][Part-2] Fix statica_cast failed on macos by @yxheartipp in #8485
- [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by @KevinyhZou in #8351
- [GLUTEN-8481][VL] Clean up shuffle reader cpp code by @marin-ma in #8482
- [Core] Bump version to 1.4.0-SNAPSHOT by @weiting-chen in #8452
- [GLUTEN-8483][CORE] A stable and universal way to find component files by @zhztheplayer in #8486
- [DOC][VL] Fix typo in microbenchmark.md by @marin-ma in #8495
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250110) by @kyligence-git in #8490
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_10) by @GlutenPerfBot in #8489
- [GLUTEN-8476][VL] Fix allocate and free memory by @jkhaliqi in #8477
- [GLUTEN-8503][VL] Fix macro parenthesis CVE by @jkhaliqi in #8504
- [GLUTEN-8471][VL] Fix usage of uninitialized variables by @jkhaliqi in #8470
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_11) by @GlutenPerfBot in #8507
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_12) by @GlutenPerfBot in #8508
- [GLUTEN-8497][VL] A bad test case that fails columnar table cache query by @zhztheplayer in #8498
- [DOC] Update README.md by @PHILO-HE in #8444
- [GLUTEN-8319][VL] Support date_format Spark function by @PHILO-HE in #8323
- [GLUTEN-8487][VL] adding JDK11 based Centos8 image by @zhouyuan in #8513
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_14) by @GlutenPerfBot in #8522
- [GLUTEN-8020][VL] Remove the libhdfs3 installation script required for static linking by @JkSelf in #8013
- [GLUTEN-8532][VL] Fix parenthesis within macro by @jkhaliqi in #8533
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_15) by @GlutenPerfBot in #8536
- [CORE] Use RAS's cost model for legacy transition planner to evaluate cost of transitions by @zhztheplayer in #8527
- [GLUTEN-8487][VL] adding JDK17 based Centos8 image (#8513) by @zhouyuan in #8539
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250115) by @kyligence-git in #8537
- [GLUTEN-8479][CORE][Part-1] Remove unnecessary config by @yikf in #8480
- [GLUTEN-8520][VL] Fix bitwise operators by @jkhaliqi in #8521
- [GLUTEN-8524][VL] Fix input output errors by @jkhaliqi in #8525
- [GLUTEN-6876][VL] update spark 3.5.2 in doc by @FelixYBW in #8543
- [GLUTEN-8455][VL] Port encrypted file checks to shim layer by @ArnavBalyan in #8501
- [CORE][VL] Cost model code refactors by @zhztheplayer in #8541
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_16) by @GlutenPerfBot in #8546
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250116) by @kyligence-git in #8544
- [GLUTEN-8432][CH]Remove duplicate output attributes of aggregate's child by @lgbo-ustc in #8450
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_17) by @GlutenPerfBot in #8553
- [GLUTEN-8497][CORE] A unified CallInfo API to replace AdaptiveContext by @zhztheplayer in #8551
- [GLUTEN-8529][CH]Fix get_json_object when path has asterisk by @KevinyhZou in #8540
- [MINOR] Fix comment of function VeloxAggregateFunctionsBuilder.create by @zml1206 in #8549
- [CORE] Optimize duplicated code for create rel node by @zml1206 in #8548
- [GLUTEN-7706][CORE] Support Spark-344 + JDK17 by @zhouyuan in #7789
- [GLUTEN-8475][VL] Fix C-style casts to C++-style by @jkhaliqi in #8474
- [GLUTEN-8534][VL] Fix allowing loops to iterate beyond end of array by @jkhaliqi in #8535
- [GLUTEN-8538][VL] Fix incorrect calculation of buffer size by @jkhaliqi in #8542
- [CORE][CH] Support MicroBatchScanExec with KafkaScan in batch mode by @loneylee in #8321
- [CORE][MIRROR] Change config.defaultValue.get.toString to config.defaultValueString by @jackylee-ch in #8572
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_18) by @GlutenPerfBot in #8561
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_19) by @GlutenPerfBot in #8563
- [GLUTEN-8406][CH] Replace `from_json(s, 'Map<String, String>')[k]` with `get_json_object(s, '$.k')` by @lgbo-ustc in #8409
- [GLUTEN-8479][CORE][Part-2] All configurations should be defined through ConfigEntry by @yikf in #8559
- [VL] CMake configuration cleanup to remove variable VELOX_COMPONENTS_PATH by @zhztheplayer in #8579
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250121) by @kyligence-git in #8577
- [DOC] Fix outdated operators in documentation by @ArnavBalyan in #8582
- [GLUTEN-8379][VL] Support query trace by @jinchengchenghh in #8380
- [GLUTEN-8266][VL][CI] Pre-install uniffle in docker image by @zhouyuan in #8578
- [VL] Update the Scaladoc of Component API by @zhztheplayer in #8589
- [GLUTEN-8455][VL] Support encrypted parquet fallback for 3.5 by @ArnavBalyan in #8560
- [GLUTEN-6887][VL] Daily Update Velox Version (2025_01_22) by @GlutenPerfBot in #8587
- [GLUTEN-8580][CORE][Part-1] Clean up unnecessary code related to input file expression by @zml1206 in #8584
- [GLUTEN-8379][VL] Fix typo in query trace document by @jinchengchenghh in https://github.com...
v1.3.0
Release Notes - Gluten version 1.3.0
Highlights
- Spark 3.2.2/3.3.1/3.4.3(upgraded)/3.5.2(upgraded)
- 268+ spark functions including json
- Update OAP's Velox codebase to 2025/01/07
- Join: Sort Merge Join support
- Shuffle: Sort based Shuffle(Row)
- Query Plan: RAS Optimization
- Datalake: Hudi 0.15.0 support/Iceberg 1.5.0/Delta 3.2.0
- RSS: Celeborn 0.5.2/Uniffle 0.9.1
- File Format: CSV support via arrow
- JVM libhdfs with viewfs/kerberos support
- Partial Project(UDF) support
- Mix backend refactor
- Bucket write in partitioned Hive table
- CI/Nightly Package Tools Update
- Build & Compile Tools Update(recommend to use vcpkg with static build)
- Fix several result mismatch issues
- Fix OOM/Yarn Kill unstable issues
What's Changed
- [VL] Make velox writer queue size configurable @yikf github.com//pull/6341
- [VL] Remove useless ctx variable @gaoyangxiaozhu github.com//pull/6348
- [1632][CH]Daily Update Clickhouse Version (20240706) @kyligence-git github.com//pull/6359
- [VL] fix build bundle package @zhouyuan github.com//pull/6364
- [VL] Fix process_setup_alinux3 arrow CMakeLists.txt path @liujiayi771 github.com//pull/6363
- [VL] Daily Update Velox Version (2024_07_08) @GlutenPerfBot github.com//pull/6366
- [6262][CH]Json input format ignore key case @KevinyhZou github.com//pull/6263
- [6285][VL] Add debian10 vcpkg depends @wenwj0 github.com//pull/6286
- [CELEBORN] CelebornShuffleManager#stop should stop non-null _vanillaCelebornShuffleManager @SteNicholas github.com//pull/6371
- [VL] Update ubuntu docker to use cmake 3.28 @boneanxs github.com//pull/6373
- [6304][CH]Support array_join @KevinyhZou github.com//pull/6305
- [VL] Daily Update Velox Version (2024_07_09) @GlutenPerfBot github.com//pull/6376
- [6378][CH] Support delta count optimizer for the MergeTree format @zzcclp github.com//pull/6379
- [6345][CH] Deprecate SCALAR_FUNCTIONS SerializedPlanParser @lgbo-ustc github.com//pull/6347
- [TEST] Use project version rather than Gluten version Gluten it @ulysses-you github.com//pull/6385
- [6377][CH] Support window function `percent_rank` @lgbo-ustc github.com//pull/6386
- [VL] Minor refactor for ValueStream node construction and usage @Yohahaha github.com//pull/6382
- [VL] Enable levenshtein function @zhli1142015 github.com//pull/6389
- [VL] Daily Update Velox Version (2024_07_10) @GlutenPerfBot github.com//pull/6384
- [1632][CH]Daily Update Clickhouse Version (20240710) @kyligence-git github.com//pull/6383
- Test input_file_name, input_file_block_start & input_file_block_length when scan falls back @gaoyangxiaozhu github.com//pull/6318
- [6394][VL] Fix the vcpkg package script @weixiuli github.com//pull/6395
- [6288][CH] Support BroadcastNestedLoopJoinExe[Part one] @loneylee github.com//pull/6290
- [CELEBORN] Rename CelebornHashBasedColumnarShuffleWriter to CelebornColumnarShuffleWriter @kerwin-zk github.com//pull/6391
- [VL] Fix E function fallback issue some condition @gaoyangxiaozhu github.com//pull/6397
- [CI] Fix centos7 failure @marin-ma github.com//pull/6404
- [1632][CH]Daily Update Clickhouse Version (20240711) @kyligence-git github.com//pull/6399
- [CELEBORN] Add compression for row-based shuffle @kerwin-zk github.com//pull/6380
- [VL] Daily Update Velox Version (2024_07_11) @GlutenPerfBot github.com//pull/6400
- [CORE] Remove local sort for TopNRowNumber @ulysses-you github.com//pull/6381
- [VL] Spark assert_true function support @gaoyangxiaozhu github.com//pull/6329
- [VL] Add schema validation for all operators @zhli1142015 github.com//pull/6406
- [CORE] Minor code cleanups against fallback tagging @zhztheplayer github.com//pull/6320
- [VL] Try to find arrow libs from velox bundled path firstly @PHILO-HE github.com//pull/6413
- [VL] disable tpch benchmarks on comment/merge @zhouyuan github.com//pull/6402
- [UT] Add a tool to validate any unary expression with all its accepted types @PHILO-HE github.com//pull/6392
- [CH] Fix a source file name typo @zhztheplayer github.com//pull/6412
- [VL] Fix Pi function fallback issue some condition @gaoyangxiaozhu github.com//pull/6408
- [CELEBORN] VeloxCelebornColumnarBatchSerializer uses the key and default value of SHUFFLE_COMPRESS to check whether to compress shuffle output @SteNicholas github.com//pull/6414
- [VL] Quick fix for commit conflicts @zhztheplayer github.com//pull/6418
- [Doc] Update new supported spark functions @gaoyangxiaozhu github.com//pull/6423
- [VL] Add a test to validate substring_index @boneanxs github.com//pull/6393
- [VL] Fix shuffle spill triggered by evicting buffers during stop @marin-ma github.com//pull/6422
- [VL] Enable repeat function @zhli1142015 github.com//pull/6419
- [VL] Accelerate Arrow compile @jinchengchenghh github.com//pull/6426
- [CI][VL] Update docker image for CI @zhouyuan github.com//pull/6401
- [VL] Daily Update Velox Version (2024_07_12) @GlutenPerfBot github.com//pull/6417
- [VL] Daily Update Velox Version (2024_07_13) @GlutenPerfBot github.com//pull/6436
- [VL] Daily Update Velox Version (2024_07_14) @GlutenPerfBot github.com//pull/6441
- [VL] Set Arrow_SOURCE to AUTO to allow using system arrow libs @PHILO-HE github.com//pull/6325
- [CELEBORN] CHCelebornColumnarShuffleWriter supports celeborn.client.spark.shuffle.writer to use memory sort shuffle ClickHouse backend @SteNicholas github.com//pull/6432
- [VL] Make sure the same thrift lib bundled arrow build is used for building Velox @zhztheplayer github.com//pull/6431
- [CORE] Make SparkSession transient HiveTableScanExecTransformer @yikf github.com//pull/6410
- [6176][CH] Add tpcds suite from decimal table schema @loneylee github.com//pull/6369
- [VL] Move dependencies setup ahead @PHILO-HE github.com//pull/6444
- [CH][CELEBORN] CHCelebornColumnarShuffleWriter supports celeborn.client.spark.shuffle.writer to use memory sort shuffle ClickHouse backend @SteNicholas github.com//pull/6454
- [VL] Enable right and anti join smj @JkSelf github.com//pull/6449
- [CH][CELEBORN] CHCelebornColumnarBatchSerializer uses AtomicBoolean to identify whether to call close() to avoid calling close() twice situation @SteNicholas github.com//pull/6455
- [CI][VL] Re-enable a build job running on clean dockers weekly @PHILO-HE github.com//pull/6424
- [CORE] Update LICENSE, NOTICE, LICENSE-binary, NOTICE-binary @weiting-chen github.com//pull/6443
- [CORE] Change DISCLAIMER to DISCLAIMER-WIP @weiting-chen github.com//pull/6442
- [VL] RAS: Minor code cleanup for offloading project @zhztheplayer github.com//pull/6452
- [VL] Add a way to create static build with docker container and gluten-te @zhztheplayer github.com//pull/6457
- [6467][CH] Minor Fix Build @baibaichen github.com//pull/6468
- [VL] Minor improvements and fixes for gluten-it and gluten-te @zhztheplayer github.com//pull/6471
- [CORE] Fix fallback for spark sequence function with literal array data as input @gaoyangxiaozhu github.com//pull/6433
- [VL] Fix offload input_file_name assert error @zml1206 github.com//pull/6390
- [VL] update docker image for cache-native-lib job @yma11 github.com//pull/6466
- [BUILD] Fix unbound variable @zml1206 github.com//pull/6474
- [VL] Daily Update Velox Version (2024_07_16) @GlutenPerfBot github.com//pull/6460
- [6437][BUILD] Fix vcpkg setup-build-dependens.sh for centos @wecharyu github.com//pull/6438
- [6470][CH]Fix Task not serializable error when inserting mergetree data @zzcclp github.com//pull/6473
- [6425][CH] Support day time internval @lgbo-ustc github.com//pull/6456
- [VL] remove redundant code parquet datasource to avoid memory leakage PR6430 @liujp github.com//pull/6462
- [Core] Spark version function support @gaoyangxiaozhu github.com//pull/6469
- [VL] Daily Update Velox Version (2024_07_17) @GlutenPerfBot github.com//pull/6479
- [VL] Minor improvements on gluten-it / gluten-te toolchains @zhztheplayer github.com//pull/6476
- [CH] Support merge MergeTree files @liuneng1994 github.com//pull/6472
- [6463][CH]refactor the code of parsing join parameters @lgbo-ustc github.com//pull/6485
- [1632][CH]Daily Update Clickhouse Version (20240718) @kyligence-git github.com/apache/incubat...
v1.3.0-rc0
Release Notes - Gluten version 1.3.0-rc0
Highlights
- Spark 3.2.2/3.3.1/3.4.3(upgraded)/3.5.2(upgraded)
- 268+ spark functions including json
- Update OAP's Velox codebase to 2025/01/07
- Join: Sort Merge Join support
- Shuffle: Sort based Shuffle(Row)
- Query Plan: RAS Optimization
- Datalake: Hudi 0.15.0 support/Iceberg 1.5.0/Delta 3.2.0
- RSS: Celeborn 0.5.2/Uniffle 0.9.1
- File Format: CSV support via arrow
- JVM libhdfs with viewfs/kerberos support
- Partial Project(UDF) support
- Mix backend refactor
- Bucket write in partitioned Hive table
- CI/Nightly Package Tools Update
- Build & Compile Tools Update(recommend to use vcpkg with static build)
- Fix several result mismatch issues
- Fix OOM/Yarn Kill unstable issues
What's Changed
- [VL] Make velox writer queue size configurable @yikf github.com//pull/6341
- [VL] Remove useless ctx variable @gaoyangxiaozhu github.com//pull/6348
- [1632][CH]Daily Update Clickhouse Version (20240706) @kyligence-git github.com//pull/6359
- [VL] fix build bundle package @zhouyuan github.com//pull/6364
- [VL] Fix process_setup_alinux3 arrow CMakeLists.txt path @liujiayi771 github.com//pull/6363
- [VL] Daily Update Velox Version (2024_07_08) @GlutenPerfBot github.com//pull/6366
- [6262][CH]Json input format ignore key case @KevinyhZou github.com//pull/6263
- [6285][VL] Add debian10 vcpkg depends @wenwj0 github.com//pull/6286
- [CELEBORN] CelebornShuffleManager#stop should stop non-null _vanillaCelebornShuffleManager @SteNicholas github.com//pull/6371
- [VL] Update ubuntu docker to use cmake 3.28 @boneanxs github.com//pull/6373
- [6304][CH]Support array_join @KevinyhZou github.com//pull/6305
- [VL] Daily Update Velox Version (2024_07_09) @GlutenPerfBot github.com//pull/6376
- [6378][CH] Support delta count optimizer for the MergeTree format @zzcclp github.com//pull/6379
- [6345][CH] Deprecate SCALAR_FUNCTIONS SerializedPlanParser @lgbo-ustc github.com//pull/6347
- [TEST] Use project version rather than Gluten version Gluten it @ulysses-you github.com//pull/6385
- [6377][CH] Support window function `percent_rank` @lgbo-ustc github.com//pull/6386
- [VL] Minor refactor for ValueStream node construction and usage @Yohahaha github.com//pull/6382
- [VL] Enable levenshtein function @zhli1142015 github.com//pull/6389
- [VL] Daily Update Velox Version (2024_07_10) @GlutenPerfBot github.com//pull/6384
- [1632][CH]Daily Update Clickhouse Version (20240710) @kyligence-git github.com//pull/6383
- Test input_file_name, input_file_block_start & input_file_block_length when scan falls back @gaoyangxiaozhu github.com//pull/6318
- [6394][VL] Fix the vcpkg package script @weixiuli github.com//pull/6395
- [6288][CH] Support BroadcastNestedLoopJoinExe[Part one] @loneylee github.com//pull/6290
- [CELEBORN] Rename CelebornHashBasedColumnarShuffleWriter to CelebornColumnarShuffleWriter @kerwin-zk github.com//pull/6391
- [VL] Fix E function fallback issue some condition @gaoyangxiaozhu github.com//pull/6397
- [CI] Fix centos7 failure @marin-ma github.com//pull/6404
- [1632][CH]Daily Update Clickhouse Version (20240711) @kyligence-git github.com//pull/6399
- [CELEBORN] Add compression for row-based shuffle @kerwin-zk github.com//pull/6380
- [VL] Daily Update Velox Version (2024_07_11) @GlutenPerfBot github.com//pull/6400
- [CORE] Remove local sort for TopNRowNumber @ulysses-you github.com//pull/6381
- [VL] Spark assert_true function support @gaoyangxiaozhu github.com//pull/6329
- [VL] Add schema validation for all operators @zhli1142015 github.com//pull/6406
- [CORE] Minor code cleanups against fallback tagging @zhztheplayer github.com//pull/6320
- [VL] Try to find arrow libs from velox bundled path firstly @PHILO-HE github.com//pull/6413
- [VL] disable tpch benchmarks on comment/merge @zhouyuan github.com//pull/6402
- [UT] Add a tool to validate any unary expression with all its accepted types @PHILO-HE github.com//pull/6392
- [CH] Fix a source file name typo @zhztheplayer github.com//pull/6412
- [VL] Fix Pi function fallback issue in some conditions @gaoyangxiaozhu github.com//pull/6408
- [CELEBORN] VeloxCelebornColumnarBatchSerializer uses the key and default value of SHUFFLE_COMPRESS to check whether to compress shuffle output @SteNicholas github.com//pull/6414
- [VL] Quick fix for commit conflicts @zhztheplayer github.com//pull/6418
- [Doc] Update new supported spark functions @gaoyangxiaozhu github.com//pull/6423
- [VL] Add a test to validate substring_index @boneanxs github.com//pull/6393
- [VL] Fix shuffle spill triggered by evicting buffers during stop @marin-ma github.com//pull/6422
- [VL] Enable repeat function @zhli1142015 github.com//pull/6419
- [VL] Accelerate Arrow compile @jinchengchenghh github.com//pull/6426
- [CI][VL] Update docker image for CI @zhouyuan github.com//pull/6401
- [VL] Daily Update Velox Version (2024_07_12) @GlutenPerfBot github.com//pull/6417
- [VL] Daily Update Velox Version (2024_07_13) @GlutenPerfBot github.com//pull/6436
- [VL] Daily Update Velox Version (2024_07_14) @GlutenPerfBot github.com//pull/6441
- [VL] Set Arrow_SOURCE to AUTO to allow using system arrow libs @PHILO-HE github.com//pull/6325
- [CELEBORN] CHCelebornColumnarShuffleWriter supports celeborn.client.spark.shuffle.writer to use memory sort shuffle in ClickHouse backend @SteNicholas github.com//pull/6432
- [VL] Make sure the same thrift lib bundled arrow build is used for building Velox @zhztheplayer github.com//pull/6431
- [CORE] Make SparkSession transient in HiveTableScanExecTransformer @yikf github.com//pull/6410
- [6176][CH] Add tpcds suite from decimal table schema @loneylee github.com//pull/6369
- [VL] Move dependencies setup ahead @PHILO-HE github.com//pull/6444
- [CH][CELEBORN] CHCelebornColumnarShuffleWriter supports celeborn.client.spark.shuffle.writer to use memory sort shuffle in ClickHouse backend @SteNicholas github.com//pull/6454
- [VL] Enable right and anti join smj @JkSelf github.com//pull/6449
- [CH][CELEBORN] CHCelebornColumnarBatchSerializer uses AtomicBoolean to identify whether to call close() to avoid calling close() twice @SteNicholas github.com//pull/6455
- [CI][VL] Re-enable a build job running on clean dockers weekly @PHILO-HE github.com//pull/6424
- [CORE] Update LICENSE, NOTICE, LICENSE-binary, NOTICE-binary @weiting-chen github.com//pull/6443
- [CORE] Change DISCLAIMER to DISCLAIMER-WIP @weiting-chen github.com//pull/6442
- [VL] RAS: Minor code cleanup for offloading project @zhztheplayer github.com//pull/6452
- [VL] Add a way to create static build with docker container and gluten-te @zhztheplayer github.com//pull/6457
- [6467][CH] Minor Fix Build @baibaichen github.com//pull/6468
- [VL] Minor improvements and fixes for gluten-it and gluten-te @zhztheplayer github.com//pull/6471
- [CORE] Fix fallback for spark sequence function with literal array data as input @gaoyangxiaozhu github.com//pull/6433
- [VL] Fix offload input_file_name assert error @zml1206 github.com//pull/6390
- [VL] update docker image for cache-native-lib job @yma11 github.com//pull/6466
- [BUILD] Fix unbound variable @zml1206 github.com//pull/6474
- [VL] Daily Update Velox Version (2024_07_16) @GlutenPerfBot github.com//pull/6460
- [6437][BUILD] Fix vcpkg setup-build-dependens.sh for centos @wecharyu github.com//pull/6438
- [6470][CH] Fix Task not serializable error when inserting mergetree data @zzcclp github.com//pull/6473
- [6425][CH] Support day time interval @lgbo-ustc github.com//pull/6456
- [VL] Remove redundant code in parquet datasource to avoid memory leakage PR6430 @liujp github.com//pull/6462
- [Core] Spark version function support @gaoyangxiaozhu github.com//pull/6469
- [VL] Daily Update Velox Version (2024_07_17) @GlutenPerfBot github.com//pull/6479
- [VL] Minor improvements on gluten-it / gluten-te toolchains @zhztheplayer github.com//pull/6476
- [CH] Support merge MergeTree files @liuneng1994 github.com//pull/6472
- [6463][CH] Refactor the code of parsing join parameters @lgbo-ustc github.com//pull/6485
- [1632][CH] Daily Update Clickhouse Version (20240718) @kyligence-git github.com/apache/inc...