@QCLyu QCLyu commented Jan 4, 2026

What changes are proposed in this pull request?

This PR comprehensively removes Spark 3.2 support from the Gluten Velox backend. It cleans up the source code, build profiles, CI/CD pipelines, and documentation.

Key changes include:

  • Source Code: Removed shims/spark32 and gluten-ut/spark32 directories.

  • Build System: Deleted the spark-3.2 profile from the root and all sub-module pom.xml files.

  • CI/CD: Removed legacy Spark 3.2 jobs (spark-test-spark32, spark-test-spark32-slow, and TPC-H OOM tests) from GitHub Workflows to reduce CI overhead.

  • Test Migration: Refactored VeloxHashJoinSuite and other backend tests to remove Spark 3.2-specific conditional logic, ensuring these tests now run on Spark 3.3+.

  • Documentation: Updated the build guide and ClickHouse deployment docs to remove references to Spark 3.2.
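As a rough way to double-check the build-system cleanup listed above, the profile ids declared in a POM can be listed and checked for `spark-3.2`. This is only a sketch: the helper names `profile_ids` and `has_profile` and the sample POM are made up for illustration and are not part of this PR.

```python
import xml.etree.ElementTree as ET

# Maven POM files declare their elements in this XML namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def profile_ids(pom_xml: str) -> list[str]:
    """Return the ids of all <profile> entries declared directly in a POM."""
    root = ET.fromstring(pom_xml)
    return [
        (profile.findtext("m:id", default="", namespaces=POM_NS) or "").strip()
        for profile in root.findall("m:profiles/m:profile", POM_NS)
    ]


def has_profile(pom_xml: str, profile_id: str) -> bool:
    """True if the POM still declares a profile with the given id."""
    return profile_id in profile_ids(pom_xml)


# Hypothetical, heavily trimmed POM used only to exercise the helpers.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile><id>spark-3.3</id></profile>
    <profile><id>spark-3.5</id></profile>
  </profiles>
</project>"""

print(profile_ids(SAMPLE_POM))
print(has_profile(SAMPLE_POM, "spark-3.2"))
```

Applied to the text of each pom.xml in the repository, `has_profile(..., "spark-3.2")` would be expected to return False after this change.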

How was this patch tested?

  • Manual Build: Verified successful compilation on aarch64 (ARM64) using -Pspark-3.5 -Pbackends-velox.

  • Unit Tests: Verified that migrated tests in VeloxHashJoinSuite pass successfully under Spark 3.5.

  • CI: Verified that the workflow changes still trigger the jobs for the remaining Spark versions (3.3, 3.4, 3.5).

Closes #8960

github-actions bot commented Jan 4, 2026

Run Gluten Clickhouse CI on x86

Author

QCLyu commented Jan 4, 2026

I have verified the changes for Spark 3.5 locally, while GitHub Actions was showing failures:

Jenkins (ClickHouse CI): SUCCESS.

The Reactor Summary confirms all modules (including gluten-core, shims, and backends-clickhouse) built successfully and all 36 tests passed.

Jenkins Reactor Summary (Build Success):

12:33:20 Run completed in 2 minutes, 25 seconds.
12:33:20 Total number of tests run: 36
12:33:20 Suites: completed 2, aborted 0
12:33:20 Tests: succeeded 36, failed 0, canceled 0, ignored 12, pending 0
12:33:20 All tests passed.
12:33:21 [INFO] ------------------------------------------------------------------------
12:33:21 [INFO] Reactor Summary for Gluten Parent Pom 1.6.0-SNAPSHOT:
12:33:21 [INFO]
12:33:21 [INFO] Gluten Parent Pom .................................. SUCCESS [ 17.421 s]
12:33:21 [INFO] Gluten Ras ......................................... SUCCESS [ 23.776 s]
12:33:21 [INFO] Gluten Ras Common .................................. SUCCESS [ 51.164 s]
12:33:21 [INFO] Gluten Core ........................................ SUCCESS [ 37.484 s]
12:33:21 [INFO] Gluten Shims ....................................... SUCCESS [ 0.260 s]
12:33:21 [INFO] Gluten Shims Common ................................ SUCCESS [ 6.371 s]
12:33:21 [INFO] Gluten Shims for Spark 3.3 ......................... SUCCESS [ 13.796 s]
12:33:21 [INFO] Gluten UI .......................................... SUCCESS [ 4.449 s]
12:33:21 [INFO] Gluten Substrait ................................... SUCCESS [ 59.459 s]
12:33:21 [INFO] Gluten Celeborn .................................... SUCCESS [ 4.854 s]
12:33:21 [INFO] Gluten Iceberg ..................................... SUCCESS [ 11.367 s]
12:33:21 [INFO] Gluten DeltaLake ................................... SUCCESS [ 9.760 s]
12:33:21 [INFO] Gluten Package ..................................... SUCCESS [ 6.663 s]
12:33:21 [INFO] Gluten Ras Planner ................................. SUCCESS [ 1.137 s]
12:33:21 [INFO] Gluten Kafka ....................................... SUCCESS [ 9.403 s]
12:33:21 [INFO] Gluten Backends ClickHouse ......................... SUCCESS [59:48 min]
12:33:21 [INFO] Gluten Unit Test Parent ............................ SUCCESS [ 1.323 s]
12:33:21 [INFO] Gluten Unit Test Common ............................ SUCCESS [ 5.468 s]
12:33:21 [INFO] Gluten Unit Test ................................... SUCCESS [ 16.964 s]
12:33:21 [INFO] Gluten Unit Test Spark33 ........................... SUCCESS [02:45 min]
12:33:21 [INFO] ------------------------------------------------------------------------
12:33:21 [INFO] BUILD SUCCESS
12:33:21 [INFO] ------------------------------------------------------------------------
12:33:21 [INFO] Total time: 01:07 h
12:33:21 [INFO] Finished at: 2026-01-04T04:33:21Z
12:33:21 [INFO] ------------------------------------------------------------------------

GitHub Actions:

These jobs are failing with 403 Forbidden errors during dependency resolution (Log4j, ASM, etc.). The follow-up commit fixes the leftover Spark 3.2 references I found in the workflows:

  • .github/workflows/velox_nightly.yml (2 occurrences)
    Removed mvn clean install -Pspark-3.2 from both the x86 and arm64 build jobs
    Lines 103 and 226
  • .github/workflows/build_bundle_package.yml
    Updated description from 'Spark version: spark-3.2, spark-3.3, spark-3.4 or spark-3.5' to 'Spark version: spark-3.3, spark-3.4, spark-3.5 or spark-4.0'
  • .github/workflows/util/install-spark-resources.sh
    Removed the Spark 3.2 case (lines 92-96) from the script

These changes aim to resolve the 403 errors. The workflows will no longer attempt to build with the removed Spark 3.2 profile.
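To catch references like the ones listed above before CI does, a small scan over the workflow text can help. The following is a minimal sketch, not something included in this PR: the `find_leftovers` helper, the regular expression, and the sample workflow snippet are assumptions made for illustration.

```python
import re

# Matches the Maven profile name ("spark-3.2", also inside "-Pspark-3.2")
# and the short form used in job names such as spark-test-spark32.
LEFTOVER = re.compile(r"spark-3\.2|spark32")


def find_leftovers(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, stripped line) for lines mentioning Spark 3.2."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if LEFTOVER.search(line)
    ]


# Hypothetical workflow fragment; the real files live under .github/workflows.
SAMPLE_WORKFLOW = """\
run: |
  mvn clean install -Pspark-3.2 -Pbackends-velox -DskipTests
  mvn clean install -Pspark-3.3 -Pbackends-velox -DskipTests
"""

for number, line in find_leftovers(SAMPLE_WORKFLOW):
    print(f"line {number}: {line}")
```

Feeding each workflow and helper script through `find_leftovers` gives a quick list of places that still name the removed profile.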

github-actions bot commented Jan 4, 2026

Run Gluten Clickhouse CI on x86

github-actions bot commented Jan 5, 2026

Run Gluten Clickhouse CI on x86

github-actions bot commented Jan 6, 2026

Run Gluten Clickhouse CI on x86

Author

QCLyu commented Jan 6, 2026

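Hi @zhouyuan, the code changes here look good. The only ClickHouse CI failure comes from the Jenkins job still passing -Pspark-3.2 on Java 8. This branch no longer has a spark-3.2 profile, so Maven ignores the flag (see the warning in the log below) and the build falls back to the Spark 3.5 shims, as the Reactor Summary shows. That pulls in iceberg-spark-runtime-3.5_2.12 1.10.0, whose classes are compiled for Java 11 (class file version 55), while the Java 8 compiler accepts at most version 52. That mismatch causes the compile error in gluten-iceberg. This is a CI config issue, not a code regression.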

Please proceed with merge if everything else looks good to you, and we can update/disable the Spark 3.2 Java 8 leg in the Jenkins pipeline separately.

This is the only failed step in ClickHouse CI: https://opencicd.kyligence.com/job/gluten/job/gluten-ci/18337/flowGraphTable/
This is the log: https://opencicd.kyligence.com/job/gluten/job/gluten-ci/18337/execution/node/235/log/

18:00:32  [ERROR] COMPILATION ERROR : 
18:00:32  [INFO] -------------------------------------------------------------
18:00:32  [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/ut-stage-1/gluten-iceberg/src/main/java/org/apache/gluten/connector/write/MetricsWrapper.java:[21,25] error: cannot access Metrics
18:00:32    bad class file: /root/.m2/repository/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.10.0/iceberg-spark-runtime-3.5_2.12-1.10.0.jar(org/apache/iceberg/Metrics.class)
18:00:32      class file has wrong version 55.0, should be 52.0
18:00:32      Please remove or make sure it appears in the correct subdirectory of the classpath.
18:00:32  [INFO] 1 error
18:00:32  [INFO] -------------------------------------------------------------
18:00:32  [INFO] ------------------------------------------------------------------------
18:00:32  [INFO] Reactor Summary for Gluten Parent Pom 1.6.0-SNAPSHOT:
18:00:32  [INFO] 
18:00:32  [INFO] Gluten Parent Pom .................................. SUCCESS [ 21.426 s]
18:00:32  [INFO] Gluten Ras ......................................... SUCCESS [ 22.345 s]
18:00:32  [INFO] Gluten Ras Common .................................. SUCCESS [ 51.109 s]
18:00:32  [INFO] Gluten Core ........................................ SUCCESS [ 39.051 s]
18:00:32  [INFO] Gluten Shims ....................................... SUCCESS [  0.336 s]
18:00:32  [INFO] Gluten Shims Common ................................ SUCCESS [  6.173 s]
18:00:32  [INFO] Gluten Shims for Spark 3.5 ......................... SUCCESS [ 12.425 s]
18:00:32  [INFO] Gluten UI .......................................... SUCCESS [  5.105 s]
18:00:32  [INFO] Gluten Substrait ................................... SUCCESS [01:01 min]
18:00:32  [INFO] Gluten Celeborn .................................... SUCCESS [  4.345 s]
18:00:32  [INFO] Gluten Iceberg ..................................... FAILURE [  5.848 s]
18:00:32  [INFO] Gluten DeltaLake ................................... SKIPPED
18:00:32  [INFO] Gluten Package ..................................... SKIPPED
18:00:32  [INFO] Gluten Ras Planner ................................. SKIPPED
18:00:32  [INFO] Gluten Kafka ....................................... SKIPPED
18:00:32  [INFO] Gluten Backends ClickHouse ......................... SKIPPED
18:00:32  [INFO] Gluten Unit Test Parent ............................ SKIPPED
18:00:32  [INFO] Gluten Unit Test Common ............................ SKIPPED
18:00:32  [INFO] Gluten Unit Test ................................... SKIPPED
18:00:32  [INFO] ------------------------------------------------------------------------
18:00:32  [INFO] BUILD FAILURE
18:00:32  [INFO] ------------------------------------------------------------------------
18:00:32  [INFO] Total time:  03:49 min
18:00:32  [INFO] Finished at: 2026-01-06T02:00:32Z
18:00:32  [INFO] ------------------------------------------------------------------------
18:00:32  [WARNING] The requested profile "spark-3.2" could not be activated because it does not exist.
18:00:32  [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.14.1:compile (default-compile) on project gluten-iceberg: Compilation failure
18:00:32  [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/ut-stage-1/gluten-iceberg/src/main/java/org/apache/gluten/connector/write/MetricsWrapper.java:[21,25] error: cannot access Metrics
18:00:32  [ERROR]   bad class file: /root/.m2/repository/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.10.0/iceberg-spark-runtime-3.5_2.12-1.10.0.jar(org/apache/iceberg/Metrics.class)
18:00:32  [ERROR]     class file has wrong version 55.0, should be 52.0
18:00:32  [ERROR]     Please remove or make sure it appears in the correct subdirectory of the classpath.
18:00:32  [ERROR] 
18:00:32  [ERROR] -> [Help 1]
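
For reference, the "wrong version 55.0, should be 52.0" message refers to the major version stored in the header of every .class file. The sketch below shows how that number can be read and mapped to a Java release; the helper names and the hand-written 8-byte header are made up for illustration, and the bytes are not taken from the Iceberg jar.

```python
import struct

# Every Java class file starts with this magic number.
CLASS_MAGIC = 0xCAFEBABE


def class_file_major(data: bytes) -> int:
    """Read the major version from the first 8 bytes of a .class file."""
    # Header layout (big-endian): u4 magic, u2 minor_version, u2 major_version.
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return major


def java_release(major: int) -> int:
    """Map a class-file major version to a Java release (52 -> 8, 55 -> 11)."""
    return major - 44


# Hand-written header: magic, minor version 0, major version 0x37 (55).
sample_header = bytes.fromhex("cafebabe00000037")

print(java_release(class_file_major(sample_header)))
print(java_release(52))
```

With this mapping, a jar containing version 55 classes needs at least a Java 11 compiler, which matches the failure seen on the Java 8 leg.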

Contributor

zhouyuan commented Jan 6, 2026

@zzcclp could you please help to take a look?

Author

QCLyu commented Jan 7, 2026

Hi @zhouyuan @zzcclp, just following up on my last comment here. I'd love to get your thoughts on the CI config issue. Does this diagnosis make sense to you? I want to make sure we're aligned before I move forward.

Contributor

zzcclp commented Jan 7, 2026

Sorry for the late reply. I modified the CI script; please try again.

github-actions bot commented Jan 8, 2026

Run Gluten Clickhouse CI on x86

QCLyu and others added 10 commits January 7, 2026 19:41
* [Scala 2.13][IntelliJ] Remove suppression for lint-multiarg-infix warnings in pom.xml

see apache/spark#43332

* [Scala 2.13][IntelliJ] Suppress warning for `ContentFile::path`

* [Scala 2.13][IntelliJ] Suppress warning for ContextAwareIterator initialization

* [Scala 2.13][IntelliJ] Refactor to use Symbol for column references to fix compilation error in Scala 2.13 with IntelliJ compiler: symbol literal is deprecated; use Symbol("i")

* [Fix] Replace deprecated fileToString with Files.readString for file reading in GlutenSQLQueryTestSuite

see apache/spark#51911 which removes Spark's fileToString method from Spark code base.

* [Scala 2.13][IntelliJ] Update the Java compiler release version from 8 to `${java.version}` in the Scala 2.13 profiler to align it with `maven.compiler.target`

* [Refactor] Replace usage of `Symbol` with `col` for column references to align with Spark API best practices

---------

Co-authored-by: Chang chen <[email protected]>
…/ut (apache#11317)

Bumps org.apache.kafka:kafka_2.12 from 3.4.0 to 3.9.1.

---
updated-dependencies:
- dependency-name: org.apache.kafka:kafka_2.12
  dependency-version: 3.9.1
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [Refactor] Rename GlutenCastSuite to GlutenCastWithAnsiOffSuite and update test settings to use the new suite

* [Refactor] Add GlutenDataSourceV2SQLSuite classes for V1 and V2 filter testing

Remove  GlutenDataSourceV2SQLSuiteV1Filter.scala and GlutenDataSourceV2SQLSuiteV2Filter.scala

* [Refactor] Rename FallbackStrategiesSuite to GlutenFallbackStrategiesSuite and move to gluten package

* [Refactor] Consolidate GlutenDeleteFromTableSuite into GlutenGroupBasedDeleteFromTableSuite for cleaner structure

* [Refactor] Remove ParquetReadBenchmark as it is no longer necessary

* [Refactor] Adjust import structure and package declaration for GlutenValidateRequirementsSuite
…#11349)

Upstream Velox's New Commits:
2fdcd253e by Xiaoxuan Meng, misc: Added index bound type unit test (15879)
74af4ef1b by Xiao Du, feat: Add string compaction for approx_most_frequent global aggregation (15852)
48e853131 by Pedro Eugenio Rocha Pedreira, refactor(simple-function): Make materialization of string-types explicit (15869)
80638a89e by Artem Selishchev, fix: [velox] Reuse context in ZSTD_decompress (15854)

Signed-off-by: glutenperfbot <[email protected]>
Co-authored-by: glutenperfbot <[email protected]>

github-actions bot commented Jan 8, 2026

Run Gluten Clickhouse CI on x86

Member

@PHILO-HE PHILO-HE left a comment

Thank you so much for your work. Some comments. Please check if they make sense.

I suggest using git pull --rebase <remote name> main to rebase your branch. Otherwise, commits already merged to the main branch are included in this PR and mixed with your changes, which makes it hard to review.

Please also clean up the uses of Spark 3.2 in the build scripts under dev.

Some Spark shim APIs may have been introduced only to bridge the differences between Spark 3.2 and later Spark versions. If so, we should remove them as well (this can be done in separate PRs).

Member

I assume we should remove only the Spark 3.2 unit-test jobs from CI. Other jobs, such as the Celeborn tests, should be switched to Spark 3.3 or a later supported version rather than deleted.

Author

Thanks @PHILO-HE, agreed.
I also created a linked issue for removing the Spark 3.2-specific compatibility code: #11379

Member

Please confirm if these changes were intended.

Author

Thanks @PHILO-HE, I'm working on it and will push another commit.

github-actions bot commented Jan 8, 2026

Run Gluten Clickhouse CI on x86

@QCLyu QCLyu marked this pull request as draft January 8, 2026 08:24

Development

Successfully merging this pull request may close these issues.

[VL] deprecated Spark-3.2 unit tests