[VL] Deprecate and remove Spark 3.2 support #11351
Run Gluten Clickhouse CI on x86
I have verified the changes for Spark 3.5 locally. Jenkins passes, but GitHub Actions was showing failures:
Jenkins (ClickHouse CI): SUCCESS. The Reactor Summary confirms that all modules (including gluten-core, shims, and backends-clickhouse) built successfully and all 36 tests passed. Jenkins Reactor Summary (Build Success), 12:33:20: Run completed in 2 minutes, 25 seconds.
GitHub Actions: these jobs are failing with 403 Forbidden errors during dependency resolution (Log4j, ASM, etc.). These issues were identified and fixed in the following commit.
These changes aim to resolve the 403 errors. The workflows will no longer attempt to build with the removed Spark 3.2 profile.
Hi @zhouyuan, the code changes here look good. The only ClickHouse CI failure occurs because the Jenkins job still runs -Pspark-3.2 on Java 8, but this repo no longer has a spark-3.2 profile and pulls Iceberg artifacts built for Java 11 (class file version 55). That mismatch causes the compile error in gluten-iceberg. This is a CI configuration issue, not a code regression. If everything else looks good to you, please proceed with the merge; we can update or disable the Spark 3.2 Java 8 leg in the Jenkins pipeline separately. This is the only failed step in ClickHouse CI: https://opencicd.kyligence.com/job/gluten/job/gluten-ci/18337/flowGraphTable/
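The Java 8 versus class file version 55 mismatch described above can be confirmed from the first eight bytes of any .class file in the offending jar. A minimal sketch, assuming you have already extracted those bytes (for example from a class inside the Iceberg jar); ClassFileVersion and its methods are hypothetical helpers, not part of Gluten, Iceberg, or the CI scripts:

```java
import java.nio.ByteBuffer;

public class ClassFileVersion {
    // A class file starts with: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version.
    static int majorVersion(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("need at least 8 bytes of class file header");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, matching the class file format
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort(); // skip minor_version
        return buf.getShort() & 0xFFFF; // major_version as an unsigned 16-bit value
    }

    // For Java 5 and later, major version = feature release + 44 (52 = Java 8, 55 = Java 11).
    static int javaRelease(int major) {
        if (major < 49) {
            throw new IllegalArgumentException("pre-Java 5 class file");
        }
        return major - 44;
    }

    // A Java 8 JVM or javac only understands class files up to major version 52.
    static boolean loadableOnJava8(int major) {
        return major <= 52;
    }

    public static void main(String[] args) {
        // Header of a class compiled for Java 11: magic, minor 0, major 55.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 55};
        int major = majorVersion(header);
        System.out.println(major + " -> Java " + javaRelease(major)
                + ", loadable on Java 8: " + loadableOnJava8(major));
    }
}
```

Run as is, the sketch prints `55 -> Java 11, loadable on Java 8: false`, which is the combination that breaks a Java 8 build leg.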
@zzcclp, could you please take a look?
Sorry for the late reply. I have modified the CI script; please try again.
* [Scala 2.13][IntelliJ] Remove suppression for lint-multiarg-infix warnings in pom.xml; see apache/spark#43332
* [Scala 2.13][IntelliJ] Suppress warning for `ContentFile::path`
* [Scala 2.13][IntelliJ] Suppress warning for ContextAwareIterator initialization
* [Scala 2.13][IntelliJ] Refactor to use Symbol for column references to fix a compilation error in Scala 2.13 with the IntelliJ compiler: symbol literal is deprecated; use Symbol("i")
* [Fix] Replace deprecated fileToString with Files.readString for file reading in GlutenSQLQueryTestSuite; see apache/spark#51911, which removes Spark's fileToString method from the Spark code base.
* [Scala 2.13][IntelliJ] Update the Java compiler release version from 8 to `${java.version}` in the Scala 2.13 profile to align it with `maven.compiler.target`
* [Refactor] Replace usage of `Symbol` with `col` for column references to align with Spark API best practices
---------
Co-authored-by: Chang chen <[email protected]>
…/ut (apache#11317)
Bumps org.apache.kafka:kafka_2.12 from 3.4.0 to 3.9.1.
---
updated-dependencies:
- dependency-name: org.apache.kafka:kafka_2.12
  dependency-version: 3.9.1
  dependency-type: direct:development
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [Refactor] Rename GlutenCastSuite to GlutenCastWithAnsiOffSuite and update test settings to use the new suite
* [Refactor] Add GlutenDataSourceV2SQLSuite classes for V1 and V2 filter testing; remove GlutenDataSourceV2SQLSuiteV1Filter.scala and GlutenDataSourceV2SQLSuiteV2Filter.scala
* [Refactor] Rename FallbackStrategiesSuite to GlutenFallbackStrategiesSuite and move it to the gluten package
* [Refactor] Consolidate GlutenDeleteFromTableSuite into GlutenGroupBasedDeleteFromTableSuite for a cleaner structure
* [Refactor] Remove ParquetReadBenchmark as it is no longer necessary
* [Refactor] Adjust import structure and package declaration for GlutenValidateRequirementsSuite
…l values (apache#11331)
---------
Co-authored-by: jiangtian <[email protected]>
…#11349)
Upstream Velox's New Commits:
2fdcd253e by Xiaoxuan Meng, misc: Added index bound type unit test (15879)
74af4ef1b by Xiao Du, feat: Add string compaction for approx_most_frequent global aggregation (15852)
48e853131 by Pedro Eugenio Rocha Pedreira, refactor(simple-function): Make materialization of string-types explicit (15869)
80638a89e by Artem Selishchev, fix: [velox] Reuse context in ZSTD_decompress (15854)
Signed-off-by: glutenperfbot <[email protected]>
Co-authored-by: glutenperfbot <[email protected]>
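The [Fix] commit listed above swaps Spark's removed fileToString helper for the JDK's Files.readString. A minimal sketch of that replacement, assuming the test only needs the whole file as a UTF-8 string; GoldenFileReader and readGoldenFile are hypothetical names, not the actual GlutenSQLQueryTestSuite code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class GoldenFileReader {
    // Reads an entire query/golden file into a String.
    // Files.readString(Path) decodes as UTF-8 and is available from Java 11 onward.
    static String readGoldenFile(Path path) throws IOException {
        return Files.readString(path);
    }

    public static void main(String[] args) throws IOException {
        // Write a throwaway file so the example is self-contained.
        Path tmp = Files.createTempFile("query", ".sql");
        try {
            Files.writeString(tmp, "SELECT 1;\n", StandardCharsets.UTF_8);
            System.out.print(readGoldenFile(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that Files.readString and Files.writeString require Java 11 or later, so code using them cannot be compiled with release 8.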
PHILO-HE left a comment
Thank you so much for your work. I have a few comments; please check whether they make sense.
I suggest using git pull --rebase <remote name> main to rebase your branch. Otherwise, commits that were already merged into the main branch end up in this PR, mixed in with your changes, which makes it harder to review.
Please also clean up the uses of Spark 3.2 in the build scripts under dev.
Some Spark shim APIs may have been introduced only to adapt to differences between Spark 3.2 and later Spark versions. If so, we should also remove them (this can be done in separate PRs).
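To make the shim suggestion concrete: a version-dispatched shim layer keeps one implementation per supported Spark version behind a common interface, so dropping a version means deleting its implementation and then simplifying any interface method that only existed to bridge that version. A minimal sketch, with SparkShims, SHIMS, and load all hypothetical; it does not reproduce Gluten's actual shim loader:

```java
import java.util.Map;
import java.util.TreeMap;

public class ShimSketch {
    // Hypothetical shim interface: one method whose behavior differs between Spark versions.
    interface SparkShims {
        String describeScan();
    }

    // Hypothetical registry keyed by "major.minor". Removing Spark 3.2 means deleting its
    // entry (and the shim module behind it); methods that only bridged 3.2 can then be inlined.
    static final Map<String, SparkShims> SHIMS = new TreeMap<>();
    static {
        SHIMS.put("3.3", () -> "scan via 3.3 API");
        SHIMS.put("3.4", () -> "scan via 3.4 API");
        SHIMS.put("3.5", () -> "scan via 3.5 API");
    }

    static SparkShims load(String sparkVersion) {
        // Keep only "major.minor", e.g. "3.5.2" -> "3.5".
        String[] parts = sparkVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("bad version: " + sparkVersion);
        }
        String key = parts[0] + "." + parts[1];
        SparkShims shims = SHIMS.get(key);
        if (shims == null) {
            throw new UnsupportedOperationException("Spark " + key + " is not supported");
        }
        return shims;
    }

    public static void main(String[] args) {
        System.out.println(load("3.5.2").describeScan());
        try {
            load("3.2.4");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch prints `scan via 3.5 API` followed by `Spark 3.2 is not supported`: once the 3.2 entry is gone, a 3.2 runtime fails fast instead of silently picking up a shim.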
I think we should only remove the Spark 3.2 UT jobs from CI. Other jobs, such as the Celeborn test, should be switched to Spark 3.3 or a later supported version instead of being deleted.
Please confirm whether these changes were intended.
Thanks, @PHILO-HE. I'm working on it and will push a new commit.
What changes are proposed in this pull request?
This PR comprehensively removes Spark 3.2 support from the Gluten Velox backend. It cleans up the source code, build profiles, CI/CD pipelines, and documentation.
Key changes include:
Source Code: Removed shims/spark32 and gluten-ut/spark32 directories.
Build System: Deleted the spark-3.2 profile from the root and all sub-module pom.xml files.
CI/CD: Removed legacy Spark 3.2 jobs (spark-test-spark32, spark-test-spark32-slow, and TPC-H OOM tests) from the GitHub Actions workflows to reduce CI overhead.
Test Migration: Refactored VeloxHashJoinSuite and other backend tests to remove Spark 3.2-specific conditional logic, ensuring these tests now run on Spark 3.3+.
Documentation: Updated the build guide and ClickHouse deployment docs to remove references to Spark 3.2.
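The Test Migration item rests on a simple argument: once Spark 3.2 is gone, every remaining version satisfies a "Spark 3.3 or later" gate, so the 3.2 branch of such a conditional can never run. A minimal sketch of that check; atLeast is a hypothetical helper rather than Gluten's version utility, and the version list (3.3, 3.4, 3.5) is taken from this PR's testing notes:

```java
import java.util.List;

public class VersionGateSketch {
    // Hypothetical helper: true if "major.minor[.patch]" is at least major.minor.
    static boolean atLeast(String version, int major, int minor) {
        String[] parts = version.split("\\.");
        int vMajor = Integer.parseInt(parts[0]);
        int vMinor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return vMajor > major || (vMajor == major && vMinor >= minor);
    }

    public static void main(String[] args) {
        // Spark versions that remain after this PR, per its testing notes.
        List<String> supported = List.of("3.3", "3.4", "3.5");
        // A test gated on "Spark 3.3 or later" takes the same branch on every remaining
        // version, so the Spark 3.2 branch is dead code and can be deleted.
        boolean gateAlwaysTrue = supported.stream().allMatch(v -> atLeast(v, 3, 3));
        System.out.println("3.3+ branch taken on all supported versions: " + gateAlwaysTrue);
    }
}
```

The sketch prints `3.3+ branch taken on all supported versions: true`; the same reasoning justifies dropping each Spark 3.2-only branch rather than keeping it behind a version check.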
How was this patch tested?
Manual Build: Verified successful compilation on aarch64 (ARM64) using -Pspark-3.5 -Pbackends-velox.
Unit Tests: Verified that migrated tests in VeloxHashJoinSuite pass successfully under Spark 3.5.
CI: Validated the workflow changes so that jobs for the remaining Spark versions (3.3, 3.4, 3.5) still trigger correctly.
Closes #8960