[GLUTEN-11343][CORE][VL] Support Spark 4.1 UT #11353
Run Gluten Clickhouse CI on x86
zhouyuan
left a comment
👍
gluten-ut/spark40/pom.xml
Outdated
    <activation>
    <activeByDefault>false</activeByDefault>
    </activation>
    <properties>
We need to remove these properties in a subsequent PR.
Pull request overview
This PR adds support for Spark 4.1 unit tests by updating the build configuration, resolving compatibility issues, and adding new test resources. The changes accommodate API changes introduced in Spark 4.1, including dependency updates, package refactorings, and configuration parameter modifications.
Key Changes
- Updated build and dependency configurations to support Spark 4.1 testing
- Fixed compatibility issues from Spark API changes (streaming package refactoring, TypedConfigBuilder, V2 bucketing defaults)
- Added comprehensive SQL test input files for Spark 4.1 compatibility validation
## Changes

| Cause | Type | Category | Description | Affected Files |
|-------|------|----------|-------------|----------------|
| N/A | Feat | Build | Update build configuration to support Spark 4.1 UT | `.github/workflows/velox_backend_x86.yml`, `gluten-ut/pom.xml`, `gluten-ut/spark41/pom.xml`, `tools/gluten-it/pom.xml` |
| [#52165](apache/spark#52165) | Fix | Dependency | Update Parquet dependency version to 1.16.0 to avoid NoSuchMethodError issue | `gluten-ut/spark41/pom.xml` |
| [#51477](apache/spark#51477) | Fix | Compatibility | Update imports to reflect streaming runtime package refactoring in Apache Spark | `gluten-ut/spark41/.../GlutenDynamicPartitionPruningSuite.scala`, `gluten-ut/spark41/.../GlutenStreamingQuerySuite.scala` |
| [#50674](apache/spark#50674) | Fix | Compatibility | Fix compatibility issue introduced by `TypedConfigBuilder` | `gluten-substrait/.../ExpressionConverter.scala`, `gluten-ut/spark41/.../GlutenCSVSuite.scala`, `gluten-ut/spark41/.../GlutenJsonSuite.scala` |
| [#49766](apache/spark#49766) | Fix | Compatibility | Disable V2 bucketing in GlutenDynamicPartitionPruningSuite since `spark.sql.sources.v2.bucketing.enabled` is now enabled by default | `gluten-ut/spark41/.../GlutenDynamicPartitionPruningSuite.scala` |
| [#42414](apache/spark#42414), [#53038](apache/spark#53038) | Fix | Bug Fix | Resolve an issue introduced by SPARK-42414, as identified in SPARK-53038 | `backends-velox/.../VeloxBloomFilterAggregate.scala` |
| N/A | Fix | Bug Fix | Enforce row fallback for unsupported cached batches - keep columnar execution only when schema validation succeeds | `backends-velox/.../ColumnarCachedBatchSerializer.scala` |
| [SPARK-53132](apache/spark#53132), [SPARK-53142](apache/spark#53142) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 KeyGroupedPartitioningSuite tests. Excluded tests: `SPARK-53322*`, `SPARK-54439*` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [SPARK-53535](https://issues.apache.org/jira/browse/SPARK-53535), [SPARK-54220](https://issues.apache.org/jira/browse/SPARK-54220) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenParquetIOSuite tests. Excluded tests: `SPARK-53535*`, `vectorized reader: missing all struct fields*`, `SPARK-54220*` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#52645](apache/spark#52645) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenStreamingQuerySuite tests. Excluded tests: `SPARK-53942: changing the number of stateless shuffle partitions via config`, `SPARK-53942: stateful shuffle partitions are retained from old checkpoint` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#47856](apache/spark#47856) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenDataFrameWindowFunctionsSuite and GlutenJoinSuite tests. Excluded tests: `SPARK-49386: Window spill with more than the inMemoryThreshold and spillSizeThreshold`, `SPARK-49386: test SortMergeJoin (with spill by size threshold)` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#52157](apache/spark#52157) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenQueryExecutionSuite tests. Excluded test: `#53413: Cleanup shuffle dependencies for commands` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#48470](apache/spark#48470) | 4.1.0 | Test Exclusion | Exclude split test in GlutenRegexpExpressionsSuite. Excluded test: `GlutenRegexpExpressionsSuite.SPLIT` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#51623](apache/spark#51623) | 4.1.0 | Test Exclusion | Add `spark.sql.unionOutputPartitioning=false` to Maven test args. Excluded tests: `GlutenBroadcastExchangeSuite.SPARK-52962`, `GlutenDataFrameSetOperationsSuite.SPARK-52921*` | `.github/workflows/velox_backend_x86.yml`, `gluten-ut/spark41/.../VeloxTestSettings.scala`, `tools/gluten-it/common/.../Suite.scala` |
| N/A | 4.1.0 | Test Exclusion | Excludes failed SQL tests that need to be fixed for Spark 4.1 compatibility. Excluded tests: `decimalArithmeticOperations.sql`, `identifier-clause.sql`, `keywords.sql`, `literals.sql`, `operators.sql`, `exists-orderby-limit.sql`, `postgreSQL/date.sql`, `nonansi/keywords.sql`, `nonansi/literals.sql`, `datetime-legacy.sql`, `datetime-parsing-invalid.sql`, `misc-functions.sql` | `gluten-ut/spark41/.../VeloxSQLQueryTestSettings.scala` |
| apache#11252 | 4.1.0 | Test Exclusion | Exclude Gluten test for SPARK-47939: Explain should work with parameterized queries | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
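
For readers unfamiliar with how such overrides are usually expressed, here is a minimal Scala sketch of setting the two configuration keys named in the table on a `SparkConf`. It is an illustration only, not the code from this PR: the object name `Spark41ConfOverridesSketch` and the method `withSpark41Overrides` are hypothetical, and it assumes `spark-core` is on the classpath. How the PR actually wires these values in (suite-level conf versus Maven test args) is not shown here.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper, not part of Gluten or of this PR.
object Spark41ConfOverridesSketch {

  // Returns a copy of `base` with the two overrides mentioned in the
  // Changes table applied. The config keys come from the PR description;
  // everything else in this object is an assumption for illustration.
  def withSpark41Overrides(base: SparkConf): SparkConf =
    base
      .clone()
      // V2 bucketing is described above as enabled by default in Spark 4.1.
      .set("spark.sql.sources.v2.bucketing.enabled", "false")
      // Listed above as added to the Maven test args.
      .set("spark.sql.unionOutputPartitioning", "false")

  def main(args: Array[String]): Unit = {
    // loadDefaults = false keeps the example independent of system properties.
    val conf = withSpark41Overrides(new SparkConf(false))
    println(conf.get("spark.sql.sources.v2.bucketing.enabled"))
    println(conf.get("spark.sql.unionOutputPartitioning"))
  }
}
```

Cloning the base configuration keeps the caller's `SparkConf` unchanged, which may be convenient when only some suites need the overrides.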
What changes are proposed in this pull request?
The changes and affected files are listed in the Changes table above.

Fixes #11343
How was this patch tested?
Tested with Spark 4.1 unit tests.