
Commit 646dc9b

upgraded spark 3.5.4 to 3.5.5 (#1565)
## Which issue does this PR close?

Closes #1461

## Rationale for this change

Spark 3.5.5 is the latest stable 3.5.x version and should be supported.

## What changes are included in this PR?

Just the upgrade from Spark 3.5.4 to 3.5.5, plus the only code change required, in `ShimCometScanExec`.

## How are these changes tested?

Ran `mvn test` with the `spark-3.5` profile. This was sufficient because the build failed with just the version upgrade and without the required code change.
1 parent badbd37 commit 646dc9b

File tree

8 files changed: +91 −91 lines changed

.github/workflows/spark_sql_test.yml (1 addition, 1 deletion)

```diff
@@ -45,7 +45,7 @@ jobs:
       matrix:
         os: [ubuntu-24.04]
         java-version: [11]
-        spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.4'}]
+        spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.5'}]
         module:
           - {name: "catalyst", args1: "catalyst/test", args2: ""}
           - {name: "sql/core-1", args1: "", args2: sql/testOnly * -- -l org.apache.spark.tags.ExtendedSQLTest -l org.apache.spark.tags.SlowSQLTest}
```
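GitHub Actions expands a `matrix` block into one job per combination of its axes, which is why a single-line version bump here fans out across every test module. A small sketch of that expansion, using only the axis values visible in the hunk above (the real workflow defines more modules):

```python
from itertools import product

# Matrix axes as shown in the hunk above (only the visible entries)
os_list = ["ubuntu-24.04"]
java_versions = [11]
spark_versions = [
    {"short": "3.4", "full": "3.4.3"},
    {"short": "3.5", "full": "3.5.5"},
]
modules = ["catalyst", "sql/core-1"]

# GitHub Actions runs one job per element of the cross product of the axes
jobs = [
    {"os": o, "java": j, "spark": s["full"], "module": m}
    for o, j, s, m in product(os_list, java_versions, spark_versions, modules)
]

print(len(jobs))  # 1 * 1 * 2 * 2 = 4 jobs for the entries shown
```

Bumping `full: '3.5.4'` to `'3.5.5'` therefore changes the Spark version under test in half of the generated jobs without touching any other axis.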

benchmarks/Dockerfile (1 addition, 1 deletion)

```diff
@@ -13,7 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-FROM apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11
+FROM apache/datafusion-comet:0.7.0-spark3.5.5-scala2.12-java11
 
 RUN apt update \
     && apt install -y git python3 python3-pip \
```
Lines changed: 75 additions & 75 deletions (large diffs are not rendered by default)

docs/source/contributor-guide/spark-sql-tests.md (3 additions, 3 deletions)

````diff
@@ -72,11 +72,11 @@ of Apache Spark to enable Comet when running tests. This is a highly manual proc
 vary depending on the changes in the new version of Spark, but here is a general guide to the process.
 
 We typically start by applying a patch from a previous version of Spark. For example, when enabling the tests
-for Spark version 3.5.4 we may start by applying the existing diff for 3.4.3 first.
+for Spark version 3.5.5 we may start by applying the existing diff for 3.4.3 first.
 
 ```shell
 cd git/apache/spark
-git checkout v3.5.4
+git checkout v3.5.5
 git apply --reject --whitespace=fix ../datafusion-comet/dev/diffs/3.4.3.diff
 ```
 
@@ -118,7 +118,7 @@ wiggle --replace ./sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.sc
 ## Generating The Diff File
 
 ```shell
-git diff v3.5.4 > ../datafusion-comet/dev/diffs/3.5.4.diff
+git diff v3.5.5 > ../datafusion-comet/dev/diffs/3.5.5.diff
 ```
 
 ## Running Tests in CI
````

docs/source/user-guide/configs.md (3 additions, 3 deletions)

```diff
@@ -71,9 +71,9 @@ Comet provides the following configuration settings.
 | spark.comet.explain.verbose.enabled | When this setting is enabled, Comet will provide a verbose tree representation of the extended information. | false |
 | spark.comet.explainFallback.enabled | When this setting is enabled, Comet will provide logging explaining the reason(s) why a query stage cannot be executed natively. Set this to false to reduce the amount of logging. | false |
 | spark.comet.expression.allowIncompatible | Comet is not currently fully compatible with Spark for all expressions. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
-| spark.comet.memory.overhead.factor | Fraction of executor memory to be allocated as additional memory for Comet when running in on-heap mode. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html). | 0.2 |
-| spark.comet.memory.overhead.min | Minimum amount of additional memory to be allocated per executor process for Comet, in MiB, when running in on-heap mode. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html). | 402653184b |
-| spark.comet.memoryOverhead | The amount of additional memory to be allocated per executor process for Comet, in MiB, when running in on-heap mode. This config is optional. If this is not specified, it will be set to `spark.comet.memory.overhead.factor` * `spark.executor.memory`. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html). | |
+| spark.comet.memory.overhead.factor | Fraction of executor memory to be allocated as additional memory for Comet when running Spark in on-heap mode. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html). | 0.2 |
+| spark.comet.memory.overhead.min | Minimum amount of additional memory to be allocated per executor process for Comet, in MiB, when running Spark in on-heap mode. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html). | 402653184b |
+| spark.comet.memoryOverhead | The amount of additional memory to be allocated per executor process for Comet, in MiB, when running Spark in on-heap mode. This config is optional. If this is not specified, it will be set to `spark.comet.memory.overhead.factor` * `spark.executor.memory`. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html). | |
 | spark.comet.metrics.updateInterval | The interval in milliseconds to update metrics. If interval is negative, metrics will be updated upon task completion. | 3000 |
 | spark.comet.nativeLoadRequired | Whether to require Comet native library to load successfully when Comet is enabled. If not, Comet will silently fallback to Spark when it fails to load the native lib. Otherwise, an error will be thrown and the Spark job will be aborted. | false |
 | spark.comet.parquet.enable.directBuffer | Whether to use Java direct byte buffer when reading Parquet. | false |
```
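Per the table above, when `spark.comet.memoryOverhead` is unset it defaults to `spark.comet.memory.overhead.factor` times `spark.executor.memory`, with `spark.comet.memory.overhead.min` (402653184 bytes, i.e. 384 MiB) as the floor. A minimal sketch of that rule as I read it from the docs, not a transcription of Comet's code (the helper name and MiB units are mine):

```python
# 402653184 bytes from the table above is exactly 384 MiB
MIN_OVERHEAD_MIB = 402653184 // (1024 * 1024)

def comet_memory_overhead_mib(executor_memory_mib: int, factor: float = 0.2) -> int:
    """Illustrative default for spark.comet.memoryOverhead in on-heap mode:
    factor * executor memory, never below the configured minimum."""
    return max(int(executor_memory_mib * factor), MIN_OVERHEAD_MIB)

print(comet_memory_overhead_mib(1024))  # 204 MiB computed, floored to 384
print(comet_memory_overhead_mib(8192))  # 0.2 * 8192 = 1638 MiB
```

So small executors effectively always get the 384 MiB minimum, and the 0.2 factor only starts to matter above roughly 2 GiB of executor memory.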

docs/source/user-guide/kubernetes.md (5 additions, 5 deletions)

```diff
@@ -66,10 +66,10 @@ metadata:
 spec:
   type: Scala
   mode: cluster
-  image: apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11
+  image: apache/datafusion-comet:0.7.0-spark3.5.5-scala2.12-java11
   imagePullPolicy: IfNotPresent
   mainClass: org.apache.spark.examples.SparkPi
-  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
+  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar
   sparkConf:
     "spark.executor.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar"
     "spark.driver.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar"
@@ -80,17 +80,17 @@ spec:
     "spark.comet.exec.shuffle.enabled": "true"
     "spark.comet.exec.shuffle.mode": "auto"
     "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
-  sparkVersion: 3.5.4
+  sparkVersion: 3.5.5
   driver:
     labels:
-      version: 3.5.4
+      version: 3.5.5
     cores: 1
     coreLimit: 1200m
     memory: 512m
     serviceAccount: spark-operator-spark
   executor:
     labels:
-      version: 3.5.4
+      version: 3.5.5
     instances: 1
     cores: 1
     coreLimit: 1200m
```

pom.xml (1 addition, 1 deletion)

```diff
@@ -556,7 +556,7 @@ under the License.
       <id>spark-3.5</id>
       <properties>
         <scala.version>2.12.18</scala.version>
-        <spark.version>3.5.4</spark.version>
+        <spark.version>3.5.5</spark.version>
         <spark.version.short>3.5</spark.version.short>
         <parquet.version>1.13.1</parquet.version>
         <slf4j.version>2.0.7</slf4j.version>
```

spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala (2 additions, 2 deletions)

```diff
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
   protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
 
   protected def getPartitionedFile(f: FileStatusWithMetadata, p: PartitionDirectory): PartitionedFile =
-    PartitionedFileUtil.getPartitionedFile(f, p.values)
+    PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
 
   protected def splitFiles(sparkSession: SparkSession,
                            file: FileStatusWithMetadata,
                            filePath: Path,
                            isSplitable: Boolean,
                            maxSplitBytes: Long,
                            partitionValues: InternalRow): Seq[PartitionedFile] =
-    PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable, maxSplitBytes, partitionValues)
+    PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable, maxSplitBytes, partitionValues)
 
   protected def getPushedDownFilters(relation: HadoopFsRelation, dataFilters: Seq[Expression]): Seq[Filter] = {
     val supportNestedPredicatePushdown = DataSourceUtils.supportNestedPredicatePushdown(relation)
```
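The shim change above only adapts to a signature difference in Spark's internal `PartitionedFileUtil` (the file path is passed explicitly again in 3.5.5); the splitting semantics are unchanged: a splittable file is cut into byte ranges of at most `maxSplitBytes` each. A hypothetical Python model of that behavior, for illustration only (not the Spark implementation, and the function name is mine):

```python
def split_file(file_len: int, max_split_bytes: int, is_splitable: bool):
    """Illustrative model of PartitionedFile splitting: return
    (start, length) byte ranges of at most max_split_bytes each.
    Non-splittable files come back as a single full-length range."""
    if not is_splitable or file_len <= max_split_bytes:
        return [(0, file_len)]
    return [
        (start, min(max_split_bytes, file_len - start))
        for start in range(0, file_len, max_split_bytes)
    ]

print(split_file(1000, 300, True))   # [(0, 300), (300, 300), (600, 300), (900, 100)]
print(split_file(1000, 300, False))  # [(0, 1000)]
```

This is why the build broke on the bare version bump: the shim's call sites must match the exact arity Spark exposes in each patch release, even though what the method computes did not change.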

0 commit comments