
Commit 8d9e1f9

docs (#1735)
1 parent 108f837 commit 8d9e1f9

1 file changed: docs/source/contributor-guide/spark-sql-tests.md (+15 −10 lines)
@@ -19,8 +19,8 @@ under the License.
# Running Spark SQL Tests

Running Apache Spark's SQL tests with Comet enabled is a good way to ensure that Comet produces the same
results as that version of Spark. To enable this, we apply some changes to the Apache Spark source code so that
Comet is enabled when we run the tests.

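For context (this configuration is not part of the commit itself), "Comet enabled" ultimately means the Spark session runs with Comet's plugin turned on. A minimal sketch of that configuration, where `$COMET_JAR` is a placeholder for a Comet release jar:

```shell
# Sketch only: the kind of configuration the patched test harness turns on.
# $COMET_JAR is a placeholder, not a variable defined by this guide.
$SPARK_HOME/bin/spark-shell \
  --jars $COMET_JAR \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.enabled=true
```
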
Here is an overview of the changes that we need to make to Spark:
@@ -45,6 +45,7 @@ PROFILES="-Pspark-3.4" make release
Clone Apache Spark locally and apply the diff file from Comet.

Note: this is a shallow clone of a tagged Spark commit and is not suitable for general Spark development.

```shell
git clone -b 'v3.4.3' --single-branch --depth 1 git@github.com:apache/spark.git apache-spark
cd apache-spark
```
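
The apply step itself falls outside this hunk. By analogy with the command shown later for new Spark versions, it presumably looks like the following sketch (the relative path assumes datafusion-comet is checked out alongside the Spark clone):

```shell
# Sketch only: apply the Comet diff matching the Spark version just cloned
git apply ../datafusion-comet/dev/diffs/3.4.3.diff
```
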
@@ -67,11 +68,11 @@ ENABLE_COMET=true build/sbt "hive/testOnly * -- -n org.apache.spark.tags.SlowHiv
## Creating a diff file for a new Spark version

Once Comet has support for a new Spark version, we need to create a diff file that can be applied to that version
of Apache Spark to enable Comet when running tests. This is a highly manual process that can vary depending on
the changes in the new version of Spark, but here is a general guide.

We typically start by applying a patch from a previous version of Spark. For example, when enabling the tests
for Spark version 3.5.5 we may start by applying the existing diff for 3.4.3 first.

```shell
git checkout v3.5.5
git apply --reject --whitespace=fix ../datafusion-comet/dev/diffs/3.4.3.diff
```

Any changes that cannot be cleanly applied will instead be written out to reject files. For example, the above
command generated the following files.

@@ -117,12 +118,16 @@ wiggle --replace ./sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.sc
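
The elided hunk body lists the generated `.rej` files and the `wiggle` commands used to merge them back in. As a general sketch (the suite name below is an example, not taken from the diff):

```shell
# List every reject file left behind by `git apply --reject`
find . -name '*.rej'

# Merge one reject back into its source file
wiggle --replace ./sql/core/src/test/scala/org/apache/spark/sql/SomeSuite.scala \
  ./sql/core/src/test/scala/org/apache/spark/sql/SomeSuite.scala.rej
```
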
## Generating The Diff File

The diff file can be generated using the `git diff` command. It may be necessary to set the `core.abbrev`
configuration setting to use 11-digit hashes for consistency with existing diff files.

```shell
git config core.abbrev 11
git diff v3.5.5 > ../datafusion-comet/dev/diffs/3.5.5.diff
```
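
As a quick sanity check (not part of the doc itself), the effective abbreviation length can be confirmed before generating the diff:

```shell
# With core.abbrev set to 11, this should print an 11-character hash
git rev-parse --short HEAD
```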

## Running Tests in CI

The easiest way to run the tests is to create a PR against Comet and let CI run the tests. When working with a
new Spark version, the `spark_sql_test.yaml` and `spark_sql_test_ansi.yaml` files will need updating with the
new version.
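
As a hypothetical helper (the workflow file locations are an assumption, not stated in the doc), the version references that need bumping can be located with grep:

```shell
# Assumes the workflows live under .github/workflows/; "3\.5" is an
# example pattern matching the Spark version family used above
grep -n "3\.5" .github/workflows/spark_sql_test.yaml \
  .github/workflows/spark_sql_test_ansi.yaml
```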
