Commit 7720781

HyukjinKwon authored and dongjoon-hyun committed
[SPARK-29127][SQL][PYTHON] Add a clue for Python related version information in integrated UDF tests
### What changes were proposed in this pull request?

This PR proposes to show the Python, pandas and PyArrow versions in integrated UDF tests as a clue, so that when a test case fails, it shows the related version information.

I think we don't really need this kind of version information in the test case name for now, since the integrated SQL test cases are not intended to test different combinations of Python, pandas and PyArrow.

### Why are the changes needed?

To make debugging easier.

### Does this PR introduce any user-facing change?

It will change the test name to include the related Python, pandas and PyArrow versions.

### How was this patch tested?

Manually tested:

```
[info] - udf/postgreSQL/udf-case.sql - Scala UDF *** FAILED *** (8 seconds, 229 milliseconds)
[info]   udf/postgreSQL/udf-case.sql - Scala UDF
...
[info] - udf/postgreSQL/udf-case.sql - Regular Python UDF *** FAILED *** (6 seconds, 298 milliseconds)
[info]   udf/postgreSQL/udf-case.sql - Regular Python UDF
[info]   Python: 3.7
...
[info] - udf/postgreSQL/udf-case.sql - Scalar Pandas UDF *** FAILED *** (6 seconds, 376 milliseconds)
[info]   udf/postgreSQL/udf-case.sql - Scalar Pandas UDF
[info]   Python: 3.7 Pandas: 0.25.3 PyArrow: 0.14.0
```

Closes apache#26538 from HyukjinKwon/investigate-flaky-test.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent c0507e0 commit 7720781

2 files changed: +34 -2 lines changed

sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala

Lines changed: 19 additions & 1 deletion
@@ -122,7 +122,7 @@ object IntegratedUDFTestUtils extends SQLHelper {
     true
   }.getOrElse(false)
 
-  private lazy val pythonVer = if (isPythonAvailable) {
+  lazy val pythonVer: String = if (isPythonAvailable) {
     Process(
       Seq(pythonExec, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"),
       None,
@@ -131,6 +131,24 @@ object IntegratedUDFTestUtils extends SQLHelper {
     throw new RuntimeException(s"Python executable [$pythonExec] is unavailable.")
   }
 
+  lazy val pandasVer: String = if (isPandasAvailable) {
+    Process(
+      Seq(pythonExec, "-c", "import pandas; print(pandas.__version__)"),
+      None,
+      "PYTHONPATH" -> s"$pysparkPythonPath:$pythonPath").!!.trim()
+  } else {
+    throw new RuntimeException("Pandas is unavailable.")
+  }
+
+  lazy val pyarrowVer: String = if (isPyArrowAvailable) {
+    Process(
+      Seq(pythonExec, "-c", "import pyarrow; print(pyarrow.__version__)"),
+      None,
+      "PYTHONPATH" -> s"$pysparkPythonPath:$pythonPath").!!.trim()
+  } else {
+    throw new RuntimeException("PyArrow is unavailable.")
+  }
+
   // Dynamically pickles and reads the Python instance into JVM side in order to mimic
   // Python native function within Python UDF.
   private lazy val pythonFunc: Array[Byte] = if (shouldTestPythonUDFs) {
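
The version lookups in this file all follow one pattern: shell out to the Python executable with a one-line `-c` snippet and trim its stdout. A minimal standalone sketch of that pattern is below. `PythonVersionProbe`, `run`, `pythonVersion` and `moduleVersion` are hypothetical names, not part of the commit, and unlike the committed code this sketch assumes returning `None` on failure is acceptable instead of throwing, and does not set `PYTHONPATH`.

```scala
import scala.sys.process._
import scala.util.Try

// Hypothetical helper object for illustration; not part of the commit.
object PythonVersionProbe {
  // Runs `<pythonExec> -c <snippet>` and returns the trimmed stdout.
  // Returns None when the executable cannot be started or exits non-zero,
  // because `!!` throws in both situations and Try captures the exception.
  def run(pythonExec: String, snippet: String): Option[String] =
    Try(Process(Seq(pythonExec, "-c", snippet)).!!.trim).toOption

  // Same one-liner the commit uses to obtain "major.minor".
  def pythonVersion(pythonExec: String): Option[String] =
    run(pythonExec, "import sys; print('%d.%d' % sys.version_info[:2])")

  // Generalizes the pandas and PyArrow lookups to any module that
  // exposes a `__version__` attribute.
  def moduleVersion(pythonExec: String, module: String): Option[String] =
    run(pythonExec, s"import $module; print($module.__version__)")
}
```

For example, `PythonVersionProbe.moduleVersion("python3", "pandas")` would yield `Some(...)` only on a machine where `python3` is on the path and pandas is importable; the result depends entirely on the local environment.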

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

Lines changed: 15 additions & 1 deletion
@@ -384,7 +384,21 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
     // This is a temporary workaround for SPARK-28894. The test names are truncated after
     // the last dot due to a bug in SBT. This makes easier to debug via Jenkins test result
     // report. See SPARK-28894.
-    withClue(s"${testCase.name}${System.lineSeparator()}") {
+    // See also SPARK-29127. It is difficult to see the version information in the failed test
+    // cases so the version information related to Python was also added.
+    val clue = testCase match {
+      case udfTestCase: UDFTest
+          if udfTestCase.udf.isInstanceOf[TestPythonUDF] && shouldTestPythonUDFs =>
+        s"${testCase.name}${System.lineSeparator()}Python: $pythonVer${System.lineSeparator()}"
+      case udfTestCase: UDFTest
+          if udfTestCase.udf.isInstanceOf[TestScalarPandasUDF] && shouldTestScalarPandasUDFs =>
+        s"${testCase.name}${System.lineSeparator()}" +
+          s"Python: $pythonVer Pandas: $pandasVer PyArrow: $pyarrowVer${System.lineSeparator()}"
+      case _ =>
+        s"${testCase.name}${System.lineSeparator()}"
+    }
+
+    withClue(clue) {
       // Read back the golden file.
       val expectedOutputs: Seq[QueryOutput] = {
         val goldenOutput = fileToString(new File(testCase.resultFile))
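
The clue selection in this hunk can be read in isolation as a small pattern match over the kind of UDF under test. The sketch below is a simplified, self-contained illustration only: `Udf`, `ScalaUdf`, `PythonUdf`, `PandasUdf`, `Versions` and `ClueBuilder` are hypothetical stand-ins for the suite's real types, and the by-name `versions` parameter is an assumption made here to mimic how the `lazy val`s avoid a version lookup for non-Python test cases.

```scala
// Hypothetical stand-ins for the suite's UDF kinds; names are illustrative only.
sealed trait Udf
case object ScalaUdf extends Udf
case object PythonUdf extends Udf
case object PandasUdf extends Udf

// Hypothetical container for the three version strings.
final case class Versions(python: String, pandas: String, pyarrow: String)

object ClueBuilder {
  // `versions` is passed by name and cached in a lazy val, so it is only
  // evaluated for the Python and pandas branches, and at most once.
  def clue(name: String, udf: Udf, versions: => Versions): String = {
    val nl = System.lineSeparator()
    lazy val v = versions
    udf match {
      case PythonUdf =>
        s"$name${nl}Python: ${v.python}$nl"
      case PandasUdf =>
        s"$name${nl}Python: ${v.python} Pandas: ${v.pandas} PyArrow: ${v.pyarrow}$nl"
      case ScalaUdf =>
        s"$name$nl"
    }
  }
}
```

In a ScalaTest suite, the resulting string would be passed to `withClue(...)` around the assertions, as the diff does, so that it is prepended to the failure message.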