Commit 2a5d03a

[SPARK-53854][PYTHON][TESTS] Skip test_collect_time test if pandas or pyarrow are unavailable
### What changes were proposed in this pull request?

This PR aims to skip the `test_collect_time` test if pandas or pyarrow are unavailable.

### Why are the changes needed?

According to the `Python 3.14` CI, this appears to be the last error in the `pyspark-sql` module caused by the missing `pyarrow`:

- https://github.com/apache/spark/actions/workflows/build_python_3.14.yml
- https://github.com/apache/spark/actions/runs/18363201896/job/52310847550

```
======================================================================
ERROR [0.990s]: test_collect_time (pyspark.sql.tests.test_collection.DataFrameCollectionTests.test_collect_time)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/pandas/utils.py", line 69, in require_minimum_pyarrow_version
    import pyarrow
ModuleNotFoundError: No module named 'pyarrow'
```

### Does this PR introduce _any_ user-facing change?

No, this is a test case change.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52555 from dongjoon-hyun/SPARK-53854.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent 49e2c9e commit 2a5d03a

File tree

1 file changed: +4, -0 lines


python/pyspark/sql/tests/test_collection.py

Lines changed: 4 additions & 0 deletions

```diff
@@ -365,6 +365,10 @@ def check_to_local_iterator_not_fully_consumed(self):
                 break
         self.assertEqual(df.take(8), result)

+    @unittest.skipIf(
+        not have_pandas or not have_pyarrow,
+        pandas_requirement_message or pyarrow_requirement_message,
+    )
     def test_collect_time(self):
         import pandas as pd
```