Commit f7b20da

[SPARK-54842][PYTHON][TESTS] Fix test_arrow_udf_chained_iii in Python-Only MacOS26
### What changes were proposed in this pull request?

An attempt to fix https://github.com/apache/spark/actions/runs/20495264978/job/58904792835:

```
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 3511, in main
    process()
  File "/Users/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 3502, in process
    serializer.dump_stream(out_iter, outfile)
  File "/Users/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 781, in dump_stream
    return ArrowStreamSerializer.dump_stream(self, wrap_and_init_stream(), stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 120, in dump_stream
    for batch in iterator:
  File "/Users/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 765, in wrap_and_init_stream
    for packed in iterator:
  File "/Users/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 2954, in func
    for result_batch, result_type in result_iter:
  File "/Users/runner/work/spark/spark/python/pyspark/sql/tests/arrow/test_arrow_udf_scalar.py", line 930, in <lambda>
    lambda it: map(lambda x: pa.compute.subtract(x, 1), it),
               ^^^^^^^^^^
AttributeError: module 'pyarrow' has no attribute 'compute'
```

This test passed before on MacOS 26, and the parity test on Spark Connect passes.

### Why are the changes needed?

I suspect there is a cloudpickle pitfall when dealing with complicated nested lambdas; I remember resolving a similar issue by changing the import.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Cannot reproduce this issue locally; will monitor the CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53607 from zhengruifeng/fix_test_arrow_udf_chained_iii.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
1 parent 3cc3cc1 commit f7b20da
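
The suspected pitfall described above is easy to illustrate outside Spark. The sketch below is not part of the patch; it is a minimal local Python example (assuming only that `pyarrow` is installed, no Spark or cloudpickle involved) showing the import pattern the fix adopts: binding `pyarrow.compute` to an explicit name `pc` so a lambda carries a direct reference to the submodule, instead of resolving `pa.compute` as an attribute at call time, which is what raised `AttributeError: module 'pyarrow' has no attribute 'compute'` in the CI worker.

```python
# Minimal local sketch (not the Spark test) of the import pattern used by the fix.
# Assumption: only that pyarrow is installed; no Spark or cloudpickle involved.
import pyarrow as pa
import pyarrow.compute as pc  # explicit import guarantees the submodule is loaded

arr = pa.array([1, 2, 3])

# Resolves the "compute" attribute on the parent module at call time. It works
# here because pyarrow.compute was imported above, but this attribute lookup is
# what failed in the CI worker with
# "AttributeError: module 'pyarrow' has no attribute 'compute'".
add_one_via_attr = lambda x: pa.compute.add(x, 1)

# The lambda closes over pc directly, so it does not depend on the attribute
# being present on the pyarrow module wherever the function is evaluated.
add_one_via_alias = lambda x: pc.add(x, 1)

print(add_one_via_alias(arr))  # pyarrow Array with values 2, 3, 4
```

This mirrors the diff below: one added `import pyarrow.compute as pc` and `pa.compute.*` replaced with `pc.*` throughout the test.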

File tree: 1 file changed, +7 −6 lines

python/pyspark/sql/tests/arrow/test_arrow_udf_scalar.py

Lines changed: 7 additions & 6 deletions
```diff
@@ -916,26 +916,27 @@ def test_arrow_udf_chained_ii(self):
 
     def test_arrow_udf_chained_iii(self):
         import pyarrow as pa
+        import pyarrow.compute as pc
 
-        scalar_f = arrow_udf(lambda x: pa.compute.add(x, 1), LongType())
-        scalar_g = arrow_udf(lambda x: pa.compute.subtract(x, 1), LongType())
-        scalar_m = arrow_udf(lambda x, y: pa.compute.multiply(x, y), LongType())
+        scalar_f = arrow_udf(lambda x: pc.add(x, 1), LongType())
+        scalar_g = arrow_udf(lambda x: pc.subtract(x, 1), LongType())
+        scalar_m = arrow_udf(lambda x, y: pc.multiply(x, y), LongType())
 
         iter_f = arrow_udf(
-            lambda it: map(lambda x: pa.compute.add(x, 1), it),
+            lambda it: map(lambda x: pc.add(x, 1), it),
             LongType(),
             ArrowUDFType.SCALAR_ITER,
         )
         iter_g = arrow_udf(
-            lambda it: map(lambda x: pa.compute.subtract(x, 1), it),
+            lambda it: map(lambda x: pc.subtract(x, 1), it),
             LongType(),
             ArrowUDFType.SCALAR_ITER,
         )
 
         @arrow_udf(LongType())
         def iter_m(it: Iterator[Tuple[pa.Array, pa.Array]]) -> Iterator[pa.Array]:
             for a, b in it:
-                yield pa.compute.multiply(a, b)
+                yield pc.multiply(a, b)
 
         df = self.spark.range(10)
         expected = df.select(((F.col("id") + 1) * (F.col("id") - 1)).alias("res")).collect()
```
