
Commit c863717

[SPARK-53358] Improve arrow Python UDTF output type mismatch error message
### What changes were proposed in this pull request?

This PR updates the error message raised when the output of an Arrow Python UDTF does not match the required type, to make it more user-friendly.

### Why are the changes needed?

To make the error message more actionable.

Before this change:

```
pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_ARROW_TYPE_CONVERSION_ERROR] Cannot convert the output value of the input '[ 0 ]' with type 'struct<x:int>' to the specified return type of the column: 'struct<x: int32>'. Please check if the data types match and try again.
```

After this change:

```
PyArrow UDTF must return an iterator of pyarrow.Table or pyarrow.RecordBatch objects.
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #52103 from allisonwang-db/spark-53358-arrow-udtf-err-msg.

Authored-by: Allison Wang <allison.wang@databricks.com>
Signed-off-by: Allison Wang <allison.wang@databricks.com>
1 parent dab3464 commit c863717


2 files changed (+2, -6 lines)


python/pyspark/errors/error-conditions.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -1146,7 +1146,7 @@
   },
   "UDTF_ARROW_TYPE_CONVERSION_ERROR": {
     "message": [
-      "Cannot convert the output value of the input '<data>' with type '<schema>' to the specified return type of the column: '<arrow_schema>'. Please check if the data types match and try again."
+      "PyArrow UDTF must return an iterator of pyarrow.Table or pyarrow.RecordBatch objects."
     ]
   },
   "UDTF_CONSTRUCTOR_INVALID_IMPLEMENTS_ANALYZE_METHOD": {
```
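The old template contains `<data>`, `<schema>` and `<arrow_schema>` placeholders that are filled from `messageParameters`, while the new template has none. The sketch below is a simplified, hypothetical stand-in for error-template rendering (`render_message` is not PySpark's API). It assumes, as an illustration, that the supplied parameters must exactly match the template's placeholders, which would explain why `worker.py` switches to `messageParameters={}` in the same commit.

```python
import re

# Before/after versions of the UDTF_ARROW_TYPE_CONVERSION_ERROR template from the diff.
OLD_TEMPLATE = (
    "Cannot convert the output value of the input '<data>' with type '<schema>' "
    "to the specified return type of the column: '<arrow_schema>'. "
    "Please check if the data types match and try again."
)
NEW_TEMPLATE = (
    "PyArrow UDTF must return an iterator of pyarrow.Table or "
    "pyarrow.RecordBatch objects."
)

# Matches <name> placeholders inside a template.
_PLACEHOLDER = re.compile(r"<([A-Za-z0-9_]+)>")


def render_message(error_class: str, template: str, params: dict) -> str:
    """Hypothetical helper: fill each <name> placeholder from params."""
    expected = set(_PLACEHOLDER.findall(template))
    # Assumption for this sketch: parameters must match placeholders exactly.
    if expected != set(params):
        raise ValueError(
            f"{error_class}: expected parameters {sorted(expected)}, got {sorted(params)}"
        )
    # A function replacement substitutes once, so values such as
    # 'struct<x:int>' are not rescanned as placeholders.
    body = _PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), template)
    return f"[{error_class}] {body}"
```

With the new template, `render_message("UDTF_ARROW_TYPE_CONVERSION_ERROR", NEW_TEMPLATE, {})` renders the fixed message, whereas passing the old `data`/`schema`/`arrow_schema` keys to it would fail the parameter check.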

python/pyspark/worker.py

Lines changed: 1 addition & 5 deletions
```diff
@@ -2066,11 +2066,7 @@ def convert_to_arrow(data: Iterable):
             # Arrow UDTF should only return Arrow types (RecordBatch/Table)
             raise PySparkRuntimeError(
                 errorClass="UDTF_ARROW_TYPE_CONVERSION_ERROR",
-                messageParameters={
-                    "data": str(item),
-                    "schema": return_type.simpleString(),
-                    "arrow_schema": str(arrow_return_type),
-                },
+                messageParameters={},
             )
         return batches
 
```
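The check in `convert_to_arrow` amounts to: every item the UDTF yields must be an Arrow table or record batch, and anything else raises a fixed message with no per-item parameters. The following is a minimal sketch of that idea, not the worker's actual code: `collect_arrow_outputs` and `ArrowUDTFOutputError` are hypothetical names, and the accepted types are passed in as a parameter (in the worker they would be `pyarrow.Table` and `pyarrow.RecordBatch`) so the sketch runs without pyarrow installed.

```python
from typing import Iterable, List, Tuple, Type

# Message text copied from the new UDTF_ARROW_TYPE_CONVERSION_ERROR template.
ARROW_UDTF_OUTPUT_ERROR = (
    "PyArrow UDTF must return an iterator of pyarrow.Table or "
    "pyarrow.RecordBatch objects."
)


class ArrowUDTFOutputError(RuntimeError):
    """Hypothetical stand-in for PySparkRuntimeError in this sketch."""


def collect_arrow_outputs(
    data: Iterable[object], accepted: Tuple[Type, ...]
) -> List[object]:
    """Collect yielded items, failing on the first one that is not an accepted type."""
    batches: List[object] = []
    for item in data:
        if not isinstance(item, accepted):
            # Like the new worker code, the error carries no details about `item`.
            raise ArrowUDTFOutputError(ARROW_UDTF_OUTPUT_ERROR)
        batches.append(item)
    return batches
```

With pyarrow installed, the call would look like `collect_arrow_outputs(udtf_output, (pa.Table, pa.RecordBatch))` after `import pyarrow as pa`. A UDTF that yields a plain Python list such as `[0]` would then fail with the new message rather than the old schema-comparison text.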