Forward port of Mario's PySpark PRs #19

evertlammerts · 2025-08-28T13:28:38Z

This PR is a forward port of (parts of) duckdb/duckdb#15462 and duckdb/duckdb#15036 by @mariotaddeucci with a couple of omissions after a careful review:

- explode() is mapped to unnest(), but the two have significantly different semantics:
  - explode() on a map creates two new columns, key and value, whereas unnest() creates a new column for each key.
  - explode() on a list creates a new column named col, whereas unnest() creates a column named "unnest()". This is probably fixable, though.
- count_if() is problematic (and count() is as well). The examples in the docstring don't work and, as far as I can tell, can't work right now, because select creates a projection and for some reason we can't aggregate on that. For example, this throws an error: df.select(sf.count('b'), sf.count_if('b')).show(). Before we put this in, we'd need to figure out a way to fix that; otherwise the semantics aren't even close to those of Spark.
- every() suffers from the same problem. For example, this docstring example throws the same exception: spark.createDataFrame([[False], [False], [False]], ["flag"]).select(sf.every("flag")).
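
For reference, here is a rough pure-Python model of the Spark-side semantics that the omitted functions would need to match. It is only a sketch and is not how Spark or DuckDB implement any of this. The helper names are hypothetical and rows are modelled as plain dicts. The NULL handling in every() (ignore NULLs, return NULL when nothing is left) is my assumption about Spark's behaviour.

```python
from typing import Any, Dict, Iterable, List, Optional


def explode_list(values: List[Any]) -> List[Dict[str, Any]]:
    """Model of explode() on a list: one row per element, in a column named 'col'."""
    return [{"col": v} for v in values]


def explode_map(mapping: Dict[Any, Any]) -> List[Dict[str, Any]]:
    """Model of explode() on a map: one row per entry, in columns 'key' and 'value'."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def count_if(flags: Iterable[Optional[bool]]) -> int:
    """Model of count_if(): the number of rows whose condition is true."""
    return sum(1 for f in flags if f is True)


def every(flags: Iterable[Optional[bool]]) -> Optional[bool]:
    """Model of every(): true only if all non-NULL values are true.

    Assumption: NULLs (None) are ignored, and the result is None
    when no non-NULL values remain.
    """
    non_null = [f for f in flags if f is not None]
    return all(non_null) if non_null else None


if __name__ == "__main__":
    print(explode_list([1, 2]))
    print(explode_map({"a": 1, "b": 2}))
    print(count_if([True, False, True]))
    print(every([False, False, False]))
```

The last call mirrors the every() docstring example above, which should evaluate to False once aggregating on a select projection works.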

Forward port of duckdb/duckdb#15462 and duckdb/duckdb#15036

7649e2a

This was referenced Aug 28, 2025

[PySpark] - Add date_diff and explode to pyspark functions duckdb/duckdb#15036

Closed

[PySpark] - Add negative, count_if, try_to_timestamp, equal_null and every to pyspark functions duckdb/duckdb#15462

Closed

evertlammerts self-assigned this Aug 29, 2025

evertlammerts merged commit 4db459b into duckdb:main Aug 29, 2025
13 checks passed

evertlammerts deleted the pyspark_functions branch August 29, 2025 09:57
