@evertlammerts
Collaborator

This PR forward-ports parts of duckdb/duckdb#15462 and duckdb/duckdb#15036 by @mariotaddeucci. After a careful review, a few functions were left out:

  • explode() is mapped to unnest(), but the two have significantly different semantics:
    • explode() on a map creates two new columns, key and value, whereas unnest() creates a new column for each key.
    • explode() on a list creates a new column named col, whereas unnest() creates a column named "unnest()". This is probably fixable, though.
  • count_if() is problematic, and so is count(). The docstring examples don't work and, afaict, can't work right now: select creates a projection, and for some reason we can't aggregate on it (e.g. df.select(sf.count('b'), sf.count_if('b')).show() throws an error). Before adding these we'd need to find a way to fix that, otherwise the semantics aren't even close to Spark's.
  • every() suffers from the same problem. For example, this docstring example throws the same exception: spark.createDataFrame([[False], [False], [False]], ["flag"]).select(sf.every("flag")).
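
To make the explode() mismatch concrete, here is a rough pure-Python model of the output shapes described above. It is neither DuckDB nor Spark code: the helper names (explode_list, explode_map, columns_per_key) are made up for illustration, and the per-key shape follows the description in this PR rather than independently verified unnest() behaviour.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def explode_list(values: List[Any], name: str = "col") -> List[Row]:
    """Spark-style explode on a list: one row per element, in a column named `col`."""
    return [{name: v} for v in values]


def explode_map(mapping: Dict[Any, Any]) -> List[Row]:
    """Spark-style explode on a map: one row per entry, with `key` and `value` columns."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def columns_per_key(mapping: Dict[str, Any]) -> List[Row]:
    """Hypothetical model of the shape described for unnest(): one column per key."""
    return [dict(mapping)]


if __name__ == "__main__":
    print(explode_list([1, 2]))
    print(explode_map({"a": 1, "b": 2}))
    print(columns_per_key({"a": 1, "b": 2}))
```

A compatible explode() would need to produce the first two shapes. A plain mapping to unnest() would instead give something like the third for maps and a differently named column for lists.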

@evertlammerts evertlammerts self-assigned this Aug 29, 2025
@evertlammerts evertlammerts merged commit 4db459b into duckdb:main Aug 29, 2025
13 checks passed
@evertlammerts evertlammerts deleted the pyspark_functions branch August 29, 2025 09:57