Commit 1df253b
fix(udf): ensure per-call kwargs in udf v2 are uniquely bound per call site (#6079)
Fix row-wise/batch UDF v2 so that per-call keyword arguments (including
Expression kwargs) are correctly honored and not incorrectly shared
across call sites. Add a regression test that mirrors the reported
`format_number` example using default, literal, and expression
overrides.
The v2 UDF wrapper (`daft.udf.udf_v2.Func.__call__`) used a single
`func_id` derived from the decorated function to identify all UDF
expressions produced by that function. This `func_id` was passed through
to the Rust `row_wise_udf` / `batch_udf` builders and ultimately into
the logical plan as part of `RowWisePyFn` / batch UDF metadata.
Because all logical UDF nodes shared the same `func_id` regardless of
their concrete arguments, they could be treated as the *same* expression
by downstream components (e.g. optimizations, caching, or expression
reuse keyed by this identifier). As a result, multiple calls like:
```python
@daft.func
def format_number(value: int, prefix: str = "$", suffix: str = "") -> str:
    return f"{prefix}{value}{suffix}"

format_number(df["amount"])
format_number(df["amount"], prefix="€", suffix=" EUR")
format_number(df["amount"], suffix=df["amount"].cast(daft.DataType.string()))
```
could end up sharing underlying UDF state keyed only by `func_id`, so
that overrides for `prefix` / `suffix` were not reliably respected per
call site.
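As a rough illustration of this failure mode, here is a toy model in plain Python. It does not use Daft and does not reflect Daft's actual internals: `_bound_calls`, `register_call`, and `evaluate` are hypothetical names, and the "later call overwrites earlier call" behavior is an assumption chosen only to show what can go wrong when per-call state is keyed by `func_id` alone.

```python
from typing import Any, Callable, Dict

# Hypothetical store of per-call kwargs, keyed only by func_id.
# This is a toy stand-in, not Daft's real plan or cache structure.
_bound_calls: Dict[str, Dict[str, Any]] = {}


def register_call(func_id: str, kwargs: Dict[str, Any]) -> str:
    """Record the kwargs bound at a call site and return the lookup key."""
    # Every call site of the same function uses the same key,
    # so a later call site silently replaces an earlier one.
    _bound_calls[func_id] = kwargs
    return func_id


def evaluate(key: str, fn: Callable[..., str], value: int) -> str:
    """Run fn with whatever kwargs are currently stored under key."""
    return fn(value, **_bound_calls[key])


def format_number(value: int, prefix: str = "$", suffix: str = "") -> str:
    return f"{prefix}{value}{suffix}"


func_id = "format_number"
dollar_key = register_call(func_id, {})
euro_key = register_call(func_id, {"prefix": "€", "suffix": " EUR"})

dollar = evaluate(dollar_key, format_number, 10)
euro = evaluate(euro_key, format_number, 10)
print(dollar, "|", euro)  # both print as "€10 EUR" in this toy model
```

In this sketch the `dollar` call site loses its defaults because both call sites resolve to the same key, which is the same kind of cross-call leakage described above.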
Introduce a per-call identifier in `Func.__call__` so that each logical
UDF call site is uniquely identified, while still keeping the stable
human-readable name for display:
- Add a monotonically increasing `_daft_call_seq` counter on `Func`
instances.
- For each call that involves Expression arguments, derive a `call_id =
f"{self.func_id}-{call_seq}"`.
- Pass `call_id` instead of `self.func_id` as the `func_id` argument
when constructing the underlying `row_wise_udf` / `batch_udf`
expressions (for generator, batch, and regular row-wise variants).
This keeps the original `name` used for plan display intact, but
guarantees that each distinct call site (with its own bound
`args`/`kwargs`) has a unique function identifier, preventing unintended
sharing across calls.
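The scheme above can be sketched in isolation as follows. This is a minimal stand-in, not the actual `daft.udf.udf_v2.Func`: the class body, the way `func_id` is built from the module and qualified name, and the dictionary returned in place of a real UDF expression are all assumptions made for illustration. Unlike the real change, the toy also does not distinguish calls with Expression arguments from plain Python calls.

```python
from typing import Any, Callable, Dict


class Func:
    """Toy stand-in for the v2 UDF wrapper (hypothetical, not Daft's class)."""

    def __init__(self, fn: Callable[..., Any]) -> None:
        self.fn = fn
        # Stable, human-readable name kept for plan display.
        self.name = fn.__name__
        # Assumed format; the real func_id derivation may differ.
        self.func_id = f"{fn.__module__}.{fn.__qualname__}"
        # Monotonically increasing per-instance call counter.
        self._daft_call_seq = 0

    def __call__(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        call_seq = self._daft_call_seq
        self._daft_call_seq += 1
        # Unique identifier for this call site.
        call_id = f"{self.func_id}-{call_seq}"
        # A dict stands in for the row_wise_udf / batch_udf expression.
        return {"name": self.name, "func_id": call_id,
                "args": args, "kwargs": kwargs}


def format_number(value: int, prefix: str = "$", suffix: str = "") -> str:
    return f"{prefix}{value}{suffix}"


wrapped = Func(format_number)
first = wrapped("amount")
second = wrapped("amount", prefix="€", suffix=" EUR")
print(first["func_id"], second["func_id"])
```

Both calls keep the display name `format_number`, while their `func_id` values differ by the `-0` / `-1` suffix, so anything keyed on `func_id` can no longer conflate the two call sites.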
## Related Issues
```python
import daft


@daft.func
def format_number(value: int, prefix: str = "$", suffix: str = "") -> str:
    return f"{prefix}{value}{suffix}"


df = daft.from_pydict({"amount": [10, 20, 30]})
df = df.with_column("dollar", format_number(df["amount"]))
df = df.with_column("euro", format_number(df["amount"], prefix="€", suffix=" EUR"))
df = df.with_column("customized", format_number(df["amount"], suffix=df["amount"].cast(daft.DataType.string())))
df.show()
```
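The result is incorrect: the `dollar` column, which should use the default `prefix="$"` and empty `suffix` (e.g. `$10`), shows the overrides from the `euro` call instead: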
```
╭────────┬─────────┬─────────┬────────────╮
│ amount ┆ dollar ┆ euro ┆ customized │
│ --- ┆ --- ┆ --- ┆ --- │
│ Int64 ┆ String ┆ String ┆ String │
╞════════╪═════════╪═════════╪════════════╡
│ 10 ┆ €10 EUR ┆ €10 EUR ┆ $1010 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 20 ┆ €20 EUR ┆ €20 EUR ┆ $2020 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 30 ┆ €30 EUR ┆ €30 EUR ┆ $3030 │
╰────────┴─────────┴─────────┴────────────╯
(Showing first 3 of 3 rows)
```
1 parent 25c189f, commit 1df253b
2 files changed, +31 -3 lines changed