feat: Support `over` expressions more freely, make expressions printable, rewrite internals (travelling pr 🌴 ) #3152

MarcoGorelli · 2025-09-25T10:00:34Z

I had a bit of time while travelling on holiday recently, so I tried rewriting the internals (yes, again).

The general idea is that narwhals.Expr stores a list of expression nodes in ._nodes, rather than passing around a bunch of opaque lambdas.
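To make the idea concrete, here is a minimal sketch of an expression that records its steps as data rather than as closures. The `Node` class, the `_append` helper and the toy `col` function are hypothetical stand-ins written for illustration; they are not the actual Narwhals classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """One step of an expression (hypothetical stand-in, not the real class)."""

    name: str
    kwargs: dict[str, Any] = field(default_factory=dict)


class Expr:
    """Toy expression that stores its steps as data instead of lambdas."""

    def __init__(self, nodes: tuple[Node, ...]) -> None:
        self._nodes = nodes

    def _append(self, name: str, **kwargs: Any) -> Expr:
        # Build a new expression; the original one is left untouched.
        return Expr((*self._nodes, Node(name, kwargs)))

    def abs(self) -> Expr:
        return self._append("abs")

    def mean(self) -> Expr:
        return self._append("mean")


def col(name: str) -> Expr:
    return Expr((Node("col", {"name": name}),))


expr = col("a").abs().mean()
print([node.name for node in expr._nodes])  # ['col', 'abs', 'mean']
```

Because the steps are plain data, they can be inspected, printed or reordered before any backend code runs, which is not possible with a chain of opaque callables.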

What this enables

Expressions are pretty-printable

For example:

In [2]: nw.col('a').abs().rank()
Out[2]: col(a).abs().rank(method=average, descending=False)

This is still quite basic, but we could make it more complex by introducing line breaks if it gets too long. Seems like the kind of thing @camriddell might be interested in?
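As a rough illustration of why a node list makes printing easy, the sketch below formats a list of `(name, kwargs)` pairs into the output shown above. The `format_nodes` function and the tuple representation are assumptions made for this example, not the PR's actual implementation.

```python
from __future__ import annotations

from typing import Any


def format_nodes(nodes: list[tuple[str, dict[str, Any]]]) -> str:
    """Render (name, kwargs) pairs as a dotted method chain."""
    head_name, head_kwargs = nodes[0]
    # The leading node (e.g. `col`) prints its values positionally.
    parts = [f"{head_name}({', '.join(str(v) for v in head_kwargs.values())})"]
    for name, kwargs in nodes[1:]:
        args = ", ".join(f"{key}={value}" for key, value in kwargs.items())
        parts.append(f"{name}({args})")
    return ".".join(parts)


nodes = [
    ("col", {"name": "a"}),
    ("abs", {}),
    ("rank", {"method": "average", "descending": False}),
]
print(format_nodes(nodes))
# col(a).abs().rank(method=average, descending=False)
```

Line breaking for long expressions could be layered on top by joining `parts` with a newline once the single-line form exceeds some width.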

We can do simple expression rewrites

Currently, (nw.col('a').mean() + 1).over('b') isn't supported:

for pandas-like, it's a non-elementary operation in over
for sql, (mean(a) + 1) over (partition by b) isn't valid syntax, it should be mean(a) over (partition by b) + 1

With this PR, however, it is!

What we do here is, when inserting an over node, we push it down before any elementwise operations (such as +, .abs(), sum_horizontal, ...) and apply it to all expressions. There are some more details in the "how it works" expansion.

So now, expressions like (nw.col('a').mean() + 1).over('b') can be supported for all backends

This rewrite is extremely simple and cheap: it's just a matter of inserting a node at some position i rather than at the end of a list. In general, query optimisation is out of scope for Narwhals. But, given that this enables more of Polars' flexibility for other backends, I think this can be in scope.
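The push-down can be pictured with a simplified sketch for a single chain of nodes, where each node is just a string. The `ELEMENTWISE` set and the `insert_over` function are hypothetical; the real rewrite also has to apply `over` to every input of multi-input operations, which this sketch ignores.

```python
from __future__ import annotations

# Hypothetical set of operations treated as elementwise in this sketch.
ELEMENTWISE = {"abs", "add", "sub"}


def insert_over(nodes: list[str], over: str) -> list[str]:
    """Insert `over` before the trailing run of elementwise nodes."""
    i = len(nodes)
    while i > 0 and nodes[i - 1] in ELEMENTWISE:
        i -= 1
    # Insert at position i instead of appending at the end of the list.
    return [*nodes[:i], over, *nodes[i:]]


# Node chain for (col('a').mean() + 1).over('b')
nodes = ["col(a)", "mean", "add"]
print(insert_over(nodes, "over(b)"))
# ['col(a)', 'mean', 'over(b)', 'add']
```

The result corresponds to `mean(a) over (partition by b) + 1`, the form that SQL backends accept.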

Per-group broadcasting

Previously, nw.col('a') - nw.col('a').mean() would be fine, but (nw.col('a') - nw.col('a').mean()).over('b') would raise for sql-like backends. Now, it works fine across all backends! Really useful for feature engineering
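For readers unfamiliar with the term, the plain-Python sketch below illustrates what `(nw.col('a') - nw.col('a').mean()).over('b')` is meant to compute: the mean is taken per group of `b` and broadcast back to every row of that group before subtracting. The data is made up, and this is not Narwhals code.

```python
from collections import defaultdict

a = [1.0, 2.0, 3.0, 10.0, 20.0]
b = ["x", "x", "x", "y", "y"]

# Collect the values of `a` for each group in `b`.
groups = defaultdict(list)
for value, key in zip(a, b):
    groups[key].append(value)

# One mean per group, then broadcast it back to every row of that group.
group_mean = {key: sum(values) / len(values) for key, values in groups.items()}
demeaned = [value - group_mean[key] for value, key in zip(a, b)]

print(demeaned)  # [-1.0, 0.0, 1.0, -5.0, 5.0]
```

Group-wise demeaning like this is a typical feature-engineering step, for example centring sales by store.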

Simplified internals

We can completely get rid of depth, function_name, scalar_kwargs
Replace CompliantWhen / CompliantThen and their complicated interaction with just CompliantNamespace.when_then

In fact, this goes as far as reducing package size by almost 1%. Not that that was the objective with this work, but it's nice to see that it doesn't make the package bigger

What this may open the doors to

serialisation / deserialisation of expressions. e.g. nw.Expr.from_json(expr.to_json())
chained window functions, like nw.col('a').shift(7).rolling_mean(7).over('store', order_by='date')
non-elementary group-by aggregations for pandas/pyarrow, like df.group_by('a').agg((nw.col('b')-nw.col('c')).mean())
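As a sketch of why serialisation becomes plausible once expressions are lists of nodes, the example below round-trips a list of `(name, kwargs)` pairs through JSON. The helper names `nodes_to_json` and `nodes_from_json` and the node layout are assumptions for illustration; no such API exists in this PR.

```python
import json


def nodes_to_json(nodes):
    """Serialise (name, kwargs) pairs to a JSON string."""
    return json.dumps([{"name": name, "kwargs": kwargs} for name, kwargs in nodes])


def nodes_from_json(payload):
    """Rebuild the (name, kwargs) pairs from a JSON string."""
    return [(item["name"], item["kwargs"]) for item in json.loads(payload)]


# Illustrative node list for col('a').shift(7).rolling_mean(7)
nodes = [
    ("col", {"name": "a"}),
    ("shift", {"n": 7}),
    ("rolling_mean", {"window_size": 7}),
]
restored = nodes_from_json(nodes_to_json(nodes))
print(restored == nodes)  # True
```

This only works while every keyword argument is itself JSON-serialisable; arguments that are other expressions would need to be serialised recursively.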

Benchmarks

I'm not seeing any meaningful impact on performance

perf: Use linked list for `ExprMetadata`

dangotbanned · 2025-10-15T12:09:54Z

narwhals/_compliant/expr.py

+    # This should be set with extreme care, only in `_expression_parsing.py`,
+    # and never from within any compliant class.
+    _opt_metadata: ExprMetadata | None = None
+
+    @property
+    def _metadata(self) -> ExprMetadata:
+        assert self._opt_metadata is not None  # noqa: S101
+        return self._opt_metadata


Making this a property that always returns ExprMetadata is a step in the right direction 👍

Have you thought about formalising this invariant more than just using None and assert?
For example, if we could write something like:

ExprMetadata.UNSET # patent-pending

Then this sentinel could raise an informative error whenever it is used.

IIRC, you could run into that assertion if you try to construct certain kinds of expressions at the compliant-level, since they don't have the same checking/propagation as the narwhals-level.

For the benefit of typing the attribute/property would always return ExprMetadata and never None.
But in the case where we have written something incorrectly, at runtime we would get an error saying:

Hey dude, clippy here, looks like you forgot to do the thing in this way? Maybe try this instead? 😉

MarcoGorelli · Oct 16, 2025

Sure, I've added a developer-facing error message, thanks (I think I prefer None to UNSET though)
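The outcome of this thread (keeping `None` but raising a developer-facing error) could look roughly like the sketch below. The `ExprMetadata` and `CompliantExpr` classes here are empty stand-ins, and the wording of the error message is an assumption, not the text used in the PR.

```python
from __future__ import annotations


class ExprMetadata:
    """Stand-in for the real metadata class (fields omitted)."""


class CompliantExpr:
    # Set only by the expression-parsing layer, never by compliant classes.
    _opt_metadata: ExprMetadata | None = None

    @property
    def _metadata(self) -> ExprMetadata:
        if self._opt_metadata is None:
            # Hypothetical message; aimed at developers, not end users.
            msg = (
                "`_opt_metadata` is unset. This is a bug in how the "
                "expression was constructed at the compliant level."
            )
            raise AssertionError(msg)
        return self._opt_metadata


expr = CompliantExpr()
try:
    expr._metadata
except AssertionError as exc:
    print(f"AssertionError: {exc}")
```

Type checkers still see `_metadata` as always returning `ExprMetadata`, while a misconstructed expression fails at runtime with an explanation instead of a bare `assert`.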


MarcoGorelli · 2025-10-20T08:37:02Z

Unless there are objections, I'll do a careful read-through of everything and then plan to ship this by the end of the week


feat: Make expressions printable, rewrite internals

b4141af

MarcoGorelli force-pushed the nodes-rewrite branch from 4feae2f to b4141af Compare September 25, 2025 13:05

MarcoGorelli added 21 commits September 25, 2025 20:03

coverage

0cf73ca

typing

52f978e

coverage

f6ce196

typing again

6169b23

revert accidental change

c31c5e9

skip old polars

ed29d1c

old vs

f90c13b

fix dataframe to numpy

2943784

document ExprNode

dea9e3e

safer col, fix typing

4048ae6

🎨

906f7fb

exclude too

a457bf0

typing

f29d8ad

mypy

11890a9

remove unnecessary check

07ed5ee

wait how tf doe thi work

06eafaa

grossly simplify broadcast

53c048f

simplify

1bc6c95

Merge remote-tracking branch 'upstream/main' into nodes-rewrite

5784048

cov

c2209d2

post merge fixup

6810813

MarcoGorelli marked this pull request as ready for review September 27, 2025 13:38

MarcoGorelli changed the title ~~WIP feat: Make expressions printable, rewrite internals (travelling pr 🌴 )~~ feat: Make expressions printable, rewrite internals (travelling pr 🌴 ) Sep 27, 2025

MarcoGorelli added 5 commits September 27, 2025 15:04

even simpler!

48a9dfd

assign variable

5ba10ed

replace/replace_all typing

8cde5d2

yay remove type ignore

19a5c99

wooah we can support per-group broadcasting

b758710

FBruzzesi and others added 9 commits October 11, 2025 17:24

Merge branch 'nodes-rewrite' into experimental/linked-list

680926f

no cover iter_nodes

52dd7f6

rm iter_nodes method

e8ab3f6

Merge pull request #7 from narwhals-dev/experimental/linked-list

322354b

perf: Use linked list for `ExprMetadata`

Merge remote-tracking branch 'upstream/main' into nodes-rewrite

baa4702

pass prev to combine_metadata

3ff84ac

Merge remote-tracking branch 'upstream/main' into nodes-rewrite

7443955

fixup

b0a78e3

Merge branch 'main' into nodes-rewrite

d9a30b9

dangotbanned reviewed Oct 15, 2025

View reviewed changes

MarcoGorelli added 15 commits October 16, 2025 10:42

Merge remote-tracking branch 'upstream/main' into nodes-rewrite

a353940

ceil, floor

1962d53

split out _with_node into _with_over_node and _append_node

eb1f74e

simplify ExprMetadata.from_node

fd87898

simplify further

0840868

clearer names

a2dbd2e

Merge branch 'nodes-rewrite' of github.com:MarcoGorelli/narwhals into…

95c9a66

… nodes-rewrite

dask fixup

c9c46c0

typing

33e3078

raise developer-facing assertionerror in _metadata

90468de

cvg

a657cbb

correctly respect arguments metadata in with_filtration, add test

0592804

mark filter not implemented for dask

5d150f3

fixup

739e0d5

keep filter in CompliantSeries for now

34229c1

MarcoGorelli mentioned this pull request Oct 17, 2025

api: move filter from CompliantSeries to CompliantExpr #3216

Open

Merge branch 'main' into nodes-rewrite

f52335c

MarcoGorelli added 2 commits October 21, 2025 12:12

Merge remote-tracking branch 'upstream/main' into nodes-rewrite

b582e92

Merge branch 'nodes-rewrite' of github.com:MarcoGorelli/narwhals into…

081e8c7

… nodes-rewrite

4 participants
