Skip to content

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Sep 25, 2025

I've been on holiday I had a bit of time while travelling recently, so I tried rewriting the internals (yes, again)

The general idea is that narwhals.Expr stores a list of expressions in ._nodes, rather than passing around a bunch of opaque lambdas

What this enables

Expressions are pretty-printable

For example:

In [2]: nw.col('a').abs().rank()
Out[2]: col(a).abs().rank(method=average, descending=False)

This is still quite basic, but we could make it more complex by introducing line breaks if it gets too long. Seems like the kind of thing @camriddell might be interested in?

We can do simple expression rewrites

Currently, (nw.col('a').mean() + 1).over('b') isn't supported:

  • for pandas-like, it's a non-elementary operation in over
  • for sql, (mean(a) + 1) over (partition by b) isn't valid syntax, it should be mean(a) over (partition by b) + 1

With this PR, however, it is!

What we do here is, when inserting an over node, we push it down before any elementwise operations (such as +, .abs(), sum_horizontal, ...) and apply it to all expressions. There's some more details in the expansion to "how it works"

So now, expressions like (nw.col('a').mean() + 1).over('b') can be supported for all backends

This rewrite is extremely simple and cheap, it's just a matter of inserting a node at some position i rather than at the end of a list. In general, query optimisation is out of scope for Narwhals. But, given that this enables more of Polars' flexibility for other backends, I think this can be in scope.

Per-group broadcasting

Previously, nw.col('a') - nw.col('a').mean() would be fine, but (nw.col('a') - nw.col('a').mean()).over('b') would raise for sql-like backends. Now, it works fine across all backends! Really useful for feature engineering

Simplified internals

  • We can completely get rid of depth, function_name, scalar_kwargs
  • Replace CompliantWhen / CompliantThen and their complicated interaction with just CompliantNamespace.when_then

In fact, this goes as far as reducing package size by almost 1%. Not that that was the objective with this work, but it's nice to see that it doesn't make the package bigger

What this may open the doors to

  • serialisation / deserialisation of expressions. e.g. nw.Expr.from_json(expr.to_json())
  • chained window functions, like nw.col('a').shift(7).rolling_mean(7).over('store', order_by='date')
  • non-elementary group-by aggregations for pandas/pyarrow, like df.group_by('a').agg((nw.col('b')-nw.col('c')).mean())

Benchmarks

I'm not seeing any meaningful impact on performance

@MarcoGorelli MarcoGorelli marked this pull request as ready for review September 27, 2025 13:38
@MarcoGorelli MarcoGorelli changed the title WIP feat: Make expressions printable, rewrite internals (travelling pr 🌴 ) feat: Make expressions printable, rewrite internals (travelling pr 🌴 ) Sep 27, 2025
Comment on lines 92 to 99
# This should be set with extreme care, only in `_expression_parsing.py`,
# and never from within any compliant class.
_opt_metadata: ExprMetadata | None = None

@property
def _metadata(self) -> ExprMetadata:
assert self._opt_metadata is not None # noqa: S101
return self._opt_metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this a property that always returns ExprMetadata is a step in the right direction 👍

Have you thought about formalising this invariant more than just using None and assert?
For example, if we could write something like:

ExprMetadata.UNSET  # patent-pending

Then this sentinel could raise an informative error whenever it is used.

IIRC, you could run into that assertion if you try to construct certain kinds of expressions at the compliant-level, since they don't have the same checking/propagation as the narwhals-level.

For the benefit of typing the attribute/property would always return ExprMetadata and never None.
But in the case where we have written something incorrectly, at runtime we would get an error saying:

Hey dude, clippy here, looks like you forgot to do the thing in this way?
Maybe try this instead? 😉 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, i've added a develop-facing error message, thanks (I think I prefer None to UNSET though)

@MarcoGorelli
Copy link
Member Author

Unless there's objections, I'll do a careful read-though of everything and then will plan to ship this by the end of the week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants