This would be a major overhaul, but has any consideration been given to powering Evidently tests with Narwhals and/or Ibis?
Possible upsides: Evidently currently supports pandas and pyspark, but what if it supported:
A bunch of other dataframe engines:
- Daft
- Dask
- Polars
And even SQL engines:
- Postgres
- DuckDB
- ClickHouse
- BigQuery
- Snowflake
I mean, I don't know if it'd end up being possible to express the Evidently statistical checks using the dataframe API exposed by Ibis and/or Narwhals, but if it is, this would be a HUGE win IMO (there's a rough sketch of the idea after the list below). Moving compute to the data lakehouse has a bunch of advantages:
- Speed
  - We ran an Evidently test suite on a medium+ sized dataset and it took 5 minutes to compute.
  - For single-node Python execution, Polars might make the tests faster than pandas.
- Compute/simplicity: not having to configure a beefy instance to run Python and transfer a bunch of data to it. Also, not having to provision/own a PySpark cluster would be very nice (e.g. if you've already got Snowflake, make that do the work).
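To make this concrete, here's a minimal sketch of what a backend-agnostic check could look like with Narwhals. To be clear, `check_missing_share` and its threshold are made-up illustrations, not Evidently's API; the point is that one function can run unchanged on pandas, Polars, and other supported backends:

```python
# Hypothetical sketch: a "share of missing values" check written once against
# Narwhals, so the same code runs on any backend Narwhals supports.
import narwhals as nw


def check_missing_share(df_native, column: str, max_share: float = 0.05) -> bool:
    """True if the share of nulls in `column` is at most `max_share`."""
    df = nw.from_native(df_native, eager_only=True)
    share = df.select(nw.col(column).is_null().mean()).item()
    return share <= max_share


# The same call works on different engines, e.g.:
# import pandas as pd;  check_missing_share(pd.DataFrame({"x": [1, None]}), "x")
# import polars as pl;  check_missing_share(pl.DataFrame({"x": [1, None]}), "x")
```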
Personal anecdote: At Pattern, our ML Platform team recently decided not to use Evidently, Soda, or GX, and to roll our own library instead (which I resented, but the reasoning convinced me).
We wanted the ability to run our tests in one of 2 modes:
- on dataframes
- OR via SQL in SnowSQL, TrinoSQL, and SparkSQL (our stack is a bit wild)
Our ML pipelines usually:
1. Do some SQL queries on the lakehouse to prep some data
2. Load it into an ML pipeline, then do a bunch of last-mile transforms with pandas/polars
3. Then (optionally postprocess and) write outputs back to the lakehouse
We test the incoming data in [1] using SQL. Then we test [2] and [3] by running assertions against the dataframes.
On medium+ sized datasets, [2] and [3] can be slow. It'd be cool if moving the compute for the Evidently tests to the lakehouse could speed that up: we could use a write-audit-publish pattern where we write the data to the lakehouse and test it there, only "publishing" it if the tests pass. A sketch of what that might look like with Ibis is below.
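For illustration, here's the same kind of check pushed down to the engine via Ibis. Again, this is a hypothetical sketch (the function, table, and column names are made up); the expression compiles to SQL and runs inside DuckDB/Snowflake/etc., so the data never leaves the warehouse:

```python
# Hypothetical sketch: a null-share check expressed with Ibis, so the compute
# happens inside the SQL engine rather than in a local Python process.
import ibis


def check_missing_share_sql(table, column: str, max_share: float = 0.05) -> bool:
    """True if the share of NULLs in `column` is at most `max_share`."""
    # Compiles to SQL (roughly AVG(CASE WHEN column IS NULL THEN 1 ELSE 0 END))
    # and executes in whatever backend the table is bound to.
    share = table[column].isnull().mean().execute()
    return share <= max_share


# Usage against a warehouse connection (names are hypothetical):
# con = ibis.duckdb.connect("lake.db")   # or ibis.snowflake.connect(...)
# ok = check_missing_share_sql(con.table("predictions"), "score")
```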
Incidentally, we also wanted to be able to express our tests in YAML or some other non-Python, declarative format (a hypothetical example follows the list). This made it easy to:
- Find tests in our projects (just look for the `tests/*.yaml` files)
- Standardize the way we run tests (not as much variation in people's Python code)
- Hopefully improve readability
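For what it's worth, a declarative test file in that spirit might look something like this (a made-up schema, not our actual internal format and not anything Evidently supports today):

```yaml
# tests/orders.yaml -- hypothetical declarative test spec
dataset: analytics.orders
engine: snowflake            # or: dataframe, trino, spark
tests:
  - column: customer_id
    check: not_null
  - column: order_total
    check: min
    threshold: 0
  - column: discount
    check: missing_share
    max: 0.05
```

The runner would then compile each check either to a dataframe assertion or to SQL, depending on the configured engine.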