Add polars/narwhals support to the formula interface #502
MarcAntoineSchmidtQC left a comment
Looks good. I tested the speed, and `nw.from_native` is always extremely fast, though it is faster for polars. This is not going to be relevant in any of our benchmarks.
Pull request overview
This PR adds support for polars and other dataframes supported by narwhals to the formula interface, building on previous work. It updates the Python requirement from 3.9 to 3.10 and upgrades the formulaic dependency from 0.6 to 1.2.
Changes:
- Integrated narwhals for generic dataframe support in `TabmatMaterializer` and `encode_contrasts`
- Refactored `_InteractableCategoricalVector.from_categorical` to `from_codes` for framework-agnostic categorical handling
- Added comprehensive test parametrization to run all formula tests with both pandas and polars dataframes
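To illustrate why a codes/categories representation is framework-agnostic, here is a minimal pure-Python sketch. It is not tabmat's implementation: `encode_to_codes` and `one_hot_from_codes` are hypothetical helpers invented for this example, and the use of `-1` as the code for missing values is an assumption borrowed from the common pandas convention. The point is only that once a column is reduced to integer codes plus a list of categories, the downstream contrast encoding no longer needs to know which dataframe library the data came from.

```python
from typing import Hashable, Optional, Sequence


def encode_to_codes(
    values: Sequence[Optional[Hashable]],
    categories: Optional[Sequence[Hashable]] = None,
) -> tuple[list[int], list[Hashable]]:
    """Hypothetical helper: map raw values to integer codes.

    Missing values (None) and values outside `categories` get code -1
    (an assumption for this sketch).
    """
    if categories is None:
        # Infer categories from the data, in sorted order.
        categories = sorted({v for v in values if v is not None})
    lookup = {cat: i for i, cat in enumerate(categories)}
    codes = [lookup.get(v, -1) for v in values]
    return codes, list(categories)


def one_hot_from_codes(
    codes: Sequence[int], n_categories: int, drop_first: bool = False
) -> list[list[int]]:
    """Hypothetical helper: build dummy columns from codes alone.

    With drop_first=True the first category is the reference level,
    so its column is omitted. Rows with code -1 are all zeros.
    """
    start = 1 if drop_first else 0
    return [
        [1 if code == j else 0 for j in range(start, n_categories)]
        for code in codes
    ]


codes, categories = encode_to_codes(["b", "a", None, "b"])
full = one_hot_from_codes(codes, len(categories))
reduced = one_hot_from_codes(codes, len(categories), drop_first=True)
```

Any dataframe library that can hand over codes and categories (for example a pandas categorical or a polars Enum column) could feed such a function without library-specific branches further down.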
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_formula.py | Parametrized all formula tests to run with both pandas and polars; updated fixtures to support multiple input types; updated categorical vector creation to use new from_codes API |
| src/tabmat/formula.py | Added narwhals integration to TabmatMaterializer; refactored categorical handling to use codes/categories instead of pandas-specific objects; updated _C and encode_contrasts functions for generic dataframe support |
| src/tabmat/categorical_matrix.py | Added support for polars Enum types in _extract_codes_and_categories_polars |
| src/tabmat/benchmark/generate_matrices.py | Added type ignore comment for type checking |
| setup.py | Removed Python 3.9 support, updated formulaic requirement to >=1.2, bumped minimum Python to 3.10 |
| pyproject.toml | Updated mypy python_version from 3.9 to 3.10 |
| pixi.toml | Updated Python requirements, formulaic version, polars version, removed py39 environment |
| conda.recipe/meta.yaml | Updated formulaic requirement to >=1.2 |
| CHANGELOG.rst | Documented new narwhals/polars support and Python 3.10 requirement |
This PR follows up on #370 and #388 by adding generic dataframe support for the formula interface.
Checklist
`CHANGELOG.rst` entry