-
Notifications
You must be signed in to change notification settings - Fork 25
feat(pkg-py): return native DataFrame types with generic type narrowing #196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Changes DataSource.get_data() and .execute_query() to return native DataFrame types (pd.DataFrame, pl.DataFrame, pl.LazyFrame) instead of narwhals wrappers, providing better IDE support and type inference. Key changes: - DataFrameSource now returns polars/pandas DataFrames matching input type - PolarsLazySource returns native pl.LazyFrame from execute_query() - Added IntoDataFrameT TypeVar for generic type capture in QueryChat - Updated all QueryChat subclasses with 3 overloads for type narrowing: - pl.LazyFrame → QueryChat[pl.LazyFrame] - IntoDataFrameT → QueryChat[IntoDataFrameT] (any DataFrame type) - sqlalchemy.Engine → QueryChat[nw.DataFrame] - Updated df_to_html() to convert native types back to narwhals - Fixed tests to use native DataFrame APIs (.tolist(), .iloc[]) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
c75ace4 to
827e089
Compare
…f_lib Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add DataSourceT TypeVar to DataSource for generic return type
- DataFrameSource uses IntoDataFrameT to preserve input DataFrame type
- PolarsLazySource returns pl.LazyFrame (test_query validates then returns lazy)
- SQLAlchemySource adds return_type param ("polars"|"pandas", default "polars")
- Add read_sql_polars/read_sql_pandas helpers in _df_compat.py
- Update tests for new return types
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SQLAlchemySource now uses overloaded __init__ to narrow return type based on the return_type parameter: - return_type="polars" (default) -> SQLAlchemySource[pl.DataFrame] - return_type="pandas" -> SQLAlchemySource[pd.DataFrame] Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SQLAlchemySource now supports return_type="pyarrow" which returns pa.Table from queries. The three options are: - "polars" (default) -> pl.DataFrame - "pandas" -> pd.DataFrame - "pyarrow" -> pa.Table Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ignores - StateDictAccessorMixin is now generic over the DataFrame type - Dash and Gradio QueryChat classes use StateDictAccessorMixin[DataFrameT] - Removed redundant df() overrides and type ignore comments - The mixin's df() method now properly returns the generic type Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…type ignores - mod_server now returns ServerValues[_T] where _T is the data source's type - Removed type: ignore[arg-type] from ServerValues return - Removed type: ignore[misc] from super().__init__() calls in _shiny.py and _streamlit.py - Removed type: ignore[return-value] from mod_server calls and df() method Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The function is no longer used since DataFrameSource now uses duckdb_result_to_polars and duckdb_result_to_pandas directly. Updated tests to use duckdb_result_to_polars. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove duckdb_result_to_polars and duckdb_result_to_pandas functions as thin abstractions that don't add value. Inline the .pl() and .df() calls directly in PolarsLazySource. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace hardcoded pl.LazyFrame with IntoLazyFrameT TypeVar in overloads to make lazy frame handling more generic and consistent with the IntoDataFrameT pattern. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove DataOrLazyFrame legacy alias - Remove quotes from TypeVar bounds (not needed in TYPE_CHECKING block) - Remove inline comments from _df_lib field - Add _convert_result helper that supports polars, pandas, and pyarrow - Throw error for unsupported DataFrame backends Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove return_type parameter from SQLAlchemySource and just return nw.DataFrame. The read_sql function tries polars first, then falls back to pandas based on available libraries. - Remove overloads and return_type parameter - Use DataSource[nw.DataFrame] instead of DataSource[DataFrameT] - Remove read_sql_polars, read_sql_pandas, read_sql_pyarrow functions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update tests to expect native DataFrames (pl.DataFrame) instead of nw.DataFrame - Remove unsupported "Type Parameters" docstring section for quartodoc compatibility Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove stale DataOrLazyFrame import from _querychat_core.py - Add type: ignore comments for super().__init__ calls with generic type params - Add type: ignore for _convert_result return statements in DataFrameSource - Add type: ignore for attribute access on generic DataFrame types Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
DataFrames returned by df() are now native types (polars, pandas), not narwhals wrappers, so to_native() is no longer needed and would fail. - Update 07-gradio-custom-app.py example - Update _gradio.py update_displays function Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove all type ignores added on this branch by: - Removing overloads from QueryChatBase (subclasses have their own) - Using cast() and nw.from_native() for DataFrame operations - Bounding TypeVars to IntoFrame for better type inference - Use narwhals TypeVars directly instead of custom definitions: - IntoFrameT (replaces DataFrameT) - bound to IntoFrame - IntoDataFrameT - bound to IntoDataFrame - IntoLazyFrameT - bound to IntoLazyFrame - Standardize internal DataFrame operations with nw.from_native() for uniform access to .shape, .to_pandas(), .to_native(), etc. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
cpsievert
commented
Jan 16, 2026
- Remove DataSourceT TypeVar, use IntoFrameT directly - Simplify IntoDataFrame | IntoLazyFrame to IntoFrame in signatures - Remove implementation note from QueryChatBase docstring - Change get_current_data() return type from Any to IntoFrame Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove comments that simply describe what the code does when the code is already self-explanatory (e.g., "Wrap in narwhals for uniform DataFrame operations" before nw.from_native() calls). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Contributor
Author
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code |
- Move IntoFrame import to TYPE_CHECKING block - Add __all__ to _datasource.py for explicit re-exports - Fix import sorting (ruff auto-fix) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of re-exporting IntoFrameT, IntoDataFrameT, IntoLazyFrameT from _datasource.py, have each module import directly from narwhals.stable.v1.typing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
40d9656 to
2df267a
Compare
Add TestDataFrameSourceWithPyArrow test class to verify that DataFrameSource correctly handles pyarrow.Table inputs and returns native pyarrow.Table results. Also add pyarrow to dev dependencies and simplify test imports by removing conditional checks (polars and pyarrow are now guaranteed dev dependencies). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR refactors the data source classes to return native DataFrame types that match the input, rather than always wrapping results in
narwhals.DataFrame/narwhals.LazyFrame. That is, if you put pandas/polars in, you (once again) get pandas/polars out. This effectively reverts the breaking change introduced in 0.4.0 and provides a more intuitive API.Breaking Change
This is technically a breaking change that reverts the 0.4.0 breaking change. Users who adapted to 0.4.0 by working with
narwhals.DataFramewill now receive native types again:polars.DataFramepolars.DataFramenarwhals.DataFramepolars.DataFramepandas.DataFramepandas.DataFramenarwhals.DataFramepandas.DataFramepyarrow.Tablepyarrow.Tablesqlalchemy.Enginepandas.DataFramenarwhals.DataFramenarwhals.DataFrameChanges
DataSourceis now generic (DataSource[T]) with a configurable return typeDataFrameSourcetracks the input DataFrame library and returns results in the same formatPolarsLazySourcereturns nativepl.LazyFrameinstead ofnw.LazyFrameSQLAlchemySourcecontinues to returnnw.DataFrame(no clear "native" type to preserve)QueryChatclasses are now generic with@overloadsignatures for proper type narrowingDataFrameSourceduckdb_result_to_nw()helper (no longer needed)Motivation
The 0.4.0 change forced users to call
.to_native()on every result, adding friction. Returning native types: