Skip to content

Conversation

@cpsievert
Copy link
Contributor

@cpsievert cpsievert commented Jan 15, 2026

This PR refactors the data source classes to return native DataFrame types that match the input, rather than always wrapping results in narwhals.DataFrame/narwhals.LazyFrame. That is, if you put pandas/polars in, you (once again) get pandas/polars out. This effectively reverts the breaking change introduced in 0.4.0 and provides a more intuitive API.

Breaking Change

This is technically a breaking change that reverts the 0.4.0 breaking change. Users who adapted to 0.4.0 by working with narwhals.DataFrame will now receive native types again:

Input type Return type (0.3.x) Return type (0.4.0) Return type (this PR)
polars.DataFrame polars.DataFrame narwhals.DataFrame polars.DataFrame
pandas.DataFrame pandas.DataFrame narwhals.DataFrame pandas.DataFrame
pyarrow.Table N/A N/A pyarrow.Table
sqlalchemy.Engine pandas.DataFrame narwhals.DataFrame narwhals.DataFrame

Changes

  • DataSource is now generic (DataSource[T]) with a configurable return type
  • DataFrameSource tracks the input DataFrame library and returns results in the same format
  • PolarsLazySource returns native pl.LazyFrame instead of nw.LazyFrame
  • SQLAlchemySource continues to return nw.DataFrame (no clear "native" type to preserve)
  • All QueryChat classes are now generic with @overload signatures for proper type narrowing
  • Added pyarrow support for DataFrameSource
  • Removed duckdb_result_to_nw() helper (no longer needed)

Motivation

The 0.4.0 change forced users to call .to_native() on every result, adding friction. Returning native types:

  1. Is more intuitive (you get back what you put in)
  2. Enables better type inference in IDEs
  3. Reduces boilerplate for downstream code

Changes DataSource.get_data() and .execute_query() to return native
DataFrame types (pd.DataFrame, pl.DataFrame, pl.LazyFrame) instead of
narwhals wrappers, providing better IDE support and type inference.

Key changes:
- DataFrameSource now returns polars/pandas DataFrames matching input type
- PolarsLazySource returns native pl.LazyFrame from execute_query()
- Added IntoDataFrameT TypeVar for generic type capture in QueryChat
- Updated all QueryChat subclasses with 3 overloads for type narrowing:
  - pl.LazyFrame → QueryChat[pl.LazyFrame]
  - IntoDataFrameT → QueryChat[IntoDataFrameT] (any DataFrame type)
  - sqlalchemy.Engine → QueryChat[nw.DataFrame]
- Updated df_to_html() to convert native types back to narwhals
- Fixed tests to use native DataFrame APIs (.tolist(), .iloc[])

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…f_lib

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add DataSourceT TypeVar to DataSource for generic return type
- DataFrameSource uses IntoDataFrameT to preserve input DataFrame type
- PolarsLazySource returns pl.LazyFrame (test_query validates then returns lazy)
- SQLAlchemySource adds return_type param ("polars"|"pandas", default "polars")
- Add read_sql_polars/read_sql_pandas helpers in _df_compat.py
- Update tests for new return types

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SQLAlchemySource now uses overloaded __init__ to narrow return type based
on the return_type parameter:
- return_type="polars" (default) -> SQLAlchemySource[pl.DataFrame]
- return_type="pandas" -> SQLAlchemySource[pd.DataFrame]

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SQLAlchemySource now supports return_type="pyarrow" which returns
pa.Table from queries. The three options are:
- "polars" (default) -> pl.DataFrame
- "pandas" -> pd.DataFrame
- "pyarrow" -> pa.Table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ignores

- StateDictAccessorMixin is now generic over the DataFrame type
- Dash and Gradio QueryChat classes use StateDictAccessorMixin[DataFrameT]
- Removed redundant df() overrides and type ignore comments
- The mixin's df() method now properly returns the generic type

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…type ignores

- mod_server now returns ServerValues[_T] where _T is the data source's type
- Removed type: ignore[arg-type] from ServerValues return
- Removed type: ignore[misc] from super().__init__() calls in _shiny.py and _streamlit.py
- Removed type: ignore[return-value] from mod_server calls and df() method

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The function is no longer used since DataFrameSource now uses
duckdb_result_to_polars and duckdb_result_to_pandas directly.
Updated tests to use duckdb_result_to_polars.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove duckdb_result_to_polars and duckdb_result_to_pandas functions
as thin abstractions that don't add value. Inline the .pl() and .df()
calls directly in PolarsLazySource.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace hardcoded pl.LazyFrame with IntoLazyFrameT TypeVar in overloads
to make lazy frame handling more generic and consistent with the
IntoDataFrameT pattern.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove DataOrLazyFrame legacy alias
- Remove quotes from TypeVar bounds (not needed in TYPE_CHECKING block)
- Remove inline comments from _df_lib field
- Add _convert_result helper that supports polars, pandas, and pyarrow
- Throw error for unsupported DataFrame backends

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove return_type parameter from SQLAlchemySource and just return
nw.DataFrame. The read_sql function tries polars first, then falls
back to pandas based on available libraries.

- Remove overloads and return_type parameter
- Use DataSource[nw.DataFrame] instead of DataSource[DataFrameT]
- Remove read_sql_polars, read_sql_pandas, read_sql_pyarrow functions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update tests to expect native DataFrames (pl.DataFrame) instead of nw.DataFrame
- Remove unsupported "Type Parameters" docstring section for quartodoc compatibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove stale DataOrLazyFrame import from _querychat_core.py
- Add type: ignore comments for super().__init__ calls with generic type params
- Add type: ignore for _convert_result return statements in DataFrameSource
- Add type: ignore for attribute access on generic DataFrame types

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
DataFrames returned by df() are now native types (polars, pandas), not
narwhals wrappers, so to_native() is no longer needed and would fail.

- Update 07-gradio-custom-app.py example
- Update _gradio.py update_displays function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove all type ignores added on this branch by:
  - Removing overloads from QueryChatBase (subclasses have their own)
  - Using cast() and nw.from_native() for DataFrame operations
  - Bounding TypeVars to IntoFrame for better type inference

- Use narwhals TypeVars directly instead of custom definitions:
  - IntoFrameT (replaces DataFrameT) - bound to IntoFrame
  - IntoDataFrameT - bound to IntoDataFrame
  - IntoLazyFrameT - bound to IntoLazyFrame

- Standardize internal DataFrame operations with nw.from_native()
  for uniform access to .shape, .to_pandas(), .to_native(), etc.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove DataSourceT TypeVar, use IntoFrameT directly
- Simplify IntoDataFrame | IntoLazyFrame to IntoFrame in signatures
- Remove implementation note from QueryChatBase docstring
- Change get_current_data() return type from Any to IntoFrame

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove comments that simply describe what the code does when the code
is already self-explanatory (e.g., "Wrap in narwhals for uniform
DataFrame operations" before nw.from_native() calls).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@cpsievert
Copy link
Contributor Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- Move IntoFrame import to TYPE_CHECKING block
- Add __all__ to _datasource.py for explicit re-exports
- Fix import sorting (ruff auto-fix)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of re-exporting IntoFrameT, IntoDataFrameT, IntoLazyFrameT
from _datasource.py, have each module import directly from
narwhals.stable.v1.typing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add TestDataFrameSourceWithPyArrow test class to verify that
DataFrameSource correctly handles pyarrow.Table inputs and returns
native pyarrow.Table results.

Also add pyarrow to dev dependencies and simplify test imports by
removing conditional checks (polars and pyarrow are now guaranteed
dev dependencies).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@cpsievert cpsievert merged commit 1d79086 into main Jan 16, 2026
7 checks passed
@cpsievert cpsievert deleted the feat/native-dataframe-types branch January 16, 2026 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants