Commit 1d79086

cpsievert and claude authored
feat(pkg-py): return native DataFrame types with generic type narrowing (#196)

* feat(pkg-py): return native DataFrame types with generic type narrowing

  Changes DataSource.get_data() and .execute_query() to return native DataFrame types (pd.DataFrame, pl.DataFrame, pl.LazyFrame) instead of narwhals wrappers, providing better IDE support and type inference.

  Key changes:
  - DataFrameSource now returns polars/pandas DataFrames matching input type
  - PolarsLazySource returns native pl.LazyFrame from execute_query()
  - Added IntoDataFrameT TypeVar for generic type capture in QueryChat
  - Updated all QueryChat subclasses with 3 overloads for type narrowing:
    - pl.LazyFrame → QueryChat[pl.LazyFrame]
    - IntoDataFrameT → QueryChat[IntoDataFrameT] (any DataFrame type)
    - sqlalchemy.Engine → QueryChat[nw.DataFrame]
  - Updated df_to_html() to convert native types back to narwhals
  - Fixed tests to use native DataFrame APIs (.tolist(), .iloc[])

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: address PR feedback - remove comment, rename _backend to _df_lib

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): make DataSource generic with configurable return types

  - Add DataSourceT TypeVar to DataSource for generic return type
  - DataFrameSource uses IntoDataFrameT to preserve input DataFrame type
  - PolarsLazySource returns pl.LazyFrame (test_query validates then returns lazy)
  - SQLAlchemySource adds return_type param ("polars"|"pandas", default "polars")
  - Add read_sql_polars/read_sql_pandas helpers in _df_compat.py
  - Update tests for new return types

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): use overloads to narrow SQLAlchemySource return type

  SQLAlchemySource now uses overloaded __init__ to narrow return type based on the return_type parameter:
  - return_type="polars" (default) -> SQLAlchemySource[pl.DataFrame]
  - return_type="pandas" -> SQLAlchemySource[pd.DataFrame]

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(pkg-py): add pyarrow return type support for SQLAlchemySource

  SQLAlchemySource now supports return_type="pyarrow" which returns pa.Table from queries. The three options are:
  - "polars" (default) -> pl.DataFrame
  - "pandas" -> pd.DataFrame
  - "pyarrow" -> pa.Table

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): make StateDictAccessorMixin generic to remove type ignores

  - StateDictAccessorMixin is now generic over the DataFrame type
  - Dash and Gradio QueryChat classes use StateDictAccessorMixin[DataFrameT]
  - Removed redundant df() overrides and type ignore comments
  - The mixin's df() method now properly returns the generic type

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): add proper generic typing to mod_server and remove type ignores

  - mod_server now returns ServerValues[_T] where _T is the data source's type
  - Removed type: ignore[arg-type] from ServerValues return
  - Removed type: ignore[misc] from super().__init__() calls in _shiny.py and _streamlit.py
  - Removed type: ignore[return-value] from mod_server calls and df() method

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): remove unused duckdb_result_to_nw function

  The function is no longer used since DataFrameSource now uses duckdb_result_to_polars and duckdb_result_to_pandas directly. Updated tests to use duckdb_result_to_polars.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): inline duckdb result conversion calls

  Remove duckdb_result_to_polars and duckdb_result_to_pandas functions as thin abstractions that don't add value. Inline the .pl() and .df() calls directly in PolarsLazySource.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): use IntoLazyFrameT for lazy frame type narrowing

  Replace hardcoded pl.LazyFrame with IntoLazyFrameT TypeVar in overloads to make lazy frame handling more generic and consistent with the IntoDataFrameT pattern.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): clean up DataFrameSource and support pyarrow

  - Remove DataOrLazyFrame legacy alias
  - Remove quotes from TypeVar bounds (not needed in TYPE_CHECKING block)
  - Remove inline comments from _df_lib field
  - Add _convert_result helper that supports polars, pandas, and pyarrow
  - Throw error for unsupported DataFrame backends

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): simplify SQLAlchemySource to return nw.DataFrame

  Remove return_type parameter from SQLAlchemySource and just return nw.DataFrame. The read_sql function tries polars first, then falls back to pandas based on available libraries.

  - Remove overloads and return_type parameter
  - Use DataSource[nw.DataFrame] instead of DataSource[DataFrameT]
  - Remove read_sql_polars, read_sql_pandas, read_sql_pyarrow functions

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(pkg-py): fix CI failures for native DataFrame types

  - Update tests to expect native DataFrames (pl.DataFrame) instead of nw.DataFrame
  - Remove unsupported "Type Parameters" docstring section for quartodoc compatibility

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(pkg-py): resolve pyright type errors for generic DataFrames

  - Remove stale DataOrLazyFrame import from _querychat_core.py
  - Add type: ignore comments for super().__init__ calls with generic type params
  - Add type: ignore for _convert_result return statements in DataFrameSource
  - Add type: ignore for attribute access on generic DataFrame types

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(pkg-py): remove to_native() calls on native DataFrames

  DataFrames returned by df() are now native types (polars, pandas), not narwhals wrappers, so to_native() is no longer needed and would fail.

  - Update 07-gradio-custom-app.py example
  - Update _gradio.py update_displays function

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): remove type ignores and use narwhals TypeVars

  - Remove all type ignores added on this branch by:
    - Removing overloads from QueryChatBase (subclasses have their own)
    - Using cast() and nw.from_native() for DataFrame operations
    - Bounding TypeVars to IntoFrame for better type inference
  - Use narwhals TypeVars directly instead of custom definitions:
    - IntoFrameT (replaces DataFrameT) - bound to IntoFrame
    - IntoDataFrameT - bound to IntoDataFrame
    - IntoLazyFrameT - bound to IntoLazyFrame
  - Standardize internal DataFrame operations with nw.from_native() for uniform access to .shape, .to_pandas(), .to_native(), etc.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(pkg-py): address PR feedback on type annotations

  - Remove DataSourceT TypeVar, use IntoFrameT directly
  - Simplify IntoDataFrame | IntoLazyFrame to IntoFrame in signatures
  - Remove implementation note from QueryChatBase docstring
  - Change get_current_data() return type from Any to IntoFrame

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore(pkg-py): remove redundant comments

  Remove comments that simply describe what the code does when the code is already self-explanatory (e.g., "Wrap in narwhals for uniform DataFrame operations" before nw.from_native() calls).

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(pkg-py): fix CI formatting issues

  - Move IntoFrame import to TYPE_CHECKING block
  - Add __all__ to _datasource.py for explicit re-exports
  - Fix import sorting (ruff auto-fix)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(pkg-py): import TypeVars directly from narwhals

  Instead of re-exporting IntoFrameT, IntoDataFrameT, IntoLazyFrameT from _datasource.py, have each module import directly from narwhals.stable.v1.typing.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore(pkg-py): remove unnecessary __all__ from _datasource.py

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update changelog

* test(pkg-py): add tests for pyarrow Table support

  Add TestDataFrameSourceWithPyArrow test class to verify that DataFrameSource correctly handles pyarrow.Table inputs and returns native pyarrow.Table results. Also add pyarrow to dev dependencies and simplify test imports by removing conditional checks (polars and pyarrow are now guaranteed dev dependencies).

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(pkg-py): fix import sorting in tests

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent f01820f commit 1d79086
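
The overload-based narrowing described in the commit message can be sketched with the standard library alone. This is a minimal illustration, not the querychat implementation: `Chat`, `EagerFrame`, `LazyFrame`, and `Engine` are hypothetical stand-ins, and the real classes take many more parameters. Annotating `self` in each `__init__` overload lets a type checker pick the class's type parameter from the type of `data_source`.

```python
from __future__ import annotations

from typing import Generic, TypeVar, cast, overload

FrameT = TypeVar("FrameT")


class EagerFrame:
    """Hypothetical stand-in for an eager DataFrame type."""


class LazyFrame:
    """Hypothetical stand-in for a lazy frame type."""


class Engine:
    """Hypothetical stand-in for a database engine."""


class Chat(Generic[FrameT]):
    # Each overload annotates `self`, so a type checker infers the
    # type parameter of the instance from the argument type.
    @overload
    def __init__(self: Chat[LazyFrame], data_source: LazyFrame, table_name: str) -> None: ...

    @overload
    def __init__(self: Chat[EagerFrame], data_source: EagerFrame, table_name: str) -> None: ...

    @overload
    def __init__(self: Chat[EagerFrame], data_source: Engine, table_name: str) -> None: ...

    def __init__(self, data_source: object, table_name: str) -> None:
        self.table_name = table_name
        # Assumption for this sketch: an engine yields an eager frame when queried.
        if isinstance(data_source, Engine):
            self._data: object = EagerFrame()
        else:
            self._data = data_source

    def df(self) -> FrameT:
        # cast() only informs the type checker; it does nothing at runtime.
        return cast(FrameT, self._data)


lazy_chat = Chat(LazyFrame(), "titanic")    # inferred as Chat[LazyFrame]
eager_chat = Chat(EagerFrame(), "titanic")  # inferred as Chat[EagerFrame]
engine_chat = Chat(Engine(), "titanic")     # inferred as Chat[EagerFrame]
```

With this shape, `lazy_chat.df()` is typed as `LazyFrame` and `eager_chat.df()` as `EagerFrame`, without casts at the call site.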

18 files changed: +482 -243 lines changed

pkg-py/CHANGELOG.md

Lines changed: 13 additions & 3 deletions
@@ -9,14 +9,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### New features

-* Added `PolarsLazySource` to support Polars LazyFrames as data sources. Data stays lazy until the render boundary, enabling efficient handling of large datasets. Pass a `polars.LazyFrame` directly to `QueryChat()` and queries will be executed lazily via Polars' SQLContext.
-
 * Added support for Gradio, Dash, and Streamlit web frameworks in addition to Shiny. Import from the new submodules:
     * `from querychat.gradio import QueryChat`
     * `from querychat.dash import QueryChat`
     * `from querychat.streamlit import QueryChat`

-    Each framework's `QueryChat` provides `.app()` for quick standalone apps and `.ui()` for custom layouts. Install framework dependencies with pip extras: `pip install querychat[gradio]`, `pip install querychat[dash]`, or `pip install querychat[streamlit]`.
+    Each framework's `QueryChat` provides `.app()` for quick standalone apps and `.ui()` for custom layouts. Install framework dependencies with pip extras: `pip install querychat[gradio]`, `pip install querychat[dash]`, or `pip install querychat[streamlit]`. (#190)
+
+* `QueryChat()` gains support for more data sources:
+    * `polars.LazyFrame`: queries execute lazily via `polars.SQLContext`. In this case, `.df()` et al. methods will return a `polars.LazyFrame`. (#191)
+    * `pyarrow.Table`: queries execute in-memory via `duckdb`. In this case, `.df()` et al. methods will return a `pyarrow.Table`. (#196)
+
+### Improvements
+
+* Improved typing support for return types on `.df()` et al. (#196)
+
+### Changes
+
+* `DataFrameSource` methods now (once again) return the input DataFrame type (e.g., `pandas.DataFrame`) instead of `nw.DataFrame`. (#196)

 ## [0.4.0] - 2026-01-14

pkg-py/examples/07-gradio-custom-app.py

Lines changed: 1 addition & 2 deletions
@@ -47,8 +47,7 @@ def update_display(state_dict: AppStateDict):
     sql = qc.sql(state_dict)
     title = qc.title(state_dict)

-    # Convert narwhals DataFrame to native (pandas) for Gradio compatibility
-    display_df = df.head(100).to_native()
+    display_df = df.head(100)
     return (
         f"### {title or 'Full Dataset'}",
         sql or "SELECT * FROM titanic",

pkg-py/src/querychat/_dash.py

Lines changed: 53 additions & 6 deletions
@@ -2,10 +2,11 @@

 from __future__ import annotations

-from typing import TYPE_CHECKING, Literal, Optional, cast
+from typing import TYPE_CHECKING, Literal, Optional, cast, overload

 import narwhals.stable.v1 as nw
 from chatlas import Turn
+from narwhals.stable.v1.typing import IntoDataFrameT, IntoFrameT, IntoLazyFrameT

 from ._dash_ui import IDs, card_ui, chat_container_ui, chat_messages_ui
 from ._querychat_base import TOOL_GROUPS, QueryChatBase
@@ -31,7 +32,7 @@
     from dash import html


-class QueryChat(QueryChatBase, StateDictAccessorMixin):
+class QueryChat(QueryChatBase[IntoFrameT], StateDictAccessorMixin[IntoFrameT]):
     """
     QueryChat for Dash applications.

@@ -86,6 +87,54 @@ def update_sql(state):

     """

+    @overload
+    def __init__(
+        self: QueryChat[IntoLazyFrameT],
+        data_source: IntoLazyFrameT,
+        table_name: str,
+        *,
+        greeting: Optional[str | PathType] = None,
+        client: Optional[str | chatlas.Chat] = None,
+        tools: TOOL_GROUPS | tuple[TOOL_GROUPS, ...] | None = ("update", "query"),
+        data_description: Optional[str | PathType] = None,
+        categorical_threshold: int = 20,
+        extra_instructions: Optional[str | PathType] = None,
+        prompt_template: Optional[str | PathType] = None,
+        storage_type: Literal["memory", "session", "local"] = "memory",
+    ) -> None: ...
+
+    @overload
+    def __init__(
+        self: QueryChat[IntoDataFrameT],
+        data_source: IntoDataFrameT,
+        table_name: str,
+        *,
+        greeting: Optional[str | PathType] = None,
+        client: Optional[str | chatlas.Chat] = None,
+        tools: TOOL_GROUPS | tuple[TOOL_GROUPS, ...] | None = ("update", "query"),
+        data_description: Optional[str | PathType] = None,
+        categorical_threshold: int = 20,
+        extra_instructions: Optional[str | PathType] = None,
+        prompt_template: Optional[str | PathType] = None,
+        storage_type: Literal["memory", "session", "local"] = "memory",
+    ) -> None: ...
+
+    @overload
+    def __init__(
+        self: QueryChat[nw.DataFrame],
+        data_source: sqlalchemy.Engine,
+        table_name: str,
+        *,
+        greeting: Optional[str | PathType] = None,
+        client: Optional[str | chatlas.Chat] = None,
+        tools: TOOL_GROUPS | tuple[TOOL_GROUPS, ...] | None = ("update", "query"),
+        data_description: Optional[str | PathType] = None,
+        categorical_threshold: int = 20,
+        extra_instructions: Optional[str | PathType] = None,
+        prompt_template: Optional[str | PathType] = None,
+        storage_type: Literal["memory", "session", "local"] = "memory",
+    ) -> None: ...
+
     def __init__(
         self,
         data_source: IntoFrame | sqlalchemy.Engine,
@@ -374,8 +423,7 @@ def update_display(state_data: AppStateDict, reset_clicks):
         sql_title = state.title or "SQL Query"
         sql_code = f"```sql\n{state.get_display_sql()}\n```"

-        df = state.get_current_data()
-        # Collect if lazy before accessing .to_pandas() or .shape
+        df = nw.from_native(state.get_current_data())
         if isinstance(df, nw.LazyFrame):
             df = df.collect()

@@ -408,8 +456,7 @@ def update_display(state_data: AppStateDict, reset_clicks):
     )
     def export_csv(n_clicks: int, state_data: AppStateDict):
         state = deserialize_state(state_data)
-        df = state.get_current_data()
-        # Collect if lazy before converting to pandas
+        df = nw.from_native(state.get_current_data())
         if isinstance(df, nw.LazyFrame):
             df = df.collect()
         return send_data_frame(df.to_pandas().to_csv, "querychat_data.csv", index=False)
