-
Notifications
You must be signed in to change notification settings - Fork 170
feat: Adds DataFrame.iter_columns
#2104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Will add examples after tests - Adapted from https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.iter_columns.html Will close #2101
| def test_iter_columns(constructor_eager: ConstructorEager) -> None: | ||
| df = nw.from_native(constructor_eager(data), eager_only=True) | ||
| expected = df.to_dict(as_series=True) | ||
| result = {series.name: series for series in df.iter_columns()} | ||
| assert result == expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the test here, following the lead of iter_rows:
narwhals/tests/frame/rows_test.py
Lines 32 to 44 in b362e46
| def test_iter_rows( | |
| request: Any, | |
| constructor_eager: ConstructorEager, | |
| named: bool, # noqa: FBT001 | |
| expected: list[tuple[Any, ...]] | list[dict[str, Any]], | |
| ) -> None: | |
| if "cudf" in str(constructor_eager): | |
| request.applymarker(pytest.mark.xfail) | |
| data = {"a": [1, 3, 2], "_b": [4, 4, 6], "z": [7.0, 8.0, 9.0], "1": [5, 6, 7]} | |
| df = nw.from_native(constructor_eager(data), eager_only=True) | |
| result = list(df.iter_rows(named=named)) | |
| assert result == expected |
Not sure if we want anything more complex?
Maybe some consideration for a roundtrip through pyarrow and make sure the name is preserved?
Resolves #2104 (comment)
| return self._native_frame.to_dict(orient="records") | ||
|
|
||
| def iter_columns(self) -> Iterator[PandasLikeSeries]: | ||
| for _name, series in self._native_frame.items(): # noqa: PERF102 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See (#2064 (comment)) regarding false-positive (PERF102)
| """ | ||
| return self._compliant_frame.rows(named=named) # type: ignore[no-any-return] | ||
|
|
||
| def iter_columns(self: Self) -> Iterator[Series[Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iter_columns and all other methods on DataFrame that include Series[Any] in their return type (e.g get_column) could benefit from a change I have in #2064
Lines 72 to 86 in 1abc05a
| class CompliantDataFrame(Generic[CompliantSeriesT_co], Protocol): | |
| def __narwhals_dataframe__(self) -> Self: ... | |
| def __narwhals_namespace__(self) -> Any: ... | |
| def simple_select( | |
| self, *column_names: str | |
| ) -> Self: ... # `select` where all args are column names. | |
| def aggregate(self, *exprs: Any) -> Self: | |
| ... # `select` where all args are aggregations or literals | |
| # (so, no broadcasting is necessary). | |
| @property | |
| def columns(self) -> Sequence[str]: ... | |
| @property | |
| def schema(self) -> Mapping[str, DType]: ... | |
| def get_column(self, name: str) -> CompliantSeriesT_co: ... |
On the public side, it would mean adding:
# narwhals.typing.py
SeriesT_co = TypeVar("SeriesT_co", bound="Series[Any]", covariant=True)And making changes like this:
# narwhals.dataframe.py
from narwhals.typing import SeriesT_co
class DataFrame(BaseFrame[DataFrameT], Generic[SeriesT_co]):
def get_column(self: Self, name: str) -> SeriesT_co: ...
def iter_columns(self: Self) -> Iterator[SeriesT_co]: ...
...So then you'd have things like:
DataFrame[pd.DataFrame, pd.Series]
DataFrame[pl.DataFrame, pl.Series]
DataFrame[pa.Table, pa.ChunkedArray]The link between DataFrame and Series is something I've explored some more in #2055 also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the v1 backport merges - it might make this process easier.
We wouldn't have the complexity of stable/unstable Series not having a TypeVar free
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would ❤️ love ❤️ to have this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
I wonder - should we just implement this at the narwhals/dataframe.py level and just return
yield from (self[col] for col in self.columns)?
This is only defined for eager frames anyway so we don't need to worry about invoking self.columns (and Polars does that anyway for this method)
|
ah wait sorry, just remembered that for this feature you worked up to it the other way round (starting at the compliant level, then deciding to expose it) sure, happy to add it like this then 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @dangotbanned !
- Originally in #2064 - Will eventually support #2104 (comment)
* feat(typing): Make `CompliantDataFrame` generic - Originally in #2064 - Will eventually support #2104 (comment) * chore(typing): Update eager backends * fix: Make `PolarsSeries` compliant - Now for `PolarsDataFrame` to be compliant, `PolarsSeries.alias` needs to be present - Since `PolarsDataFrame` can be returned in all of these places, they all required the update to `_polars` * fix(typing): Resolve new `mypy` errors Originally c11dc95
Resolves: (#2149 (comment)) Doing this properly surfaced lots of issues - `Polars*` classes are *similar* to `Compliant*` - Not close enough to support typing - Lots of stuff is behind `__getattr__` - We don't need `PolarsExpr` to do a lot of the work `CompliantExpr` does - `nw.Series._compliant_series` is still a big PR away - Repeat of (#2119) - Last big hurdle to get (#2104 (comment))

What type of PR is this? (check all applicable)
Related issues
DataFrame.iter_columns#2101Checklist
If you have comments or can explain your changes, please do so below
TODO
CompliantDataFrametyping updateDataFrame.iter_columns#2104 (comment))