Skip to content

Conversation

@dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Feb 26, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

TODO

@dangotbanned dangotbanned added the enhancement New feature or request label Feb 26, 2025
Comment on lines +25 to +29
def test_iter_columns(constructor_eager: ConstructorEager) -> None:
df = nw.from_native(constructor_eager(data), eager_only=True)
expected = df.to_dict(as_series=True)
result = {series.name: series for series in df.iter_columns()}
assert result == expected
Copy link
Member Author

@dangotbanned dangotbanned Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the test here, following the lead of iter_rows:

def test_iter_rows(
request: Any,
constructor_eager: ConstructorEager,
named: bool, # noqa: FBT001
expected: list[tuple[Any, ...]] | list[dict[str, Any]],
) -> None:
if "cudf" in str(constructor_eager):
request.applymarker(pytest.mark.xfail)
data = {"a": [1, 3, 2], "_b": [4, 4, 6], "z": [7.0, 8.0, 9.0], "1": [5, 6, 7]}
df = nw.from_native(constructor_eager(data), eager_only=True)
result = list(df.iter_rows(named=named))
assert result == expected

Not sure if we want anything more complex?
Maybe some consideration for a roundtrip through pyarrow and make sure the name is preserved?

return self._native_frame.to_dict(orient="records")

def iter_columns(self) -> Iterator[PandasLikeSeries]:
for _name, series in self._native_frame.items(): # noqa: PERF102
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See (#2064 (comment)) regarding false-positive (PERF102)

"""
return self._compliant_frame.rows(named=named) # type: ignore[no-any-return]

def iter_columns(self: Self) -> Iterator[Series[Any]]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iter_columns and all other methods on DataFrame that include Series[Any] in their return type (e.g get_column) could benefit from a change I have in #2064

class CompliantDataFrame(Generic[CompliantSeriesT_co], Protocol):
def __narwhals_dataframe__(self) -> Self: ...
def __narwhals_namespace__(self) -> Any: ...
def simple_select(
self, *column_names: str
) -> Self: ... # `select` where all args are column names.
def aggregate(self, *exprs: Any) -> Self:
... # `select` where all args are aggregations or literals
# (so, no broadcasting is necessary).
@property
def columns(self) -> Sequence[str]: ...
@property
def schema(self) -> Mapping[str, DType]: ...
def get_column(self, name: str) -> CompliantSeriesT_co: ...

On the public side, it would mean adding:

# narwhals.typing.py
SeriesT_co = TypeVar("SeriesT_co", bound="Series[Any]", covariant=True)

And making changes like this:

# narwhals.dataframe.py
from narwhals.typing import SeriesT_co

class DataFrame(BaseFrame[DataFrameT], Generic[SeriesT_co]):
    def get_column(self: Self, name: str) -> SeriesT_co: ...
    def iter_columns(self: Self) -> Iterator[SeriesT_co]: ...
    ...

So then you'd have things like:

DataFrame[pd.DataFrame, pd.Series]
DataFrame[pl.DataFrame, pl.Series]
DataFrame[pa.Table, pa.ChunkedArray]

The link between DataFrame and Series is something I've explored some more in #2055 also

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the v1 backport merges - it might make this process easier.
We wouldn't have the complexity of stable/unstable Series not having a TypeVar free

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would ❤️ love ❤️ to have this

@dangotbanned dangotbanned marked this pull request as ready for review February 26, 2025 16:41
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

I wonder - should we just implement this at the narwhals/dataframe.py level and just return

yield from (self[col] for col in self.columns)

?

This is only defined for eager frames anyway so we don't need to worry about invoking self.columns (and Polars does that anyway for this method)

@MarcoGorelli
Copy link
Member

ah wait sorry, just remembered that for this feature you worked up to it the other way round (starting at the compliant level, then deciding to expose it)

sure, happy to add it like this then 👍

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dangotbanned !

@dangotbanned dangotbanned merged commit 77a0150 into main Feb 27, 2025
27 of 28 checks passed
@dangotbanned dangotbanned deleted the df-iter-columns branch February 27, 2025 22:27
dangotbanned added a commit that referenced this pull request Feb 28, 2025
- Originally in #2064
- Will eventually support #2104 (comment)
MarcoGorelli pushed a commit that referenced this pull request Mar 1, 2025
* feat(typing): Make `CompliantDataFrame` generic

- Originally in #2064
- Will eventually support #2104 (comment)

* chore(typing): Update eager backends

* fix: Make `PolarsSeries` compliant

- Now for `PolarsDataFrame` to be compliant, `PolarsSeries.alias` needs to be present
- Since `PolarsDataFrame` can be returned in all of these places, they all required the update to `_polars`

* fix(typing): Resolve new `mypy` errors

Originally c11dc95
dangotbanned added a commit that referenced this pull request Mar 1, 2025
dangotbanned added a commit that referenced this pull request Mar 8, 2025
Resolves: (#2149 (comment))

Doing this properly surfaced lots of issues
- `Polars*` classes are *similar* to `Compliant*`
  - Not close enough to support typing
  - Lots of stuff is behind `__getattr__`
  - We don't need `PolarsExpr` to do a lot of the work `CompliantExpr` does
- `nw.Series._compliant_series` is still a big PR away
  - Repeat of (#2119)
  - Last big hurdle to get (#2104 (comment))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add DataFrame.iter_columns

3 participants