refactor: Add `CompliantSeries.from_numpy` #2196

dangotbanned · 2025-03-12T21:14:58Z

What type of PR is this? (check all applicable)

Related issues

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below

Technically adding more than just from_numpy
nw._translate.py would be the home for similar protocols
- Been playing with that a lot locally
- This small part seemed ready

narwhals/_pandas_like/series.py

dangotbanned · 2025-03-13T16:15:26Z

narwhals/dataframe.py

        if isinstance(arg, Series):
            return arg._compliant_series._to_expr()
        if isinstance(arg, Expr):
-            return arg._to_compliant_expr(self.__narwhals_namespace__())
+            return arg._to_compliant_expr(self.__narwhals_namespace__())  # comment
        if isinstance(arg, str):
            return plx.col(arg)
        if get_polars() is not None and "polars" in str(type(arg)):  # pragma: no cover


I think that starting from here we should be handing off entirely to a CompliantNamespace method.

In addition to simplifying this part - we could reuse that again instead of:

narwhals/narwhals/_expression_parsing.py

Lines 93 to 107 in 7611bd4

def extract_compliant(
    plx: CompliantNamespace[CompliantFrameT, CompliantSeriesOrNativeExprT_co],
    other: Any,
    *,
    str_as_lit: bool,
) -> CompliantExpr[CompliantFrameT, CompliantSeriesOrNativeExprT_co] | object:
    if is_expr(other):
        return other._to_compliant_expr(plx)
    if isinstance(other, str) and not str_as_lit:
        return plx.col(other)
    if is_narwhals_series(other):
        return other._compliant_series._to_expr()
    if is_numpy_array(other):
        return plx._create_compliant_series(other)._to_expr()  # type: ignore[attr-defined]
    return other

Then everywhere we currently do:

from narwhals._expression_parsing import extract_compliant

plx: CompliantNamespace
extract_compliant(plx, ...)

Would be something like this:

plx: CompliantNamespace
plx._extract_compliant(...)

I had some notes on this locally:

from __future__ import annotations

from typing import Any, Protocol


class ParseCompliant(Protocol):
    """Somewhat of an extended [polars._utils.parse.parse_into_expression]

    Covers cases that are similar, but the latter is narrower:
    - `nw.dataframe.BaseFrame._extract_compliant`
    - `nw._expression_parsing.extract_compliant`

    General
    - Most usage requires a ref to `__narwhals_namespace__`
    - Series/BaseFrame can convert internally

    [polars._utils.parse.parse_into_expression]: https://github.com/pola-rs/polars/blob/9092a0e90005aa98077217f01e725ac4f386a335/py-polars/polars/_utils/parse/expr.py#L20-L63
    """  # noqa: D415

    def _parse_compliant(self, arg: Any, /) -> Any: ...
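To make the proposal a bit more concrete, here is a minimal sketch of the same branching living on the namespace instead of in a free function. Everything in it is hypothetical: `ToyExpr`, `ToySeries` and `ToyNamespace` are stand-ins invented for illustration, not narwhals classes, and the numpy branch is left out so the example runs without any dependencies.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ToyExpr:
    """Hypothetical stand-in for a compliant expression."""

    source: str


@dataclass
class ToySeries:
    """Hypothetical stand-in for a narwhals-level Series."""

    name: str

    def _to_expr(self) -> ToyExpr:
        return ToyExpr(f"series:{self.name}")


class ToyNamespace:
    """Hypothetical namespace that owns the parsing logic itself."""

    def col(self, name: str) -> ToyExpr:
        return ToyExpr(f"col:{name}")

    def _parse_compliant(self, arg: Any, /, *, str_as_lit: bool = False) -> Any:
        # Same branch order as `extract_compliant`, but `plx` is now `self`.
        if isinstance(arg, ToyExpr):
            return arg
        if isinstance(arg, str) and not str_as_lit:
            return self.col(arg)
        if isinstance(arg, ToySeries):
            return arg._to_expr()
        # Anything else passes through unchanged, to be treated as a literal.
        return arg


plx = ToyNamespace()
parsed_col = plx._parse_compliant("a")
parsed_lit = plx._parse_compliant("a", str_as_lit=True)
parsed_series = plx._parse_compliant(ToySeries("b"))
```

Call sites would then only need a namespace reference and no extra import.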

dangotbanned · 2025-03-13T16:48:40Z

narwhals/_compliant/series.py

+    @classmethod
+    def from_numpy(cls, data: Into1DArray, /, *, context: _FullContext) -> Self: ...


Linking this back to nw.functions.new_series, we might wanna have this as:

@classmethod
def from_numpy(
    cls,
    data: Into1DArray,
    /,
    *,
    context: _FullContext,
    name: str = "",
    dtype: DType | type[DType] | None = None,
) -> Self: ...

narwhals/narwhals/functions.py

Lines 188 to 194 in 7611bd4

def new_series(
    name: str,
    values: Any,
    dtype: DType | type[DType] | None = None,
    *,
    native_namespace: ModuleType,
) -> Series[Any]:

dangotbanned · 2025-03-13T16:56:36Z

narwhals/_translate.py

+else:  # pragma: no cover
+    import sys
+    from importlib.util import find_spec
+
+    if sys.version_info >= (3, 13):
+        from typing import TypeVar
+    elif find_spec("typing_extensions"):
+        from typing_extensions import TypeVar
+    else:
+        from typing import TypeVar as _TypeVar
+
+        def TypeVar(  # noqa: ANN202, N802
+            name: str,
+            *constraints: Any,
+            bound: Any | None = None,
+            covariant: bool = False,
+            contravariant: bool = False,
+            **kwds: Any,  # noqa: ARG001
+        ):
+            return _TypeVar(
+                name,
+                *constraints,
+                bound=bound,
+                covariant=covariant,
+                contravariant=contravariant,
+            )


This is a trick to get TypeVar defaults - but in a more reusable way than (#2110 (comment))

refactor: adds _compliant sub-package #2149 (comment)

https://peps.python.org/pep-0696/

I've used them heavily when testing out the .(to|from_) protocols across the rest of the API

Peek

# NOTE: `nw.dataframe.DataFrame`
class NarwhalsDataFrame(
    ArrowConvertible[_ArrowTable, IntoArrowTable],
    PandasConvertible[_PandasDataFrame],
    PolarsConvertible[_PolarsDataFrame],
    NumpyConvertible[_2DArray],
    NarwhalsDictCovertible,
    NarwhalsNativeConvertible[NativeDataFrameT],
    CompliantConvertible["CompliantDataFrame_[NativeDataFrameT, CompliantSeries_T]"],
    ParseCompliant,
    Generic[NativeDataFrameT, CompliantSeries_T],
):
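For anyone who hasn't used PEP 696 defaults yet, a rough sketch of how the import shim and a defaulted second `TypeVar` fit together. The names `ToT_co`, `FromT_contra` and `Convertible` are made up for this example and are not the ones in `_translate.py`. On the last fallback branch the `default` is silently dropped, so the runtime subscription below passes both arguments; type checkers that support PEP 696 would read `Convertible[X]` as `Convertible[X, X]`.

```python
from __future__ import annotations

import sys
from importlib.util import find_spec
from typing import Any, Protocol

if sys.version_info >= (3, 13):
    from typing import TypeVar
elif find_spec("typing_extensions"):
    from typing_extensions import TypeVar
else:
    from typing import TypeVar as _TypeVar

    def TypeVar(name, *constraints, bound=None, covariant=False, contravariant=False, **kwds):
        # Runtime-only fallback: ignore `default=` and any other newer keywords.
        return _TypeVar(
            name, *constraints, bound=bound, covariant=covariant, contravariant=contravariant
        )


# Hypothetical names, for illustration only.
ToT_co = TypeVar("ToT_co", covariant=True)
# The defaulted TypeVar must come after the non-defaulted one.
FromT_contra = TypeVar("FromT_contra", contravariant=True, default=ToT_co)


class Convertible(Protocol[ToT_co, FromT_contra]):
    def to_numpy(self) -> ToT_co: ...

    @classmethod
    def from_numpy(cls, data: FromT_contra, /) -> Any: ...


# Passing both parameters explicitly works on every branch of the shim.
alias = Convertible[int, int]
```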

dangotbanned · 2025-03-13T16:58:15Z

narwhals/_translate.py

+class NumpyConvertible(
+    ToNumpy[ToNumpyT_co],
+    FromNumpy[FromNumpyDT_contra],
+    Protocol[ToNumpyT_co, FromNumpyDT_contra],
+):
+    def to_numpy(self, dtype: Any, *, copy: bool | None) -> ToNumpyT_co: ...


In relation to (https://github.com/narwhals-dev/narwhals/pull/2196/files#r1993954699) - we could then do stuff like this which uses the same TypeVar twice:

from narwhals._translate import NumpyConvertible
from narwhals.typing import _2DArray

NumpyConvertible[_2DArray]

I think that's closer to what we'd want on CompliantDataFrame and https://narwhals-dev.github.io/narwhals/api-reference/narwhals/#narwhals.from_numpy

MarcoGorelli · 2025-03-15T12:50:46Z

narwhals/_polars/series.py

+    @classmethod
+    def from_numpy(cls, data: Into1DArray, /, *, context: _FullContext) -> Self:
+        return cls(
+            pl.Series(data if is_numpy_array_1d(data) else [data]),


not sure I follow this condition, where does it come from in the current code?

Ah yeah this might look strange in PolarsSeries.
It seems there isn't an equivalent of ._from_scalar(value=...)

So it is mainly to match the logic of the other backends:

narwhals/narwhals/_arrow/series.py

Lines 141 to 165 in 6a5ed1d

    @classmethod
    def _from_iterable(
        cls: type[Self],
        data: Iterable[Any],
        name: str,
        *,
        context: _FullContext,
    ) -> Self:
        return cls(
            chunked_array([data]),
            name=name,
            backend_version=context._backend_version,
            version=context._version,
        )

    def _from_scalar(self, value: Any) -> Self:
        if self._backend_version < (13,) and hasattr(value, "as_py"):
            value = value.as_py()
        return super()._from_scalar(value)

    @classmethod
    def from_numpy(cls, data: Into1DArray, /, *, context: _FullContext) -> Self:
        return cls._from_iterable(
            data if is_numpy_array_1d(data) else [data], name="", context=context
        )

narwhals/narwhals/_compliant/series.py

Lines 57 to 63 in 6a5ed1d

    def _from_scalar(self, value: Any) -> Self:
        return self._from_iterable([value], name=self.name, context=self)

    @classmethod
    def _from_iterable(
        cls: type[Self], data: Iterable[Any], name: str, *, context: _FullContext
    ) -> Self: ...

narwhals/narwhals/_pandas_like/series.py

Lines 175 to 205 in 6a5ed1d

    @classmethod
    def _from_iterable(
        cls: type[Self],
        data: Iterable[Any],
        name: str,
        *,
        context: _FullContext,
        index: Any = None,  # NOTE: Originally a liskov substitution principle violation
    ) -> Self:
        return cls(
            native_series_from_iterable(
                data,
                name=name,
                index=[] if index is None else index,
                implementation=context._implementation,
            ),
            implementation=context._implementation,
            backend_version=context._backend_version,
            version=context._version,
        )

    @classmethod
    def from_numpy(cls, data: Into1DArray, /, *, context: _FullContext) -> Self:
        implementation = context._implementation
        arr = data if is_numpy_array_1d(data) else [data]
        return cls(
            implementation.to_native_namespace().Series(arr, name=""),
            implementation=implementation,
            backend_version=context._backend_version,
            version=context._version,
        )

I imagine we'd probably end up having CompliantSeries with @classmethod's like:

CompliantSeries.from_numpy
CompliantSeries.from_iterable
CompliantSeries.from_scalar
# Maybe more that aren't relevant here

Where they might have overlapping and/or default implementations higher up in the protocol.
E.g. (5d609a7)
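A rough sketch of how those three constructors could layer on top of each other, with `from_iterable` as the only one a backend must implement. This is an illustration under assumptions, not the narwhals implementation: `ToySeries` and `Toy1DArray` are hypothetical, and the `ndim` check stands in for `is_numpy_array_1d` so the snippet runs without numpy.

```python
from __future__ import annotations

from typing import Any, Iterable


class Toy1DArray(list):
    """Hypothetical stand-in for a 1D `numpy.ndarray`."""

    ndim = 1


class ToySeries:
    """Hypothetical eager series backed by a plain list."""

    def __init__(self, values: list[Any], name: str = "") -> None:
        self.values = values
        self.name = name

    @classmethod
    def from_iterable(cls, data: Iterable[Any], /, *, name: str = "") -> ToySeries:
        # The one constructor each backend would have to provide.
        return cls(list(data), name)

    @classmethod
    def from_scalar(cls, value: Any, /, *, name: str = "") -> ToySeries:
        # Default implementation: wrap the scalar, then defer to `from_iterable`.
        return cls.from_iterable([value], name=name)

    @classmethod
    def from_numpy(cls, data: Any, /) -> ToySeries:
        # Mirrors `data if is_numpy_array_1d(data) else [data]`:
        # 1D input is iterated directly, anything else is treated as a scalar.
        if getattr(data, "ndim", 0) == 1:
            return cls.from_iterable(data)
        return cls.from_scalar(data)


series_from_array = ToySeries.from_numpy(Toy1DArray([1, 2, 3]))
series_from_scalar = ToySeries.from_numpy(4.5)
```

With this layering the scalar branch stops being a special case inside each backend's `from_numpy`.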

ok, i see from the type hint that it's clear actually, we only get here with numpy scalars or numpy 1d arrays

Yeah exactly!

In #2196 (comment), if you click on this

Peek

You can see some similar examples

narwhals/dataframe.py

MarcoGorelli

thanks @dangotbanned !

one comment then good to go

#2196 (comment)

dangotbanned · 2025-03-15T16:36:33Z

@MarcoGorelli this would be the start of putting it all together.

- Reusing the same protocol (w/ a different type parameter) for CompliantDataFrame
- Making the two (from_numpy) @classmethod(s) accessible on an EagerNamespace instance
- Then simplifying a lot of functions.py by just using that after we create our native_namespace

Full diff

CompliantDataFrame will need the same treatment CompliantSeries got in this PR.
But - it'll be a shorter PR for sure

diff --git a/narwhals/_compliant/dataframe.py b/narwhals/_compliant/dataframe.py
index ed1d83b9..c31f2c5b 100644
--- a/narwhals/_compliant/dataframe.py
+++ b/narwhals/_compliant/dataframe.py
@@ -12,6 +12,7 @@ from typing import TypeVar
 from narwhals._compliant.typing import CompliantSeriesT_co
 from narwhals._compliant.typing import EagerSeriesT
 from narwhals._expression_parsing import evaluate_output_names_and_aliases
+from narwhals._translate import NumpyConvertible
 
 if TYPE_CHECKING:
     from typing_extensions import Self
@@ -19,13 +20,14 @@ if TYPE_CHECKING:
 
     from narwhals._compliant.expr import EagerExpr
     from narwhals.dtypes import DType
+    from narwhals.typing import _2DArray  # noqa: F401
 
 __all__ = ["CompliantDataFrame", "CompliantLazyFrame", "EagerDataFrame"]
 
 T = TypeVar("T")
 
 
-class CompliantDataFrame(Protocol[CompliantSeriesT_co]):
+class CompliantDataFrame(NumpyConvertible["_2DArray"], Protocol[CompliantSeriesT_co]):
     def __narwhals_dataframe__(self) -> Self: ...
     def __narwhals_namespace__(self) -> Any: ...
     def simple_select(
diff --git a/narwhals/_compliant/namespace.py b/narwhals/_compliant/namespace.py
index f5449ec4..338306b6 100644
--- a/narwhals/_compliant/namespace.py
+++ b/narwhals/_compliant/namespace.py
@@ -6,13 +6,17 @@ from typing import Any
 from typing import Container
 from typing import Iterable
 from typing import Literal
+from typing import Mapping
 from typing import Protocol
+from typing import Sequence
+from typing import overload
 
 from narwhals._compliant.typing import CompliantExprT
 from narwhals._compliant.typing import CompliantFrameT
 from narwhals._compliant.typing import EagerDataFrameT
 from narwhals._compliant.typing import EagerExprT
 from narwhals._compliant.typing import EagerSeriesT_co
+from narwhals.dependencies import is_numpy_array_2d
 from narwhals.utils import exclude_column_names
 from narwhals.utils import get_column_names
 from narwhals.utils import passthrough_column_names
@@ -20,6 +24,9 @@ from narwhals.utils import passthrough_column_names
 if TYPE_CHECKING:
     from narwhals._compliant.selectors import CompliantSelectorNamespace
     from narwhals.dtypes import DType
+    from narwhals.schema import Schema
+    from narwhals.typing import Into1DArray
+    from narwhals.typing import _2DArray
     from narwhals.utils import Implementation
     from narwhals.utils import Version
 
@@ -84,3 +91,29 @@ class EagerNamespace(
 ):
     @property
     def _series(self) -> type[EagerSeriesT_co]: ...
+    @property
+    def _dataframe(self) -> type[EagerDataFrameT]: ...
+
+    @overload
+    def from_numpy(
+        self,
+        data: Into1DArray,
+        /,
+        schema: None = ...,
+    ) -> EagerSeriesT_co: ...
+    @overload
+    def from_numpy(
+        self,
+        data: _2DArray,
+        /,
+        schema: Mapping[str, DType] | Schema | Sequence[str],
+    ) -> EagerDataFrameT: ...
+    def from_numpy(
+        self,
+        data: Into1DArray | _2DArray,
+        /,
+        schema: Mapping[str, DType] | Schema | Sequence[str] | None = None,
+    ) -> EagerSeriesT_co | EagerDataFrameT:
+        if is_numpy_array_2d(data):
+            return self._dataframe.from_numpy(data, schema=schema)
+        return self._series.from_numpy(data, context=self)

Just the fun stuff

class EagerNamespace(
    CompliantNamespace[EagerDataFrameT, EagerExprT],
    Protocol[EagerDataFrameT, EagerSeriesT_co, EagerExprT],
):
    @property
    def _series(self) -> type[EagerSeriesT_co]: ...
    @property
    def _dataframe(self) -> type[EagerDataFrameT]: ...
    def from_numpy( # <--------------- regular method for `Namespace` only
        self,
        data: Into1DArray | _2DArray,
        /,
        schema: Mapping[str, DType] | Schema | Sequence[str] | None = None,
    ) -> EagerSeriesT_co | EagerDataFrameT:
        if is_numpy_array_2d(data):
            return self._dataframe.from_numpy(data, schema=schema)
        return self._series.from_numpy(data, context=self)

I really think we can get some mileage out of this pattern ☺️
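A dependency-free sketch of the dispatch, in case it helps to see it run end to end. All of the `Toy*` classes are hypothetical stand-ins (including `ToyArray`, which only mimics `ndim`), and requiring a `schema` for 2D input is an assumption made to keep the example short.

```python
from __future__ import annotations

from typing import Any, Sequence


class ToyArray:
    """Hypothetical stand-in for `numpy.ndarray`: just data plus `ndim`."""

    def __init__(self, data: list[Any], ndim: int) -> None:
        self.data = data
        self.ndim = ndim


class ToySeries:
    def __init__(self, values: list[Any]) -> None:
        self.values = values

    @classmethod
    def from_numpy(cls, data: ToyArray, /, *, context: Any) -> ToySeries:
        # `context` matches the signature in this PR but is unused in the toy.
        return cls(list(data.data))


class ToyDataFrame:
    def __init__(self, columns: dict[str, list[Any]]) -> None:
        self.columns = columns

    @classmethod
    def from_numpy(cls, data: ToyArray, /, *, schema: Sequence[str]) -> ToyDataFrame:
        # Transpose row-major data into named columns.
        return cls({name: [row[i] for row in data.data] for i, name in enumerate(schema)})


class ToyNamespace:
    _series = ToySeries
    _dataframe = ToyDataFrame

    def from_numpy(
        self, data: ToyArray, /, schema: Sequence[str] | None = None
    ) -> ToySeries | ToyDataFrame:
        # Regular method on the namespace: pick the classmethod by dimensionality.
        if data.ndim == 2:
            if schema is None:
                msg = "this toy requires a schema for 2D input"
                raise ValueError(msg)
            return self._dataframe.from_numpy(data, schema=schema)
        return self._series.from_numpy(data, context=self)


ns = ToyNamespace()
series = ns.from_numpy(ToyArray([1, 2, 3], ndim=1))
frame = ns.from_numpy(ToyArray([[1, 2], [3, 4]], ndim=2), schema=["a", "b"])
```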

MarcoGorelli · 2025-03-15T18:23:23Z

cool thanks! merge when ready

dangotbanned added 10 commits March 12, 2025 19:03

feat(typing): Add NumpyConvertible protocol

4b812d3

feat(typing): Extend CompliantSeries w/ NumpyConvertible

18b07f9

fix(typing): Add missing args for ArrowSeries.to_numpy

1401aa1

All other backends already had these

feat: add ArrowSeries.from_numpy

d02d83f

feat: add PandasLikeSeries.from_numpy

9c81781

feat: add PolarsSeries.from_numpy

b575e37

feat(DRAFT): add PolarsSeries.to_numpy

48c8179

fix: resolve circular import

f5405b4

refactor: Replace all _create_compliant_series

2a86bbb

refactor: Move map_batches up to EagerExpr

ca8fcdf

dangotbanned added the internal label Mar 12, 2025

dangotbanned added 4 commits March 12, 2025 21:21

coverage

d33f1ae

https://github.com/narwhals-dev/narwhals/actions/runs/13821678447/job/38668299244?pr=2196

refactor(typing): Use Into1DArray alias

9af9228

- Less repetition, but also helps document what the 2nd `TypeVar` is for (`from_`) - It has to be in that position to follow the rules of https://typing.python.org/en/latest/spec/generics.html#default-ordering-and-subscription-rules

Merge remote-tracking branch 'upstream/main' into series-from-numpy

cf0ae31

refactor: Reuse .__array__ for PolarsSeries.to_numpy

3cc3a1c

- We've already got the compat handled there - `polars` handles the rest in https://github.com/pola-rs/polars/blob/889a2a7a57be5da432b6fa854ab698bbaf1b02ff/py-polars/polars/series/series.py#L1357-L1399

dangotbanned commented Mar 13, 2025

chore: force github to let me start a thread

1f67693

dangotbanned commented Mar 13, 2025

dangotbanned added the api design label Mar 13, 2025

dangotbanned marked this pull request as ready for review March 13, 2025 16:51

dangotbanned commented Mar 13, 2025

dangotbanned added 2 commits March 13, 2025 18:22

Merge branch 'main' into series-from-numpy

82be657

refactor: remove uncovered Implementation checl

333bc67

Resolves #2196 (comment)

dangotbanned added a commit that referenced this pull request Mar 13, 2025

refactor(typing): Simplify EagerNamespace

8319d0f

- Only needs to be the extra stuff - `_create_compliant_series` is removed in #2196

dangotbanned mentioned this pull request Mar 13, 2025

chore(typing): Fill out CompliantNamespace protocol #2202

Merged

10 tasks

dangotbanned added 3 commits March 14, 2025 10:34

Merge branch 'main' into series-from-numpy

6ebbad2

Merge branch 'main' into series-from-numpy

4176a67

Merge branch 'main' into series-from-numpy

adb6b7a

Merge remote-tracking branch 'upstream/main' into series-from-numpy

6a5ed1d

MarcoGorelli reviewed Mar 15, 2025

Merge branch 'main' into series-from-numpy

670345c

MarcoGorelli reviewed Mar 15, 2025

MarcoGorelli approved these changes Mar 15, 2025

remove comment comment

c033c71

#2196 (comment)

dangotbanned merged commit c14f43b into main Mar 15, 2025
28 checks passed

dangotbanned deleted the series-from-numpy branch March 15, 2025 18:24

dangotbanned mentioned this pull request Mar 17, 2025

API: io functions for v2 #2116

Open

6 tasks

dangotbanned added a commit that referenced this pull request Mar 24, 2025

feat: Implement EagerNamespace.from_numpy

7f9fb6b

Addresses #2196 (comment)

dangotbanned mentioned this pull request Mar 24, 2025

refactor: *(Namespace|DataFrame).from_numpy #2283

Merged

10 tasks
