🏷️ `sparse`: improved init stubs of sparse arrays with shape as positional argument. #704

JulVandenBroeck · 2025-07-04T14:43:37Z

Hello!

I noticed some typing errors in my project when creating empty sparse arrays.
They occur when the main argument (arg1) is a shape-like object and the dtype kwarg is anything other than np.float64: pyright then treats arg1 as an array instead of a shape.
I fixed this by adding more overloads to the constructors of all sparse array classes. I also added tests for these changes to the existing test file for CSR arrays.

Thanks for the great work on the stubs so far.
Hopefully, this change will benefit the project!

Kind regards,
Jul

PS: This is my first pull request, so sorry for any mistakes!

jorenham

Thanks for this PR! It's a pretty difficult problem to solve, but for the most part you have actually managed to do so, which tells me that you're not new to this :)

I left some comments; hopefully they make sense to you.

Is there any particular reason for why you didn't cover dok_array, or was that an oversight?

jorenham · 2025-07-05T00:23:45Z

scipy-stubs/sparse/_bsr.pyi

        arg1: ToShape2D,
-        shape: None = None,
+        shape: ToShape2D | None = None,
        dtype: None = None,


This overload could be merged with the one at line 134:

Suggested change

-        dtype: None = None,
+        dtype: onp.AnyFloat64DType | None = None,

I believe that the same can be said for the other sparse formats.

So, I included a default overload with just the None option to cover the base case (where the dtype is np.float64). If I remove and merge it with the specific np.float64 overload, another overload (np.bool_ in this case) gets selected first when I do not specify a dtype.
Would you rather I put the np.float64 one at the top of the overloads, so that it gets selected first, or I keep the current overloads?

Ah, I think you are alluding to the first option in the suggestions below, so I'll stick to that one!

jorenham · 2025-07-05T00:24:03Z

scipy-stubs/sparse/_bsr.pyi

+    @overload  # 2-d shape-like, dtype: type[float]
+    def __init__(
+        self: bsr_array[np.float64],
+        /,
+        arg1: ToShape2D,
+        shape: ToShape2D | None = None,
+        dtype: onp.AnyFloat64DType | None = None,
+        copy: bool = False,
+        *,
+        maxprint: int | None = None,
+    ) -> None: ...


Suggested change

-    @overload  # 2-d shape-like, dtype: type[float]
-    def __init__(
-        self: bsr_array[np.float64],
-        /,
-        arg1: ToShape2D,
-        shape: ToShape2D | None = None,
-        dtype: onp.AnyFloat64DType | None = None,
-        copy: bool = False,
-        *,
-        maxprint: int | None = None,
-    ) -> None: ...

jorenham · 2025-07-05T00:47:52Z

scipy-stubs/sparse/_bsr.pyi

+    @overload  # 2-d shape-like, dtype: type[bool]
+    def __init__(
+        self: bsr_array[np.bool_],
+        /,
+        arg1: ToShape2D,
+        shape: ToShape2D | None = None,
+        dtype: onp.AnyBoolDType | None = None,
+        copy: bool = False,
+        *,
+        maxprint: int | None = None,
+    ) -> None: ...


Passing dtype=None won't result in a bsr_array[np.bool_], so the dtype shouldn't have a default here.

But because dtype can be either a positional or a keyword argument, and because the preceding shape parameter has a default, we now have two separate situations we need to deal with:

bsr_array((2, 3), dtype=bool)

bsr_array((2, 3), None, bool)

The first form is probably what most users will go with in practice. Technically speaking, the second form is also valid, although I don't think it's a very likely choice. But to be honest, I have no numbers to back up that claim.

Personally I wouldn't mind if we only cover the first form here. But if you want to be complete about it and don't mind the additional complexity, then you could also add extra overloads for the second form.
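
To make the two call forms concrete, here is a small sketch that uses only the standard library. The function `toy_init` is a hypothetical stand-in that merely mirrors the parameter order discussed above; it is not the scipy constructor. It shows that both spellings bind `dtype` to the same parameter, which is why each form would need its own overload once `dtype` no longer has a default.

```python
import inspect


def toy_init(arg1, shape=None, dtype=None, copy=False, *, maxprint=None):
    """Hypothetical stand-in mirroring the parameter order discussed above."""


sig = inspect.signature(toy_init)

# First form: dtype passed by keyword, shape left at its default.
by_keyword = sig.bind((2, 3), dtype=bool)
by_keyword.apply_defaults()

# Second form: dtype passed positionally, so shape has to be spelled out.
by_position = sig.bind((2, 3), None, bool)
by_position.apply_defaults()

# Both spellings end up with identical bound arguments.
same_binding = by_keyword.arguments == by_position.arguments
print(same_binding)
```

The runtime treats the two forms the same; only the static overloads have to distinguish them, because a required positional `dtype` cannot follow a `shape` parameter that keeps its default.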

This is how I'd annotate the first form:

Suggested change

-    @overload  # 2-d shape-like, dtype: type[bool]
-    def __init__(
-        self: bsr_array[np.bool_],
-        /,
-        arg1: ToShape2D,
-        shape: ToShape2D | None = None,
-        dtype: onp.AnyBoolDType | None = None,
-        copy: bool = False,
-        *,
-        maxprint: int | None = None,
-    ) -> None: ...
+    @overload  # 2-d shape-like, dtype: bool-like (keyword)
+    def __init__(
+        self: bsr_array[np.bool_],
+        /,
+        arg1: ToShape2D,
+        shape: ToShape2D | None = None,
+        *,
+        dtype: onp.AnyBoolDType,
+        copy: bool = False,
+        maxprint: int | None = None,
+    ) -> None: ...

and if we also cover the second form, that'd be

Suggested change

-    @overload  # 2-d shape-like, dtype: type[bool]
-    def __init__(
-        self: bsr_array[np.bool_],
-        /,
-        arg1: ToShape2D,
-        shape: ToShape2D | None = None,
-        dtype: onp.AnyBoolDType | None = None,
-        copy: bool = False,
-        *,
-        maxprint: int | None = None,
-    ) -> None: ...
+    @overload  # 2-d shape-like, dtype: bool-like (keyword)
+    def __init__(
+        self: bsr_array[np.bool_],
+        /,
+        arg1: ToShape2D,
+        shape: ToShape2D | None = None,
+        *,
+        dtype: onp.AnyBoolDType,
+        copy: bool = False,
+        maxprint: int | None = None,
+    ) -> None: ...
+    @overload  # 2-d shape-like, dtype: bool-like (positional)
+    def __init__(
+        self: bsr_array[np.bool_],
+        /,
+        arg1: ToShape2D,
+        shape: ToShape2D | None,
+        dtype: onp.AnyBoolDType,
+        copy: bool = False,
+        maxprint: int | None = None,
+    ) -> None: ...

I'm fine with either option, as long as you apply it consistently for all 3 dtypes, 7 sparse formats, and >=1 shape types (so >=21 overloads).
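
As a rough, self-contained sketch of the overload layout described above, the following toy class can be run and type-checked without scipy. Everything in it is an assumption made for illustration: `ToyArray`, `F64`, `Bool`, and `Shape2D` are hypothetical stand-ins for the real sparse classes, dtypes, and `ToShape2D`, and the runtime `__init__` is a trivial placeholder.

```python
from __future__ import annotations

from typing import Generic, Tuple, TypeVar, overload

_T = TypeVar("_T")
Shape2D = Tuple[int, int]  # hypothetical stand-in for ToShape2D


class F64:
    """Hypothetical stand-in for a float64 dtype."""


class Bool:
    """Hypothetical stand-in for a bool dtype."""


class ToyArray(Generic[_T]):
    @overload  # 2-d shape-like, dtype: default or float64-like
    def __init__(
        self: ToyArray[F64],
        /,
        arg1: Shape2D,
        shape: Shape2D | None = None,
        dtype: type[F64] | None = None,
        copy: bool = False,
        *,
        maxprint: int | None = None,
    ) -> None: ...
    @overload  # 2-d shape-like, dtype: bool-like (keyword)
    def __init__(
        self: ToyArray[Bool],
        /,
        arg1: Shape2D,
        shape: Shape2D | None = None,
        *,
        dtype: type[Bool],
        copy: bool = False,
        maxprint: int | None = None,
    ) -> None: ...
    @overload  # 2-d shape-like, dtype: bool-like (positional)
    def __init__(
        self: ToyArray[Bool],
        /,
        arg1: Shape2D,
        shape: Shape2D | None,
        dtype: type[Bool],
        copy: bool = False,
        *,
        maxprint: int | None = None,
    ) -> None: ...
    def __init__(self, /, arg1, shape=None, dtype=None, copy=False, *, maxprint=None):
        # Placeholder runtime behaviour: a missing dtype falls back to F64.
        self.shape = arg1 if shape is None else shape
        self.dtype = F64 if dtype is None else dtype


default = ToyArray((2, 3))                 # matches the first overload
keyword = ToyArray((2, 3), dtype=Bool)     # matches the keyword overload
positional = ToyArray((2, 3), None, Bool)  # matches the positional overload
```

Only the default overload keeps `| None = None` on `dtype`; the two bool-like overloads make `dtype` required, once as a keyword-only parameter and once as a positional parameter after a default-free `shape`.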

I understand that the | None option shouldn't be there for the types other than np.float64, but in practice these overloads will never be picked, since now the np.float64 one is at the top and will be opted for first.
At the same time, the current implementation (with | None for each dtype) covers all cases.
So, I think it's at most a bit "hacky" to put | None after each dtype, but it does the job and avoids 21+ extra overloads.

What do you think? Could be I missed something :)

> but in practice these overloads will never be picked,

that is, unless you pass a dtype that is actually annotated as e.g. type[bool] | None :)

    def create_sparse_array(dtype: type[bool] | None = None):
        return scipy.sparse.bsr_array((2, 2), dtype=dtype)
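
The mismatch this wrapper exposes can be sketched with a toy class that keeps `| None = None` on every dtype overload. All names here (`HackyArray`, `F64`, `Bool`, `create`) are hypothetical stand-ins, and the remark about what a type checker infers is an expectation based on overloads being tried in order, not a verified pyright result.

```python
from __future__ import annotations

from typing import Generic, Tuple, TypeVar, overload

_T = TypeVar("_T")


class F64:
    """Hypothetical stand-in for a float64 dtype."""


class Bool:
    """Hypothetical stand-in for a bool dtype."""


class HackyArray(Generic[_T]):
    @overload  # default or float64-like
    def __init__(
        self: HackyArray[F64],
        shape: Tuple[int, int],
        dtype: type[F64] | None = None,
    ) -> None: ...
    @overload  # bool-like, but with the questionable "| None = None"
    def __init__(
        self: HackyArray[Bool],
        shape: Tuple[int, int],
        dtype: type[Bool] | None = None,
    ) -> None: ...
    def __init__(self, shape, dtype=None):
        # Placeholder runtime behaviour: a missing dtype falls back to F64.
        self.shape = shape
        self.dtype = F64 if dtype is None else dtype


def create(dtype: type[Bool] | None = None):
    # The declared argument type does not fit `type[F64] | None`, so the first
    # overload is not expected to match. The second overload accepts the whole
    # union, so a checker would likely report HackyArray[Bool] here.
    return HackyArray((2, 2), dtype=dtype)


runtime_dtype = create().dtype  # F64 at runtime, whatever the checker reports
```

Dropping `| None` from the non-default overloads removes this gap between the annotated result and the runtime behaviour, at the cost of the extra keyword and positional overloads.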

Aha, I see.. I'll look into the overloads to resolve this then.

JulVandenBroeck · 2025-07-05T07:42:28Z

Love the fast and thorough response! I'm gonna go over the suggestions ASAP.

I indeed missed the DOK format, I'll add the overloads there as well.

For the follow-up, do I just push the new changes to my fork?
In other words: do I have to "re-PR" the new commit, or does this thread get updated automatically when I update my branch?

Edit: seems I guessed correctly that it gets updated automatically! I indeed have some knowledge about Python typing and GitHub, but this PR stuff is new to me. Seems to be working great though 😆

JulVandenBroeck · 2025-07-05T08:25:22Z

As you can see in the latest commit, I included test cases for the different signatures (with keywords and positional arguments) and they all pass with the current layout.

jorenham · 2025-07-05T14:08:54Z

> For the follow-up, do I just push the new changes to my fork?
> In other words: do I have to "re-PR" the new commit, or does this thread get updated automatically when I update my branch?

don't worry about the commits, we can just squash-merge them :)

jorenham

apart from the remaining dtype: _ | None = None stuff, this looks good to me

JulVandenBroeck · 2025-07-06T13:32:29Z

I implemented the 21+ overloads, did some tests, and believe it's correct now. :)

jorenham

Looks good!

One nitpick: comments such as dtype: type[bool] only tell part of the story, because the onp.AnyBoolDType union type alias also includes types like Literal["?"].
So how about instead of type[bool], we refer to it as something intentionally vague, such as "bool-like"?

It shouldn't be much work if you use a (regex) search+replace or sed -e or something. But I also wouldn't mind if you don't feel like it, as this is already a great improvement right now :)
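
A regex-based rename along those lines could look like the sketch below. The pattern and the `relabel` helper are assumptions made for illustration; the wording actually chosen for the other dtypes in the PR may differ.

```python
import re

# Matches the bracketed form used in the overload comments,
# e.g. "dtype: type[bool]", and captures the name inside the brackets.
PATTERN = re.compile(r"dtype: type\[(\w+)\]")


def relabel(line: str) -> str:
    """Rewrite 'dtype: type[X]' as the intentionally vague 'dtype: X-like'."""
    return PATTERN.sub(r"dtype: \1-like", line)


before = "    @overload  # 2-d shape-like, dtype: type[bool]"
after = relabel(before)
print(after)
```

Lines that do not contain the bracketed form pass through unchanged, so the same helper can be applied to a whole stub file line by line.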

JulVandenBroeck · 2025-07-07T07:29:52Z

Sure! I've been using regex for the overloads, so might as well :D

jorenham · 2025-07-07T09:31:40Z

Thanks Jul!

Improved init stubs of sparse arrays with shape as positional argument.

2fe7a0d

JulVandenBroeck changed the title ~~Improved init stubs of sparse arrays with shape as positional argument.~~ 🏷️ improved init stubs of sparse arrays with shape as positional argument. Jul 4, 2025

jorenham added the scipy.sparse label Jul 5, 2025

jorenham self-requested a review July 5, 2025 00:15

jorenham added the is: fix label Jul 5, 2025

jorenham added this to the 1.16.0.3 milestone Jul 5, 2025

jorenham requested changes Jul 5, 2025

jorenham reviewed Jul 5, 2025

jorenham changed the title ~~🏷️ improved init stubs of sparse arrays with shape as positional argument.~~ 🏷️ sparse: improved init stubs of sparse arrays with shape as positional argument. Jul 5, 2025

JulVandenBroeck added 2 commits July 5, 2025 09:59

Added DOK stubs + merged dtype=None and dtype=np.float64 cases

0ec1442

Added more test cases for CSR

b3d1ee0

JulVandenBroeck requested a review from jorenham July 5, 2025 08:29

jorenham reviewed Jul 6, 2025

JulVandenBroeck added 2 commits July 6, 2025 15:16

Fixed constructor stubs for positional and keyword dtypes (other than np.float64)

f2d57e8

Added missing keyword separators

c34349a

JulVandenBroeck requested a review from jorenham July 6, 2025 13:32

jorenham approved these changes Jul 6, 2025

Cleaned up comments for overloads of sparse array constructors

7796ded

JulVandenBroeck requested a review from jorenham July 7, 2025 08:35

jorenham merged commit ad7f2bd into scipy:master Jul 7, 2025
16 checks passed

JulVandenBroeck deleted the improved-sparse-array-init-stubs branch July 7, 2025 11:53
