@JulVandenBroeck
Contributor

@JulVandenBroeck JulVandenBroeck commented Jul 4, 2025

Hello!

I noticed some typing errors in my project when creating empty sparse arrays.
This happens if the main argument (arg1) is a shape-like object and the dtype kwarg is not np.float64, leading to pyright thinking that arg1 is an array, instead of a shape.
I fixed this by adding more overloads for the constructor of all sparse array classes. I also added tests for these changes in the existing test file for CSR arrays.

Thanks for the great work on the stubs so far.
Hopefully, this change will benefit the project!

Kind regards,
Jul

PS: This is my first pull request, so sorry for any mistakes!

@JulVandenBroeck JulVandenBroeck changed the title from "Improved init stubs of sparse arrays with shape as positional argument." to "🏷️ improved init stubs of sparse arrays with shape as positional argument." Jul 4, 2025
@jorenham jorenham self-requested a review July 5, 2025 00:15
@jorenham jorenham added this to the 1.16.0.3 milestone Jul 5, 2025
Member

@jorenham jorenham left a comment

Thanks for this PR! It's a pretty difficult problem to solve, but for the most part you have actually managed to do so, which tells me that you're not new to this :)

I left some comments; hopefully they make sense to you.

Is there any particular reason for why you didn't cover dok_array, or was that an oversight?

  arg1: ToShape2D,
- shape: None = None,
+ shape: ToShape2D | None = None,
  dtype: None = None,
Member

This overload could be merged with the one at line 134:

Suggested change
- dtype: None = None,
+ dtype: onp.AnyFloat64DType | None = None,

I believe that the same can be said for the other sparse formats.

Contributor Author

@JulVandenBroeck JulVandenBroeck Jul 5, 2025

I included a default overload with just the None option to cover the base case, where the dtype is np.float64. If I remove it and merge it into the specific np.float64 overload, another overload (np.bool_ in this case) gets selected first when I do not specify a dtype.
Would you rather I move the np.float64 overload to the top so that it gets selected first, or keep the current overloads?
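
The ordering issue described above can be illustrated with a deliberately simplified model. This is only a sketch: real type checkers such as pyright match on full signatures and types, and the names `Overload` and `pick_first` are hypothetical, not part of scipy-stubs.

```python
from dataclasses import dataclass


# Hypothetical, simplified model of overload selection: each "overload"
# lists the dtype arguments it accepts, and the first match wins.
@dataclass(frozen=True)
class Overload:
    result: str  # scalar type the overload would produce
    accepted: frozenset  # dtype arguments this overload accepts


def pick_first(overloads, dtype):
    """Return the result of the first overload that accepts `dtype`."""
    for ov in overloads:
        if dtype in ov.accepted:
            return ov.result
    raise TypeError(f"no overload accepts dtype={dtype!r}")


# Both overloads accept None because both declare `dtype: ... | None = None`.
bool_ov = Overload("np.bool_", frozenset({bool, None}))
float_ov = Overload("np.float64", frozenset({float, None}))

# With the bool overload listed first, omitting dtype resolves to np.bool_ ...
print(pick_first([bool_ov, float_ov], None))
# ... while listing the float64 overload first gives the intended default.
print(pick_first([float_ov, bool_ov], None))
```

In this model an explicit `dtype=bool` still reaches the bool overload in either order, so only the "no dtype" case depends on which overload comes first.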

Contributor Author

Ah, I think you are alluding to the first option in the suggestions below, so I'll stick to that one!

Comment on lines 134 to 144
@overload  # 2-d shape-like, dtype: type[float]
def __init__(
    self: bsr_array[np.float64],
    /,
    arg1: ToShape2D,
    shape: ToShape2D | None = None,
    dtype: onp.AnyFloat64DType | None = None,
    copy: bool = False,
    *,
    maxprint: int | None = None,
) -> None: ...
Member

Suggested change
- @overload  # 2-d shape-like, dtype: type[float]
- def __init__(
-     self: bsr_array[np.float64],
-     /,
-     arg1: ToShape2D,
-     shape: ToShape2D | None = None,
-     dtype: onp.AnyFloat64DType | None = None,
-     copy: bool = False,
-     *,
-     maxprint: int | None = None,
- ) -> None: ...

Comment on lines 112 to 122
@overload  # 2-d shape-like, dtype: type[bool]
def __init__(
    self: bsr_array[np.bool_],
    /,
    arg1: ToShape2D,
    shape: ToShape2D | None = None,
    dtype: onp.AnyBoolDType | None = None,
    copy: bool = False,
    *,
    maxprint: int | None = None,
) -> None: ...
Member

@jorenham jorenham Jul 5, 2025

Passing dtype=None won't result in a bsr_array[np.bool_], so the dtype shouldn't have a default here.

But because dtype can be either a positional or a keyword argument, and because the preceding shape parameter has a default, we now have two separate situations we need to deal with:

  1. bsr_array((2, 3), dtype=bool)
  2. bsr_array((2, 3), None, bool)

The first form is probably what most users will go with in practice. Technically speaking, the second form is also valid, although I don't think it's a very likely choice. To be honest, I have no numbers to back up that claim.

Personally I wouldn't mind if we only cover the first form here. But if you want to be complete about it and don't mind the additional complexity, then you could also add extra overloads for the second form.
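
As a rough runtime illustration of why both call forms need covering, the sketch below binds them against a stand-in function. `toy_init` is hypothetical and only mirrors the parameter layout discussed here; it is not the real `scipy.sparse.bsr_array` constructor.

```python
import inspect


# Hypothetical stand-in with the same parameter layout as the constructor
# discussed above; it is not the real scipy.sparse.bsr_array.
def toy_init(arg1, shape=None, dtype=None, copy=False, *, maxprint=None):
    return arg1, shape, dtype, copy, maxprint


sig = inspect.signature(toy_init)

# Form 1: dtype passed by keyword.
by_keyword = sig.bind((2, 3), dtype=bool)
by_keyword.apply_defaults()

# Form 2: dtype passed positionally, which forces `shape` to be spelled out.
by_position = sig.bind((2, 3), None, bool)
by_position.apply_defaults()

# Both forms bind to the same arguments at runtime.
print(by_keyword.arguments == by_position.arguments)
```

Since the two forms are equivalent at runtime, a stub that only annotates the keyword form would leave the positional form without a matching bool overload.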

This is how I'd annotate the first form:

Suggested change
- @overload  # 2-d shape-like, dtype: type[bool]
- def __init__(
-     self: bsr_array[np.bool_],
-     /,
-     arg1: ToShape2D,
-     shape: ToShape2D | None = None,
-     dtype: onp.AnyBoolDType | None = None,
-     copy: bool = False,
-     *,
-     maxprint: int | None = None,
- ) -> None: ...
+ @overload  # 2-d shape-like, dtype: bool-like (keyword)
+ def __init__(
+     self: bsr_array[np.bool_],
+     /,
+     arg1: ToShape2D,
+     shape: ToShape2D | None = None,
+     *,
+     dtype: onp.AnyBoolDType,
+     copy: bool = False,
+     maxprint: int | None = None,
+ ) -> None: ...

and if we also cover the second form, that'd be

Suggested change
- @overload  # 2-d shape-like, dtype: type[bool]
- def __init__(
-     self: bsr_array[np.bool_],
-     /,
-     arg1: ToShape2D,
-     shape: ToShape2D | None = None,
-     dtype: onp.AnyBoolDType | None = None,
-     copy: bool = False,
-     *,
-     maxprint: int | None = None,
- ) -> None: ...
+ @overload  # 2-d shape-like, dtype: bool-like (keyword)
+ def __init__(
+     self: bsr_array[np.bool_],
+     /,
+     arg1: ToShape2D,
+     shape: ToShape2D | None = None,
+     *,
+     dtype: onp.AnyBoolDType,
+     copy: bool = False,
+     maxprint: int | None = None,
+ ) -> None: ...
+ @overload  # 2-d shape-like, dtype: bool-like (positional)
+ def __init__(
+     self: bsr_array[np.bool_],
+     /,
+     arg1: ToShape2D,
+     shape: ToShape2D | None,
+     dtype: onp.AnyBoolDType,
+     copy: bool = False,
+     *,
+     maxprint: int | None = None,
+ ) -> None: ...

I'm fine with either option, as long as you apply it consistently for all 3 dtypes, 7 sparse formats, and >=1 shape types (so >=21 overloads).
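
For readers who want to try the pattern outside the stubs, here is a minimal runnable sketch. Everything in it is an assumption made for illustration: `ToyArray`, `Float64`, `Bool`, and the `ToFloat64`/`ToBool` aliases are hypothetical stand-ins for the sparse array class, the NumPy scalar types, and the `onp.Any*DType` aliases.

```python
from typing import Generic, Literal, Optional, TypeVar, overload

T = TypeVar("T")
Shape2D = tuple[int, int]


class Float64: ...  # hypothetical stand-in for np.float64


class Bool: ...  # hypothetical stand-in for np.bool_


ToFloat64 = Literal["f8", "float64"]  # stand-in for onp.AnyFloat64DType
ToBool = Literal["?", "bool"]  # stand-in for onp.AnyBoolDType


class ToyArray(Generic[T]):
    """Hypothetical sparse-array stand-in; not the scipy-stubs code."""

    @overload  # shape-like, dtype: float64-like or omitted (default)
    def __init__(
        self: "ToyArray[Float64]",
        arg1: Shape2D,
        shape: Optional[Shape2D] = None,
        dtype: Optional[ToFloat64] = None,
    ) -> None: ...
    @overload  # shape-like, dtype: bool-like (keyword)
    def __init__(
        self: "ToyArray[Bool]",
        arg1: Shape2D,
        shape: Optional[Shape2D] = None,
        *,
        dtype: ToBool,
    ) -> None: ...
    @overload  # shape-like, dtype: bool-like (positional)
    def __init__(
        self: "ToyArray[Bool]",
        arg1: Shape2D,
        shape: Optional[Shape2D],
        dtype: ToBool,
    ) -> None: ...
    def __init__(self, arg1, shape=None, dtype=None):
        # Runtime behaviour the overloads describe: no dtype means float64.
        self.shape = arg1 if shape is None else shape
        self.dtype = "f8" if dtype is None else dtype


default = ToyArray((2, 3))  # matches the float64 overload
by_keyword = ToyArray((2, 3), dtype="?")  # matches the keyword overload
by_position = ToyArray((2, 3), None, "?")  # matches the positional overload
```

Only the float64 overload gives `dtype` a default, so a call that omits `dtype` can match nothing but that overload, regardless of ordering.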

Contributor Author

I understand that the | None option shouldn't be there for the types other than np.float64, but in practice these overloads will never be picked, since now the np.float64 one is at the top and will be opted for first.
At the same time, the current implementation (with | None for each dtype) covers all cases.
So, I think it's at most a bit "hacky" to put | None after each dtype, but it does the job and avoids 21+ extra overloads.

What do you think? Could be I missed something :)

Member

@jorenham jorenham Jul 5, 2025

but in practice these overloads will never be picked,

that is, unless you pass a dtype that is actually annotated as e.g. type[bool] | None :)

import scipy.sparse

def create_sparse_array(dtype: type[bool] | None = None):
    return scipy.sparse.bsr_array((2, 2), dtype=dtype)
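
A dependency-free sketch of the same situation follows. `toy_scalar_name` and `create` are hypothetical names, and the string results merely stand in for the array's scalar type. Because the bool overload does not accept None, a checker that expands union arguments across overloads would be expected to infer both outcomes for `create` rather than silently picking the bool one.

```python
from typing import Literal, Optional, overload


# Hypothetical stand-in: the returned string names the scalar type that the
# corresponding array constructor call would produce.
@overload
def toy_scalar_name(dtype: None = None) -> Literal["float64"]: ...
@overload
def toy_scalar_name(dtype: type[bool]) -> Literal["bool_"]: ...
def toy_scalar_name(dtype=None):
    # At runtime, dtype=None falls back to the float64 default.
    return "float64" if dtype is None else "bool_"


def create(dtype: Optional[type[bool]] = None) -> str:
    # The argument may be `bool` or None, so either overload may apply.
    return toy_scalar_name(dtype)


print(create())
print(create(bool))
```

If the bool overload were instead declared with `| None = None` and matched first, the None case would be typed as bool even though the runtime result is float64.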

Contributor Author

Aha, I see. I'll look into the overloads to resolve this then.

@jorenham jorenham changed the title 🏷️ improved init stubs of sparse arrays with shape as positional argument. 🏷️ sparse: improved init stubs of sparse arrays with shape as positional argument. Jul 5, 2025
@JulVandenBroeck
Contributor Author

JulVandenBroeck commented Jul 5, 2025

Love the fast and thorough response! I'm gonna go over the suggestions ASAP.

I indeed missed the DOK format, I'll add the overloads there as well.

For the follow-up, do I just push the new changes to my fork?
In other words: do I have to "re-PR" the new commit, or does this thread get updated automatically when I update my branch?

Edit: seems I guessed correctly that it gets updated automatically! I indeed have some knowledge about Python typing and GitHub, but this PR stuff is new to me. Seems to be working great though 😆

@JulVandenBroeck
Contributor Author

As you can see in the latest commit, I included test cases for the different signatures (with keywords and positional arguments) and they all pass with the current layout.

@JulVandenBroeck JulVandenBroeck requested a review from jorenham July 5, 2025 08:29
@jorenham
Member

jorenham commented Jul 5, 2025

For the follow-up, do I just push the new changes to my fork?
In other words: do I have to "re-PR" the new commit, or does this thread get updated automatically when I update my branch?

don't worry about the commits, we can just squash-merge them :)

Member

@jorenham jorenham left a comment

apart from the remaining dtype: _ | None = None stuff, this looks good to me

@JulVandenBroeck
Contributor Author

I implemented the 21+ overloads, did some tests, and believe it's correct now. :)

@JulVandenBroeck JulVandenBroeck requested a review from jorenham July 6, 2025 13:32
Member

@jorenham jorenham left a comment

Looks good!

One nitpick: comments such as dtype: type[bool] only tell part of the story, because the onp.AnyBoolDType union type alias also includes types like Literal["?"].
So how about instead of type[bool], we refer to it as something intentionally vague, such as "bool-like"?

It shouldn't be much work if you use a (regex) search+replace or sed -e or something. But I also wouldn't mind if you don't feel like it, as this is already a great improvement right now :)
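
One possible way to do such a search and replace is sketched below with Python's `re` module. The stub excerpt is a shortened, hypothetical sample rather than the actual file contents, and the pattern assumes the comments follow the `dtype: type[...]` shape used in this thread.

```python
import re

# Hypothetical stub excerpt; the comment style mirrors the overloads above.
stub_text = """\
@overload  # 2-d shape-like, dtype: type[bool]
def __init__(self, /, arg1: ToShape2D, *, dtype: onp.AnyBoolDType) -> None: ...
@overload  # 2-d shape-like, dtype: type[float]
def __init__(self, /, arg1: ToShape2D, dtype: None = None) -> None: ...
"""

# Rewrite only the trailing comments: "dtype: type[bool]" -> "dtype: bool-like".
# Requiring a "#" before the match keeps real annotations untouched.
pattern = re.compile(r"(#.*\bdtype: )type\[(\w+)\]")
updated = pattern.sub(r"\1\2-like", stub_text)
print(updated)
```

Anchoring the pattern on the `#` means parameter annotations such as `dtype: onp.AnyBoolDType` are left as they are.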

@JulVandenBroeck
Contributor Author

Sure! I've been using regex for the overloads, so might as well :D

@JulVandenBroeck JulVandenBroeck requested a review from jorenham July 7, 2025 08:35
@jorenham jorenham merged commit ad7f2bd into scipy:master Jul 7, 2025
16 checks passed
@jorenham
Member

jorenham commented Jul 7, 2025

Thanks Jul!

@JulVandenBroeck JulVandenBroeck deleted the improved-sparse-array-init-stubs branch July 7, 2025 11:53