Skip to content

Conversation

simonjayhawkins
Copy link
Member

  • Add and refine type hints in core modules (algorithms, base, indexes, _version).
  • Remove redundant # type: ignore[no-untyped-call] comments in groupby/categorical.py.
  • Refactor variable annotations for clarity and consistency.
  • No functional changes; improves code quality and type safety.

- Add and refine type hints in core modules (algorithms, base, indexes, _version).
- Remove redundant `# type: ignore[no-untyped-call]` comments in groupby/categorical.py.
- Refactor variable annotations for clarity and consistency.
- No functional changes; improves code quality  and type safety.
@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Sep 11, 2025


def get_versions():
def get_versions() -> dict[str, Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_versions() -> dict[str, Any]:
def get_versions() -> dict[str, str | None]:

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict is invariant, so it's generally speaking not a good idea to use unions as type arguments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t understand this point. Why would being more specific not be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to be more specific, but you should keep in mind that dict[str, str] is not assignable to dict[str, str | None] due to its invariance. Depending on how this will be used, that could lead to unexpected typing errors down the line.

If you want to be specific here, then the best you can do is use a TypedDict, or a union thereof. If this function is frequently used, then the effort might be worth it.

Another option would be to return a Mapping[str, str | None] instead. The Mapping value type parameter is covariant, so both dict[str, str] and dict[str, str | None] can be assigned to it (since dict <: Mapping).

The "Any trick" might also work, i.e. dict[str, str | Any]. Personally I'm not a fan of this "Any trick"` and don't think it should be promoted like that, but in this specific case it's not all that bad.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all for the input.

A few observations

  1. the goal of this PR is to remove some #type: ignore from the pandas codebase. This annotation is on vendored code. So firstly, I wasn't particularly concerned about using Any here, but that's just my opinion and maybe because we effectively maintain this code we should treat it equally with the pandas code. Secondly, because it is vendored i would have probably have just created a .pyi file. However, i noticed that others have added type hints to the file directly. adding a .pyi would shadow these. However, i could have added the existing to the .pyi as well so also an option.

  2. i've not contributed greatly to the type annotations of late so my understanding of policies may be outdated. But IIRC we always used to be precise in the return types, wrt to type and therefore would not have used Mapping for return types in the past. Happy to change if this is now acceptable and used elsewhere. If the reason for the original comment here is the reluctance to use Any then I would also be happy using just dict without any type arguments. Still better than untyped function.

  3. Strictness flags and other mypy command line options, or even grep, easily allows us to audit the use of Any. In the past i've not be adverse to use Any. To me this is a lesser offence than a #type: ignore. Hence adding a Any to remove a #type: ignore is IMO a win?

If you want to be specific here, then the best you can do is use a TypedDict, or a union thereof. If this function is frequently used, then the effort might be worth it.

I guess this is likely to be the most acceptable to all?

Copy link
Contributor

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked some nits; hope you don't mind



def get_versions():
def get_versions() -> dict[str, Any]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict is invariant, so it's generally speaking not a good idea to use unions as type arguments

Comment on lines 318 to 327
@overload
def unique(values: np.ndarray) -> np.ndarray: ...
@overload
def unique(values: Index) -> Index: ...
@overload
def unique(values: Series) -> np.ndarray: ...
@overload
def unique(values: Categorical) -> Categorical: ...
@overload
def unique(values: ExtensionArray) -> ExtensionArray: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Series -> ndarray and ndarray -> ndarray overloads can be combined, e.g.:

Suggested change
@overload
def unique(values: np.ndarray) -> np.ndarray: ...
@overload
def unique(values: Index) -> Index: ...
@overload
def unique(values: Series) -> np.ndarray: ...
@overload
def unique(values: Categorical) -> Categorical: ...
@overload
def unique(values: ExtensionArray) -> ExtensionArray: ...
@overload
def unique(values: Index) -> Index: ...
@overload
def unique(values: Categorical) -> Categorical: ...
@overload
def unique(values: ExtensionArray) -> ExtensionArray: ...
@overload
def unique(values: np.ndarray | Series) -> np.ndarray: ...

Additionally, you could merge the three other overloads by using a TypeVar(..., bound=Index | Categorical | ExtensionArray)

if not isinstance(values, np.ndarray):
# i.e. ExtensionArray
result = values.unique()
result: np.ndarray | ExtensionArray = values.unique()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you run mypy with --local-partial-types --allow-redefinition-new this wouldn't be needed anymore.

https://mypy.readthedocs.io/en/latest/command_line.html#cmdoption-mypy-allow-redefinition-new

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started to look into this, and our global settings were set a long while ago now so a review is perhaps long overdue. Also bearing in mind that these flags will become the defaults in the future then also worth being ahead of the game. My initial impression is that i'm +1 on this. After a quick review it looks like in most cases people have being adding ignores which is IMO worse than adding the union variable annotation up front. But if mypy can now infer these without explicitly defining them then i assume that'll be a plus. 200-300 mypy errors at the moment. Adding the explicit type annotations for the variables defined outside the local scope will probably take a while but i've started on that and will do this as a precursor PR. Will probably also want to remove the variable annotations used like here where mypy can now infer the types. The only issue here maybe that mypy has flagged this as experimental and others may be against. Just to reiterate this looks like a good idea. Thanks for the suggestion.

# This is faster, because Index.unique() checks for uniqueness
# before calculating the unique values.
res = algos.unique1d(res_indexer)
res: Index | ExtensionArray | np.ndarray = algos.unique1d(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so for now have added ignore[assignment] instead, since this PR was specifically targeting ignore[no-untyped-call]. These should either be unnecessary with a precursor PR or can merge before if consider these the lesser of two evils.

@simonjayhawkins
Copy link
Member Author

I picked some nits; hope you don't mind

not at all. Thanks for taking the time to review.

@mroeschke mroeschke added this to the 3.0 milestone Sep 12, 2025
@mroeschke mroeschke merged commit c99e52e into pandas-dev:main Sep 12, 2025
41 checks passed
@mroeschke
Copy link
Member

Thanks @simonjayhawkins

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Typing type annotations, mypy/pyright type checking

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants