-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
TYP: improve type annotations and remove unnecessary type ignores #62315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add and refine type hints in core modules (algorithms, base, indexes, _version). - Remove redundant `# type: ignore[no-untyped-call]` comments in groupby/categorical.py. - Refactor variable annotations for clarity and consistency. - No functional changes; improves code quality and type safety.
pandas/_version.py
Outdated
|
||
|
||
def get_versions(): | ||
def get_versions() -> dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_versions() -> dict[str, Any]: | |
def get_versions() -> dict[str, str | None]: |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict
is invariant, so it's generally speaking not a good idea to use unions as type arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t understand this point. Why would being more specific not be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to be more specific, but you should keep in mind that dict[str, str]
is not assignable to dict[str, str | None]
due to its invariance. Depending on how this will be used, that could lead to unexpected typing errors down the line.
If you want to be specific here, then the best you can do is use a TypedDict
, or a union thereof. If this function is frequently used, then the effort might be worth it.
Another option would be to return a Mapping[str, str | None]
instead. The Mapping
value type parameter is covariant, so both dict[str, str]
and dict[str, str | None]
can be assigned to it (since dict <: Mapping
).
The "Any
trick" might also work, i.e. dict[str, str | Any]
. Personally I'm not a fan of this "Any
trick"` and don't think it should be promoted like that, but in this specific case it's not all that bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all for the input.
A few observations
-
the goal of this PR is to remove some
#type: ignore
from the pandas codebase. This annotation is on vendored code. So firstly, I wasn't particularly concerned about usingAny
here, but that's just my opinion and maybe because we effectively maintain this code we should treat it equally with the pandas code. Secondly, because it is vendored i would have probably have just created a.pyi
file. However, i noticed that others have added type hints to the file directly. adding a.pyi
would shadow these. However, i could have added the existing to the.pyi
as well so also an option. -
i've not contributed greatly to the type annotations of late so my understanding of policies may be outdated. But IIRC we always used to be precise in the return types, wrt to type and therefore would not have used
Mapping
for return types in the past. Happy to change if this is now acceptable and used elsewhere. If the reason for the original comment here is the reluctance to useAny
then I would also be happy using justdict
without any type arguments. Still better than untyped function. -
Strictness flags and other mypy command line options, or even grep, easily allows us to audit the use of
Any
. In the past i've not be adverse to useAny
. To me this is a lesser offence than a#type: ignore
. Hence adding aAny
to remove a#type: ignore
is IMO a win?
If you want to be specific here, then the best you can do is use a
TypedDict
, or a union thereof. If this function is frequently used, then the effort might be worth it.
I guess this is likely to be the most acceptable to all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I picked some nits; hope you don't mind
pandas/_version.py
Outdated
|
||
|
||
def get_versions(): | ||
def get_versions() -> dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict
is invariant, so it's generally speaking not a good idea to use unions as type arguments
pandas/core/algorithms.py
Outdated
@overload | ||
def unique(values: np.ndarray) -> np.ndarray: ... | ||
@overload | ||
def unique(values: Index) -> Index: ... | ||
@overload | ||
def unique(values: Series) -> np.ndarray: ... | ||
@overload | ||
def unique(values: Categorical) -> Categorical: ... | ||
@overload | ||
def unique(values: ExtensionArray) -> ExtensionArray: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Series -> ndarray
and ndarray -> ndarray
overloads can be combined, e.g.:
@overload | |
def unique(values: np.ndarray) -> np.ndarray: ... | |
@overload | |
def unique(values: Index) -> Index: ... | |
@overload | |
def unique(values: Series) -> np.ndarray: ... | |
@overload | |
def unique(values: Categorical) -> Categorical: ... | |
@overload | |
def unique(values: ExtensionArray) -> ExtensionArray: ... | |
@overload | |
def unique(values: Index) -> Index: ... | |
@overload | |
def unique(values: Categorical) -> Categorical: ... | |
@overload | |
def unique(values: ExtensionArray) -> ExtensionArray: ... | |
@overload | |
def unique(values: np.ndarray | Series) -> np.ndarray: ... |
Additionally, you could merge the three other overloads by using a TypeVar(..., bound=Index | Categorical | ExtensionArray)
pandas/core/base.py
Outdated
if not isinstance(values, np.ndarray): | ||
# i.e. ExtensionArray | ||
result = values.unique() | ||
result: np.ndarray | ExtensionArray = values.unique() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you run mypy with --local-partial-types --allow-redefinition-new
this wouldn't be needed anymore.
https://mypy.readthedocs.io/en/latest/command_line.html#cmdoption-mypy-allow-redefinition-new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've started to look into this, and our global settings were set a long while ago now so a review is perhaps long overdue. Also bearing in mind that these flags will become the defaults in the future then also worth being ahead of the game. My initial impression is that i'm +1 on this. After a quick review it looks like in most cases people have being adding ignores which is IMO worse than adding the union variable annotation up front. But if mypy can now infer these without explicitly defining them then i assume that'll be a plus. 200-300 mypy errors at the moment. Adding the explicit type annotations for the variables defined outside the local scope will probably take a while but i've started on that and will do this as a precursor PR. Will probably also want to remove the variable annotations used like here where mypy can now infer the types. The only issue here maybe that mypy has flagged this as experimental and others may be against. Just to reiterate this looks like a good idea. Thanks for the suggestion.
pandas/core/indexes/base.py
Outdated
# This is faster, because Index.unique() checks for uniqueness | ||
# before calculating the unique values. | ||
res = algos.unique1d(res_indexer) | ||
res: Index | ExtensionArray | np.ndarray = algos.unique1d( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so for now have added ignore[assignment] instead, since this PR was specifically targeting ignore[no-untyped-call]. These should either be unnecessary with a precursor PR or can merge before if consider these the lesser of two evils.
not at all. Thanks for taking the time to review. |
…- precursor PR to change global config so these won't be necessary
Thanks @simonjayhawkins |
# type: ignore[no-untyped-call]
comments in groupby/categorical.py.