Skip to content

Conversation

cmp0xff
Copy link
Contributor

@cmp0xff cmp0xff commented Oct 7, 2025

ruff rules usually improve the code quality. By specifying ignore rather than select, we get better hints how to comply to more rules.

In many cases this reduces errors in pyright strict, hence contributing to #1171.

For example, if we comply to ANN201, test functions will be foreced to have -> None, which reduces pyright strict errors. This was discussed in #1405 (comment).

@cmp0xff
Copy link
Contributor Author

cmp0xff commented Oct 7, 2025

Maybe we can freeze versions of all type checkers and upgrade manually to prevent this

# If the operator is not defined for the underlying objects,
# a TypeError should be raised
res = [op(a, b) for (a, b) in zip(lvalues, rvalues)]
res = [op(a, b) for (a, b) in zip(lvalues, rvalues, strict=False)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is annoying. strict=False is the default, so we shouldn't have to specify it. Can you ignore that rule with ruff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check(
assert_type(pd.MultiIndex.from_tuples(zip([1, 2], [3, 4])), pd.MultiIndex),
assert_type(
pd.MultiIndex.from_tuples(zip([1, 2], [3, 4], strict=False)), pd.MultiIndex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about zip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id = [1, 2, 3]
value = ["a", "b", "c"]
df = pd.DataFrame(zip(id, value), columns=["id", "value"])
df = pd.DataFrame(zip(id, value, strict=False), columns=["id", "value"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about zip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_type(
pd.concat(map(lambda _: s2, ["some_value", 3]), axis=1), pd.DataFrame
),
assert_type(pd.concat((s2 for _ in ["some_value", 3]), axis=1), pd.DataFrame),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure of this change. I think we explicitly wanted to use map and lambda here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"B1970": {0: 2.5, 1: 1.2, 2: 0.7},
"B1980": {0: 3.2, 1: 1.3, 2: 0.1},
"X": dict(zip(range(3), np.random.randn(3))),
"X": dict(zip(range(3), np.random.randn(3), strict=False)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same zip comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyproject.toml Outdated
"PD", # Pandas is here
"PLC0415", # https://docs.astral.sh/ruff/rules/import-outside-top-level/
"S101", # https://docs.astral.sh/ruff/rules/assert/
"S301", # https://docs.astral.sh/ruff/rules/suspicious-pickle-usage/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can limit this to one file tests/test_io.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pyproject.toml Outdated
"PLC0415", # https://docs.astral.sh/ruff/rules/import-outside-top-level/
"S101", # https://docs.astral.sh/ruff/rules/assert/
"S301", # https://docs.astral.sh/ruff/rules/suspicious-pickle-usage/
"SLF001", # https://docs.astral.sh/ruff/rules/private-member-access/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should aim to fix this one so put it in the second list below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved below

pyproject.toml Outdated
"S301", # https://docs.astral.sh/ruff/rules/suspicious-pickle-usage/
"SLF001", # https://docs.astral.sh/ruff/rules/private-member-access/
# The following rules are ignored temporarily. Either fix them or move to the permanent list above.
"ANN", "ARG", "ERA", "RUF", "SIM", "TRY", "PT", "NPY", "N", "DTZ", "PLR", "TC", "PGH", "PTH", "S311", "C901", "FIX", "TD", "A001", "PYI042", "ANN201"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • FIX and TD relates to TODO, so we can do a per-file ignore
  • Surprised that ANN201 isn't already taken care of because that's the one that requires return types on functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • FIX and TD now more detailed
  • ANN201 will be fixed in a separate PR

and ... not in tuple_args # fixed-length tuple
and (arr_ndim := getattr(actual, "ndim"))
!= (expected_ndim := len(tuple_args))
and (arr_ndim := actual.ndim) != (expected_ndim := len(tuple_args)) # type: ignore[attr-defined] # pyright: ignore[reportAttributeAccessIssue]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is annoying, because getattr() is a good usage here. If there is a RUFF rule that is causing this fix, let's ignore it on a per-file basis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr is no better than ., it also raises an error when the attribute is not there. Nevertheless, f251900

and isinstance((expected_dtype := dtype_args[0]), type)
and issubclass(expected_dtype, np.generic)
and (arr_dtype := getattr(actual, "dtype")) != expected_dtype
and (arr_dtype := actual.dtype) != expected_dtype # type: ignore[attr-defined] # pyright: ignore[reportAttributeAccessIssue]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about doing a per-file ignore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr is no better than ., it also raises an error when the attribute is not there. Nevertheless, f251900

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Oct 7, 2025

Maybe we can freeze versions of all type checkers and upgrade manually to prevent this

We can do it on a per-type-checker basis, and if we see that a new version has these kinds of failures, we can pin it. And report it to them so they are aware.

We have pinned mypy because when they released changes, it tended to be disruptive, and we definitely want to fix those issues. But they are better now, we probably don't need to pin mypy any more. They test mypy with pandas-stubs.
We don't pin pyright, because they they release changes, there are typically only a few things we might have to deal with, or they have introduced a bug. They also test pyright with pandas-stubs.
pyrefly and ty are supposed to be tested with pandas-stubs (just the stub file), so if a new version of either of those report a bunch of errors like the above, we should report it to them, and then pin the version until they fix it.

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do those pyrefly ignores, because they are in disagreement with both mypy and pyright.

I also disagree with them reporting the bad-param-name-override on methods like __getitem__() and __iter__().

We should report this to them, and if we need to, add something in our config to ignore the rules you inserted in the commit #6847e8657878201653b49f401c18b72e680b9eb9 and then put the ignore (for now) in pyproject.toml

Copy link
Contributor Author

@cmp0xff cmp0xff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For pyrefly, I added a TODO in ace7a24

# If the operator is not defined for the underlying objects,
# a TypeError should be raised
res = [op(a, b) for (a, b) in zip(lvalues, rvalues)]
res = [op(a, b) for (a, b) in zip(lvalues, rvalues, strict=False)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check(
assert_type(pd.MultiIndex.from_tuples(zip([1, 2], [3, 4])), pd.MultiIndex),
assert_type(
pd.MultiIndex.from_tuples(zip([1, 2], [3, 4], strict=False)), pd.MultiIndex
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id = [1, 2, 3]
value = ["a", "b", "c"]
df = pd.DataFrame(zip(id, value), columns=["id", "value"])
df = pd.DataFrame(zip(id, value, strict=False), columns=["id", "value"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and ... not in tuple_args # fixed-length tuple
and (arr_ndim := getattr(actual, "ndim"))
!= (expected_ndim := len(tuple_args))
and (arr_ndim := actual.ndim) != (expected_ndim := len(tuple_args)) # type: ignore[attr-defined] # pyright: ignore[reportAttributeAccessIssue]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr is no better than ., it also raises an error when the attribute is not there. Nevertheless, f251900

and isinstance((expected_dtype := dtype_args[0]), type)
and issubclass(expected_dtype, np.generic)
and (arr_dtype := getattr(actual, "dtype")) != expected_dtype
and (arr_dtype := actual.dtype) != expected_dtype # type: ignore[attr-defined] # pyright: ignore[reportAttributeAccessIssue]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr is no better than ., it also raises an error when the attribute is not there. Nevertheless, f251900

pyproject.toml Outdated
"PYI042", # https://docs.astral.sh/ruff/rules/snake-case-type-alias/
"TD002", # https://docs.astral.sh/ruff/rules/missing-todo-author/
# The following rules are ignored temporarily. Either fix them or move to the permanent list above.
"ERA", "S", "ANN", "FIX002", "TD003", "TD004", "PLR0402", "PLC0105"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • S and most of ERA removed
  • I don't think we'll comply to FIX002. It is fine for me to leave TODOs.
  • There are a few TODOs without link. We can fix them by adding issues.
  • TD004 fixed

pyproject.toml Outdated
"PD", # Pandas is here
"PLC0415", # https://docs.astral.sh/ruff/rules/import-outside-top-level/
"S101", # https://docs.astral.sh/ruff/rules/assert/
"S301", # https://docs.astral.sh/ruff/rules/suspicious-pickle-usage/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pyproject.toml Outdated
"PLC0415", # https://docs.astral.sh/ruff/rules/import-outside-top-level/
"S101", # https://docs.astral.sh/ruff/rules/assert/
"S301", # https://docs.astral.sh/ruff/rules/suspicious-pickle-usage/
"SLF001", # https://docs.astral.sh/ruff/rules/private-member-access/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved below

pyproject.toml Outdated
"S301", # https://docs.astral.sh/ruff/rules/suspicious-pickle-usage/
"SLF001", # https://docs.astral.sh/ruff/rules/private-member-access/
# The following rules are ignored temporarily. Either fix them or move to the permanent list above.
"ANN", "ARG", "ERA", "RUF", "SIM", "TRY", "PT", "NPY", "N", "DTZ", "PLR", "TC", "PGH", "PTH", "S311", "C901", "FIX", "TD", "A001", "PYI042", "ANN201"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • FIX and TD now more detailed
  • ANN201 will be fixed in a separate PR

@cmp0xff
Copy link
Contributor Author

cmp0xff commented Oct 7, 2025

/pandas_nightly

@cmp0xff
Copy link
Contributor Author

cmp0xff commented Oct 7, 2025

/mypy_nightly

@cmp0xff cmp0xff requested a review from Dr-Irv October 7, 2025 22:23
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cmp0xff

@Dr-Irv Dr-Irv merged commit 1cdecd9 into pandas-dev:main Oct 8, 2025
28 checks passed
@cmp0xff cmp0xff deleted the feature/cmp0xff/ruff-negative-list branch October 8, 2025 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants