DEPR: Deprecate non-ISO date string formats in DatetimeIndex.loc #62991
415367c to 109e1c2
Hi @WillAyd @MarcoGorelli @mroeschke, the PR is ready for review whenever you have time. Thanks.
return result


def _is_iso_format_string(date_str: str) -> bool:
Credit to @stefmolin for the original idea, but I don't think we should try and roll our own regex here if we can avoid it.
I see the standard library provides date.fromisoformat, although that is documented not to work with "Reduced Precision" dates:
https://docs.python.org/3/library/datetime.html
Even so, I wonder if we can use that first and only fall back when it fails.
Thanks @stefmolin @WillAyd! Switched to using date.fromisoformat() as you suggested, and added a regex fallback to handle the reduced-precision dates (YYYY and YYYY-MM) that fromisoformat doesn't support.
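For readers following the thread, here is a minimal sketch of the "try the standard library first, fall back to a regex" idea described above. It is not the code from the PR: the function name, the `_REDUCED_PRECISION` pattern, and the choice of `datetime.fromisoformat` (rather than `date.fromisoformat`, so that strings with a time component are also accepted) are assumptions made for illustration.

```python
import re
from datetime import datetime

# Hypothetical fallback pattern for reduced-precision dates: YYYY or YYYY-MM.
_REDUCED_PRECISION = re.compile(r"\d{4}(?:-(?:0[1-9]|1[0-2]))?")


def is_iso_format_string(date_str: str) -> bool:
    """Return True if date_str looks like an ISO 8601 date or datetime."""
    try:
        # The standard library parser handles full dates and datetimes.
        datetime.fromisoformat(date_str)
        return True
    except ValueError:
        # Only reached when the stdlib parser rejects the string; accept the
        # reduced-precision forms it is documented not to support.
        return _REDUCED_PRECISION.fullmatch(date_str) is not None


for candidate in ["2024", "2024-01", "2024-01-10", "01/10/2024"]:
    print(candidate, is_iso_format_string(candidate))
```

Using `fullmatch` in the fallback means a string has to be entirely `YYYY` or `YYYY-MM` to pass, so trailing garbage is rejected.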
@pytest.mark.parametrize("bins", [None, [0, 5]], ids=repr)
@pytest.mark.parametrize("isort", [True, False])
@pytest.mark.parametrize("normalize, name", [(True, "proportion"), (False, "count")])
@pytest.mark.filterwarnings(
Why does this test need to filter warnings? It seems unrelated to the change?
When we group by keys='2nd' (the date column, from the parametrized tests), the groupby operation triggers the deprecation warning internally. Without the filter, those test cases fail in CI, so I added filterwarnings.
Hmm so is that a problem with the groupby internals or the test data? This feels like something we shouldn't have to filter on this test
Yeah, you're right. The issue is that it's warning when groupby internally calls get_loc() on column names, not just user-facing .loc calls. I'll add a parameter to only warn on user-facing indexing, not internal operations like groupby. That way we don't need to filter warnings.
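As a rough illustration of the approach proposed above, the sketch below uses a toy index class in which only the user-facing entry point opts in to the warning. Everything here is hypothetical: `ToyDatetimeIndex`, the `warn_non_iso` parameter, the `"/"` check standing in for real format detection, and the use of `FutureWarning` are assumptions, not pandas internals.

```python
import warnings


class ToyDatetimeIndex:
    """Hypothetical stand-in for an index with string date labels."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_loc(self, key, warn_non_iso=False):
        # Internal callers (a groupby, for example) leave warn_non_iso off,
        # so they never emit the deprecation warning.
        if warn_non_iso and "/" in key:
            warnings.warn(
                "Non-ISO date strings are deprecated for label lookups.",
                FutureWarning,
                stacklevel=3,
            )
        return self._labels.index(key)

    def loc(self, key):
        # The user-facing path is the only place that opts in to warning.
        return self.get_loc(key, warn_non_iso=True)


idx = ToyDatetimeIndex(["1/10/2024", "2024-01-11"])
idx.get_loc("1/10/2024")  # internal-style call: silent
idx.loc("2024-01-11")     # user-facing call with an ISO string: silent
```

With this shape, tests that only exercise internal lookups would not need a warnings filter at all.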
with pytest.raises(TypeError, match=msg):
    dti.get_loc(key)


@pytest.mark.filterwarnings(
Does this test still exhibit the expected behavior if you change the test to use ISO strings?
It works as expected. Updated the test to use ISO format strings instead and removed the filterwarnings.
tm.assert_series_equal(ts[[Period("2012-01-02", freq="D")]], exp)


@pytest.mark.arm_slow
@pytest.mark.filterwarnings(
You can change the test data here to use ISO strings, unless it explicitly tests non-ISO for a reason
Converting all the dates to ISO format in this test.
@WillAyd Fixed all your comments. Ready for review whenever you get a chance.
b82bf18 to 936847d
pandas/core/indexes/datetimes.py
Outdated
# - Followed by: hyphen (YYYY-), T (YYYY-T...), or end (YYYY)
# Examples that match: "2024", "2024-01", "2024-01-10", "2024-01-10T00:00:00"
# Examples that don't: "01/10/2024", "2024 01 10", "1/1/2024"
return re.match(r"^\d{4}(?:-|T|$)", date_str) is not None
This regular expression seems like it would catch way more than what is expected. Wouldn't this match something like 2025-ANYTHING-GOES?
Yeah, I checked and the loose regex actually works fine. Those invalid strings fail at the parsing step before reaching the regex check. But you have a point, so I'll tighten the regex to be more defensive. Also adding a test case for "2025-ANYTHING-GOES" to make sure it's handled right.
Yeah, we definitely want this to be strict. Ultimately it's not relevant whether another part of the code base "catches" the issue, because refactors happen and invariants change. If the function is going to be called is_iso_format_string, then it should only return True when the string is actually ISO.
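The concern raised in this thread can be reproduced in a few lines. The loose pattern below is the one quoted from the diff; the strict pattern is only an illustrative assumption about what a tightened version might look like, not the regex that ended up in the PR.

```python
import re

# Pattern quoted in the diff under review.
LOOSE = re.compile(r"^\d{4}(?:-|T|$)")

# Hypothetical stricter pattern: YYYY, YYYY-MM, YYYY-MM-DD, optionally
# followed by a time. It checks shape and ranges only, so an impossible
# calendar date such as 2024-02-31 would still pass.
STRICT = re.compile(
    r"\d{4}"
    r"(?:-(?:0[1-9]|1[0-2])"
    r"(?:-(?:0[1-9]|[12]\d|3[01])"
    r"(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?)?"
    r")?)?"
)


def loose_is_iso(date_str: str) -> bool:
    return LOOSE.match(date_str) is not None


def strict_is_iso(date_str: str) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return STRICT.fullmatch(date_str) is not None


print(loose_is_iso("2025-ANYTHING-GOES"))   # True: only the "2025-" prefix is checked
print(strict_is_iso("2025-ANYTHING-GOES"))  # False
```

The loose pattern only inspects the first five characters, which is why it cannot serve as a standalone ISO check regardless of what other code runs before it.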
tm.assert_frame_equal(result, expected_output)


@pytest.mark.filterwarnings(
This is another one that I don't think should have warnings
class TestSlicing:
    pytestmark = pytest.mark.filterwarnings(
Does this apply to the entire class? Can we apply to the failing tests instead?
…d add test suppressions
936847d to ab2d8bc
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Before:
After:
There's no way to know if "1/10/2024" is being parsed as MM/DD or DD/MM until you run it. This deprecation pushes users toward ISO format to avoid the confusion.
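The ambiguity can be shown with the standard library alone. This sketch does not reproduce the pandas behavior or the PR's Before/After snippets; it only demonstrates that the same string yields two different dates depending on which convention the parser assumes, while the ISO form has a single reading.

```python
from datetime import datetime

ambiguous = "1/10/2024"

# Same string, two conventions, two different dates.
as_month_first = datetime.strptime(ambiguous, "%m/%d/%Y")
as_day_first = datetime.strptime(ambiguous, "%d/%m/%Y")

# The ISO form is unambiguous: year, then month, then day.
iso = datetime.fromisoformat("2024-01-10")

print(as_month_first.date())  # 2024-01-10
print(as_day_first.date())    # 2024-10-01
print(iso.date())             # 2024-01-10
```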