DEPR: Deprecate non-ISO date string formats in DatetimeIndex.loc #62991
base: main
```diff
@@ -72,6 +72,9 @@ def seed_df(seed_nans, n, m):
 @pytest.mark.parametrize("bins", [None, [0, 5]], ids=repr)
 @pytest.mark.parametrize("isort", [True, False])
 @pytest.mark.parametrize("normalize, name", [(True, "proportion"), (False, "count")])
+@pytest.mark.filterwarnings(
+    "ignore:Parsing non-ISO datetime strings:pandas.errors.Pandas4Warning"
+)
 def test_series_groupby_value_counts(
     seed_nans,
     num_rows,
```

Member: Why does this test need to filter warnings? It seems unrelated to this change.

Author: When the parametrized cases group by `keys='2nd'` (the date column), the deprecation warning is triggered internally during the groupby operation. Without the filter, those cases fail in CI, so I added `filterwarnings`.

Member: Hmm, so is that a problem with the groupby internals or with the test data? This feels like something we shouldn't have to filter in this test.

Author: Yeah, you're right. The warning fires when groupby internally calls `get_loc()` on column names, not just on user-facing `.loc` calls. I'll add a parameter so that only user-facing indexing warns, not internal operations like groupby. That way we don't need to filter warnings.
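The author's proposed fix (warn only on user-facing indexing, never on internal lookups) can be sketched as a toy model in plain Python. Everything here is hypothetical and is not pandas' actual internals: `get_loc`, `loc`, `groupby_internal`, the `warn_non_iso` keyword, and `NonISOWarning` are illustrative names, and `M/D/YYYY` is the only non-ISO format the toy parser accepts. The point is that the warning is opt-in, so only the user-facing path turns it on.

```python
import warnings
from datetime import date, datetime


class NonISOWarning(FutureWarning):
    """Hypothetical stand-in for pandas.errors.Pandas4Warning."""


def _parse(key: str) -> tuple[date, bool]:
    """Parse a date key and report whether it was written in ISO format."""
    try:
        return date.fromisoformat(key), True
    except ValueError:
        # Toy fallback: only US-style "M/D/YYYY" is accepted as non-ISO input.
        return datetime.strptime(key, "%m/%d/%Y").date(), False


def get_loc(labels: list[date], key: str, *, warn_non_iso: bool = False) -> int:
    """Return the position of `key`; warn about non-ISO keys only on request."""
    parsed, is_iso = _parse(key)
    if warn_non_iso and not is_iso:
        warnings.warn(
            "Parsing non-ISO datetime strings in .loc is deprecated",
            NonISOWarning,
            stacklevel=3,  # point at the caller of the user-facing wrapper
        )
    return labels.index(parsed)


def loc(labels: list[date], key: str) -> int:
    # User-facing entry point: opts in to the deprecation warning.
    return get_loc(labels, key, warn_non_iso=True)


def groupby_internal(labels: list[date], key: str) -> int:
    # Internal caller: keeps the default, so no warning leaks to the user.
    return get_loc(labels, key)
```

With 15 daily labels starting at 2024-01-01, `loc(labels, "1/10/2024")` returns 9 and warns, while `groupby_internal(labels, "1/10/2024")` and `loc(labels, "2024-01-10")` return 9 silently.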
```diff
@@ -2851,6 +2851,9 @@ def test_groupby_with_Time_Grouper(unit):
     tm.assert_frame_equal(result, expected_output)
 
 
+@pytest.mark.filterwarnings(
+    "ignore:Parsing non-ISO datetime strings:pandas.errors.Pandas4Warning"
+)
 def test_groupby_series_with_datetimeindex_month_name():
     # GH 48509
     s = Series([0, 1, 0], index=date_range("2022-01-01", periods=3), name="jan")
```

Member: This is another test that I don't think should emit warnings.
```diff
@@ -5,6 +5,8 @@
 import numpy as np
 import pytest
 
+from pandas.errors import Pandas4Warning
+
 from pandas import (
     DataFrame,
     DatetimeIndex,
```
```diff
@@ -19,6 +21,10 @@
 
 
 class TestSlicing:
+    pytestmark = pytest.mark.filterwarnings(
+        "ignore:Parsing non-ISO datetime strings:pandas.errors.Pandas4Warning"
+    )
+
     def test_string_index_series_name_converted(self):
         # GH#1644
         df = DataFrame(
```

Member: Does this need to apply to the entire class? Can we apply it only to the failing tests instead?
```diff
@@ -464,3 +470,99 @@ def test_slice_reduce_to_series(self):
         )
         result = df.loc["2000", "A"]
         tm.assert_series_equal(result, expected)
+
+
+class TestDatetimeIndexNonISODeprecation:
+    """Tests for deprecation of non-ISO string formats in .loc indexing. GH#58302"""
+
+    @pytest.fixture
+    def ser_daily(self):
+        """Create a Series with daily DatetimeIndex for testing."""
+        return Series(
+            range(15),
+            index=DatetimeIndex(date_range(start="2024-01-01", freq="D", periods=15)),
+        )
+
+    @pytest.mark.parametrize(
+        "date_string",
+        [
+            "1/10/2024",  # MM/DD/YYYY format
+            "01/10/2024",  # MM/DD/YYYY format with leading zero
+        ],
+    )
+    def test_loc_indexing_non_iso_single_key_deprecation(self, ser_daily, date_string):
+        # GH#58302
+        msg = "Parsing non-ISO datetime strings in .loc is deprecated"
+
+        with tm.assert_produces_warning(Pandas4Warning, match=msg):
+            result = ser_daily.loc[date_string]
+        assert result == 9
+
+    @pytest.mark.parametrize(
+        "date_string,expected",
+        [
+            ("2024-01-10", 9),  # YYYY-MM-DD (ISO format)
+        ],
+    )
+    def test_loc_indexing_iso_format_no_warning(self, ser_daily, date_string, expected):
+        # GH#58302 - ISO format (YYYY-MM-DD) should NOT warn
+        with tm.assert_produces_warning(None):
+            result = ser_daily.loc[date_string]
+        assert result == expected
+
+    @pytest.mark.parametrize(
+        "start_string",
+        [
+            "1/10/2024",  # MM/DD/YYYY format
+            "01/10/2024",  # MM/DD/YYYY format with leading zero
+        ],
+    )
+    def test_loc_slicing_non_iso_start_deprecation(self, ser_daily, start_string):
+        # GH#58302 - Non-ISO start in slice should warn
+        msg = "Parsing non-ISO datetime strings in .loc is deprecated"
+
+        with tm.assert_produces_warning(Pandas4Warning, match=msg):
+            result = ser_daily.loc[start_string:"2024-01-15"]
+        assert len(result) > 0
+
+    @pytest.mark.parametrize(
+        "end_string",
+        [
+            "5-01-2024",  # DD-MM-YYYY format
+            "05-01-2024",  # DD-MM-YYYY format with leading zero
+        ],
+    )
+    def test_loc_slicing_non_iso_end_deprecation(self, ser_daily, end_string):
+        # GH#58302 - Non-ISO end in slice should warn
+        msg = "Parsing non-ISO datetime strings in .loc is deprecated"
+
+        with tm.assert_produces_warning(Pandas4Warning, match=msg):
+            result = ser_daily.loc["2024-01-01":end_string]
+        assert len(result) > 0
+
+    def test_loc_slicing_both_non_iso_deprecation(self, ser_daily):
+        # GH#58302 - Both non-ISO should warn (twice)
+        msg = "Parsing non-ISO datetime strings in .loc is deprecated"
+
+        with tm.assert_produces_warning(
+            Pandas4Warning, match=msg, check_stacklevel=False
+        ):
+            result = ser_daily.loc["1/10/2024":"5-01-2024"]
+        assert len(result) > 0
+
+    def test_loc_slicing_iso_formats_no_warning(self, ser_daily):
+        # GH#58302 - ISO slice formats should NOT warn
+        with tm.assert_produces_warning(None):
+            result = ser_daily.loc["2024-01-05":"2024-01-10"]
+        assert len(result) == 6
+
+    def test_loc_non_string_keys_no_warning(self, ser_daily):
+        # GH#58302 - Non-string keys should not warn
+        with tm.assert_produces_warning(None):
+            result = ser_daily.loc[Timestamp("2024-01-10")]
+        assert result == 9
+
+    def test_loc_indexing_invalid_iso_pattern_raises_keyerror(self, ser_daily):
+        # GH#58302 - Malformed date strings fail at parsing, before ISO check
+        with pytest.raises(KeyError, match="2025-ANYTHING-GOES"):
+            ser_daily.loc["2025-ANYTHING-GOES"]
```
Credit to @stefmolin for the original idea, but I don't think we should try to roll our own regex here if we can avoid it. I see the standard library provides `date.fromisoformat`, although it is documented not to work with "reduced precision" dates: https://docs.python.org/3/library/datetime.html
Even so, I wonder if we could use that first and only fall back when it fails.

Thanks @stefmolin and @WillAyd! I switched to `date.fromisoformat()` as you suggested and added a regex fallback to handle the reduced-precision dates (`YYYY` and `YYYY-MM`) that `fromisoformat` doesn't support.
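A minimal sketch of the approach settled on in this thread: try `date.fromisoformat()` first and fall back to a regex only for the reduced-precision forms `YYYY` and `YYYY-MM`. The helper name `is_iso_date_string` and the exact pattern are assumptions for illustration, not the PR's code. Note that `date.fromisoformat()` accepts more formats on Python 3.11+ (for example `YYYYMMDD`) than on earlier versions, and that this helper only covers date-only strings; keys with a time component would need `datetime.fromisoformat()` instead.

```python
import re
from datetime import date

# Reduced-precision ISO 8601 dates that date.fromisoformat() rejects:
# "YYYY" and "YYYY-MM" (month restricted to 01-12). Hypothetical pattern.
_REDUCED_PRECISION = re.compile(r"\d{4}(-(0[1-9]|1[0-2]))?")


def is_iso_date_string(value: str) -> bool:
    """Return True if `value` looks like an ISO 8601 date (hypothetical helper)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        # Fall back to the regex only for the forms fromisoformat() skips.
        return _REDUCED_PRECISION.fullmatch(value) is not None
```

Under this sketch, `"2024-01-10"`, `"2024"`, and `"2024-01"` count as ISO, while `"1/10/2024"`, `"5-01-2024"`, and `"2024-13"` do not. Using `fullmatch` keeps the fallback strict, so strings like `"2025-ANYTHING-GOES"` are never accepted just because they start with four digits.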