-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Following pandas 3.0, make Day
cftime offset non-Tick
-like
#10650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Following pandas 3.0, make Day
cftime offset non-Tick
-like
#10650
Conversation
2f3d6c0
to
6b7f4bd
Compare
6b7f4bd
to
ea43798
Compare
def test_resample(freqs, closed, label, offset) -> None: | ||
def test_resample_with_tick_resample_freq(freqs, closed, label, offset) -> None: | ||
initial_freq, resample_freq = freqs | ||
if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since our minimum version of pandas is greater than or equal to 2.2, I took the opportunity to remove this old test-skipping logic.
) | ||
@pytest.mark.parametrize("closed", [None, "left", "right"]) | ||
@pytest.mark.parametrize("label", [None, "left", "right"]) | ||
def test_resample_with_non_tick_resample_freq(freqs, closed, label) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the offset
and origin
options are not relevant when resampling to a non-Tick
frequency, I split these tests off into their function to avoid warnings and avoid unnecessary duplication of tests.
@pytest.mark.parametrize( | ||
"freq,units", | ||
[ | ||
("D", "D"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that Day
is non-Tick
-like, multiplying by a float
is no longer valid in pandas, so shift with a float value and Day
offset will raise an error:
>>> import pandas as pd
>>> times = pd.date_range("2000", periods=5)
>>> times.shift(0.5, "D")
Traceback (most recent call last):
File "<python-input-2>", line 1, in <module>
times.shift(0.5, "D")
~~~~~~~~~~~^^^^^^^^^^
File "/Users/spencer/software/pandas/pandas/core/indexes/datetimelike.py", line 514, in shift
start = self[0] + periods * self.freq
~~~~~~~~^~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'float' and 'pandas._libs.tslibs.offsets.Day'
A similar error is raised now for attempting to shift a CFTimeIndex
in this way. This is tested in b87bb35.
da_cftime = da(cftime_index) | ||
with pytest.raises(ValueError, match="offset must be"): | ||
da_cftime.resample(time="2D", offset=offset) # type: ignore[arg-type] | ||
da_cftime.resample(time="2h", offset=offset) # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to "2h"
, since "2D"
is no longer a Tick
-like frequency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a major break. I've certainly used "D"
for resampling to a daily frequency; and almost certainly used "5D"
or "7D"
. I think we should do a deprecation cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resampling to any multiple of "D"
is still supported. It is just that using the offset
and/or origin
arguments is no longer supported in that context, which is why I changed the particular test here. For what it is worth, pandas is not providing a deprecation warning for this change, though in the next version they will warn (as I emulate here) when offset
and/or origin
are provided and they will have no effect.
In principle, I think the problems this aimed to solve in pandas are not relevant for cftime.datetime
objects, which do not take into account timezones / daylight savings time transitions. With more work, we might be able maintain the old behavior for cftime.datetime
objects going forward, though given the complexity of this code, I tentatively prefer to stick as close to the pandas behavior as possible. I'm open to more discussion, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok i didn't realize that the resampling syntax worked.
I think the problems this aimed to solve in pandas are not relevant for cftime.datetime objects, which do not take into account timezones / daylight savings time transitions. With more work, we might be able maintain the old behavior for cftime.datetime objects going forward, though given the complexity of this code, I tentatively prefer to stick as close to the pandas behavior as possible.
I saw that discussion, and agree with your approach. It seems the least burden in the long run, at least until someone complains ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I appreciate you reviewing this. I will go ahead and merge.
timedelta_result = da_cftime.resample(time="2h", offset=timedelta).mean() | ||
string_result = da_cftime.resample(time="2h", offset=string).mean() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to "2h"
, since "2D"
is no longer a Tick
-like frequency.
(Second(), 3, Second(n=3)), | ||
(Millisecond(), 3, Millisecond(n=3)), | ||
(Microsecond(), 3, Microsecond(n=3)), | ||
(Day(), 0.5, Hour(n=12)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that Day
is non-Tick
-like, multiplying by a float
is no longer valid in pandas:
>>> import pandas as pd
>>> 2.0 * pd.tseries.frequencies.Day(2)
Traceback (most recent call last):
File "<python-input-2>", line 1, in <module>
2.0 * pd.tseries.frequencies.Day(2)
~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'float' and 'pandas._libs.tslibs.offsets.Day'
A similar error is raised now for the cftime version of this offset. This is tested in b87bb35.
Co-authored-by: Deepak Cherian <[email protected]>
In pandas 3.0,
Day
will no longer be considered aTick
-like frequency (pandas-dev/pandas#61985). This PR ports this change to the cftime version of this offset. The main implication in the cftime / xarray context is that it means theoffset
andorigin
options inresample
will no longer have an effect when resampling to a daily frequency. As in pandas-dev/pandas#62101, warnings will be emitted if non-default values are passed.whats-new.rst