type `freq` in `shift`, consistently use `Frequency` alias #1394
@MarcoGorelli can you resolve conflicts?
Can we just confirm that turning `str | DateOffset` to `Frequency` is good?
Wondering if there is an edge case for something that is a `BaseOffset` but not `DateOffset`.
Otherwise looks good to me, just fix the conflicts, will approve in advance, thanks @MarcoGorelli
thanks for your reviews! sorry for the lack of context, I should have linked to pandas-dev/pandas#60886 where there was some discussion on this (BaseOffset vs DateOffset)
  def to_timestamp(
      self,
-     freq: str | DateOffset | None = ...,
+     freq: Frequency | None = ...,
Okay so here is a good example:
idx = pd.PeriodIndex(["2023", "2024", "2025"], freq="D")
s1 = pd.Series([1, 2, 3], index=idx)
s1.to_timestamp(freq=pd.tseries.offsets.BDay())
This will raise a `FutureWarning`:
<stdin>:1: FutureWarning: PeriodDtype[B] is deprecated and will be removed in a future version. Use a DatetimeIndex with freq='B' instead
The issue is that `BDay` is not a `DateOffset`. I think it is a bit of an edge case, and this situation works at runtime for now, but it is good to think about the future. I am fine with leaving the types a bit looser here. @Dr-Irv what do you think?
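A minimal sketch of the class relationship discussed above, assuming a reasonably recent pandas. It inspects the method resolution order directly instead of calling `isinstance` against `DateOffset`, because pandas may customize that check at runtime; the variable names are illustrative only.

```python
import pandas as pd
from pandas.tseries.offsets import BaseOffset

# Build a business-day offset; constructing it does not touch any Period API.
bday = pd.offsets.BDay()

# Look at the real inheritance chain rather than relying on isinstance(),
# which (as an assumption about pandas internals) may be overridden for DateOffset.
date_offset_in_mro = pd.DateOffset in type(bday).__mro__

# BaseOffset is the common base class of the offset objects.
is_base_offset = isinstance(bday, BaseOffset)

print(date_offset_in_mro, is_base_offset)
```

If this holds on your pandas version, an annotation of `str | DateOffset` would reject `BDay()` under a static type checker, while `str | BaseOffset` would accept it.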
@MarcoGorelli thanks for the context, my only little concern is that |
thanks for your review there's quite a few places where |
Here are my thoughts on this. The parameters need to align with the docs. For example, I looked at Might be worth having 2 separate types in @loicdiridollou Part of the review process here should be to compare each of the methods to the docs, and use that as a guide as to what goes in this PR. |
Conversely, there are some parts which currently are annotated with
In [39]: pd.Period('2020').asfreq(pd.offsets.BusinessDay())
<ipython-input-39-1953d30c79b5>:1: FutureWarning: Period with BDay freq is deprecated and will be removed in a future version. Use a DatetimeIndex with BDay freq instead.
  pd.Period('2020').asfreq(pd.offsets.BusinessDay())
Out[39]: Period('2020-12-31', 'B')
If BusinessDay is the only offset that this catches, then for what it's worth I don't think it's worth distinguishing them
Note that |
I agree. I think there are other offsets that aren't supported as well, e.g., |
Actually via some Day | Hour | Minute | Second | Milli | Micro | Nano | YearEnd | QuarterEnd | MonthEnd | Week @MarcoGorelli Can you create a type |
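One way such a period-only alias could look, as a rough sketch: the offset classes are the ones listed in the comment above, while the alias definition itself, the `_PERIOD_OFFSETS` tuple, and the `is_period_offset` helper are hypothetical and not taken from pandas-stubs.

```python
from typing import Union

import pandas as pd
from pandas.tseries.offsets import (
    Day, Hour, Minute, Second, Milli, Micro, Nano,
    YearEnd, QuarterEnd, MonthEnd, Week,
)

# Offset classes named in the discussion as valid for periods.
_PERIOD_OFFSETS = (
    Day, Hour, Minute, Second, Milli, Micro, Nano,
    YearEnd, QuarterEnd, MonthEnd, Week,
)

# Hypothetical alias: a string alias or one of the period-compatible offsets.
PeriodFrequency = Union[
    str, Day, Hour, Minute, Second, Milli, Micro, Nano,
    YearEnd, QuarterEnd, MonthEnd, Week,
]


def is_period_offset(freq: object) -> bool:
    """Runtime counterpart of the alias for offset objects (illustration only)."""
    return isinstance(freq, _PERIOD_OFFSETS)


print(is_period_offset(MonthEnd()))               # period-compatible
print(is_period_offset(pd.offsets.MonthBegin()))  # not in the list
```

Note that the `str` arm cannot express which string aliases are valid, so a type like this only narrows the offset-object case.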
  def to_timestamp(
      self,
-     freq=...,
+     freq: PeriodFrequency | None = None,
i've always found it bizarre that `to_timestamp` only accepts period frequencies
In [4]: pd.Period('2020').to_timestamp('MS')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 pd.Period('2020').to_timestamp('MS')
File ~/scratch/.39venv/lib/python3.9/site-packages/pandas/_libs/tslibs/period.pyx:1868, in pandas._libs.tslibs.period._Period.to_timestamp()
AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'
(and, on newer versions of pandas):
In [2]: pd.Period('2020').to_timestamp('MS')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File pandas/_libs/tslibs/offsets.pyx:5363, in pandas._libs.tslibs.offsets.to_offset()
File pandas/_libs/tslibs/offsets.pyx:5246, in pandas._libs.tslibs.offsets._validate_to_offset_alias()
ValueError: for Period, please use 'M' instead of 'MS'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
Cell In[2], line 1
----> 1 pd.Period('2020').to_timestamp('MS')
File pandas/_libs/tslibs/period.pyx:2041, in pandas._libs.tslibs.period._Period.to_timestamp()
File pandas/_libs/tslibs/period.pyx:1771, in pandas._libs.tslibs.period._Period._maybe_convert_freq()
File pandas/_libs/tslibs/offsets.pyx:5402, in pandas._libs.tslibs.offsets.to_offset()
ValueError: Invalid frequency: MS, failed to parse with error message: ValueError("for Period, please use 'M' instead of 'MS'")
but whatever, if that's how it is then it should at least be typed to respect that 🤷
-     freq: DateOffset | dt.timedelta | _str | None = ...,
+     freq: Frequency | dt.timedelta | None = ...,
I think the typing here was too tight to begin with
In [6]: s = pd.Series([1,2,3], index=pd.date_range('2020', freq='MS', periods=3))
In [7]: s.pct_change(freq='MS')
Out[7]:
2020-01-01 NaN
2020-02-01 1.0
2020-03-01 0.5
Freq: MS, dtype: float64
  def to_offset(freq: None, is_period: bool = ...) -> None: ...
  @overload
- def to_offset(freq: Frequency, is_period: bool = ...) -> DateOffset: ...
+ def to_offset(freq: Frequency, is_period: bool = ...) -> BaseOffset: ...
this is capable of returning frequencies beyond just the `DateOffset` ones
In [8]: pd.tseries.frequencies.to_offset('B')
Out[8]: <BusinessDay>
- def round(self, freq: str | BaseOffset) -> Self: ...
- def floor(self, freq: str | BaseOffset) -> Self: ...
- def ceil(self, freq: str | BaseOffset) -> Self: ...
+ def round(self, freq: Frequency) -> Self: ...
+ def floor(self, freq: Frequency) -> Self: ...
+ def ceil(self, freq: Frequency) -> Self: ...
docs say `str` but offset objects are also accepted, so I think it's ok to keep the current annotation
In [11]: pd.Timedelta(minutes=37).round(pd.tseries.offsets.Hour())
Out[11]: Timedelta('0 days 01:00:00')
- `Frequency` instead of `str | BaseOffset` / `str | DateOffset`
- `timedelta` instead of `timedelta | Timedelta` (as `Timedelta` inherits from `timedelta` anyway)
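The second point can be checked with a short sketch: because `pd.Timedelta` subclasses `datetime.timedelta`, a parameter annotated with plain `timedelta` already covers both. The `seconds_in` helper is hypothetical and exists only for illustration.

```python
import datetime as dt

import pandas as pd


def seconds_in(delta: dt.timedelta) -> float:
    """Hypothetical helper annotated with plain ``timedelta`` only."""
    return delta.total_seconds()


# Timedelta is a timedelta subclass, so the union adds nothing for a type checker.
timedelta_is_parent = issubclass(pd.Timedelta, dt.timedelta)

# Both the stdlib and the pandas object satisfy the same annotation.
from_stdlib = seconds_in(dt.timedelta(minutes=90))
from_pandas = seconds_in(pd.Timedelta(minutes=90))

print(timedelta_is_parent, from_stdlib, from_pandas)
```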