Skip to content

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Sep 30, 2025

  • use Frequency instead of str | BaseOffset / str | DateOffset
  • use timedelta instead of timedelta | Timedelta (as Timedelta inherits from timedelta anyway)

@MarcoGorelli MarcoGorelli marked this pull request as ready for review September 30, 2025 18:53
@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Sep 30, 2025

@MarcoGorelli can you resolve conflicts?

Copy link
Member

@loicdiridollou loicdiridollou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just confirm that turning str | DateOffset to Frequency is good?
Wondering if there is an edge case for something that is a BaseOffset but not DateOffset.
Otherwise looks good to me, just fix the conflicts, will approve in advance, thanks @MarcoGorelli

@MarcoGorelli
Copy link
Member Author

thanks for your reviews!

sorry for the lack of context, I should have linked to pandas-dev/pandas#60886 where there was some discussion on this (BaseOffset vs DateOffset)

def to_timestamp(
self,
freq: str | DateOffset | None = ...,
freq: Frequency | None = ...,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so here is a good example:

idx = pd.PeriodIndex(["2023", "2024", "2025"], freq="D")
s1 = pd.Series([1, 2, 3], index=idx)
s1.to_timestamp(freq=pd.tseries.offsets.BDay())

This will raise a FutureWarning:

<stdin>:1: FutureWarning: PeriodDtype[B] is deprecated and will be removed in a future version. Use a DatetimeIndex with freq='B' instead

The issue is that BDay is not a DateOffset. I think it is a bit of an edge case and this situation works at runtime for now but good to think about the future, I am good with letting the types a bit looser here, @Dr-Irv what do you think?

@loicdiridollou
Copy link
Member

@MarcoGorelli thanks for the context, my only little concern is that BDay is a BaseOffset and not DateOffset so that may give weird behaviors but since your PR makes the types a tiny bit looser in places where it was originally str | DateOffset I am good with the changes and we can investigate edge cases.

@MarcoGorelli MarcoGorelli marked this pull request as draft October 1, 2025 13:29
@MarcoGorelli
Copy link
Member Author

thanks for your review

there's quite a few places where PeriodDtype[B] can't be used, so they should probably all use DateOffset | str

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Oct 1, 2025

Here are my thoughts on this. The parameters need to align with the docs.

For example, I looked at timedelta_range . In this PR, it changes the freq parameter from str | DateOffset to Frequency, but that is too wide, and the docs specifically say DateOffset.

Might be worth having 2 separate types in _typing.pyi: One for Frequency that uses DateOffset and another that uses BaseOffset, but not sure about that.

@loicdiridollou Part of the review process here should be to compare each of the methods to the docs, and use that as a guide as to what goes in this PR.

@MarcoGorelli
Copy link
Member Author

Conversely, there are some parts which currently are annotated with BaseOffset but should probably be DateOffset

In [39]: pd.Period('2020').asfreq(pd.offsets.BusinessDay())
<ipython-input-39-1953d30c79b5>:1: FutureWarning: Period with BDay freq is deprecated and will be removed in a future version. Use a DatetimeIndex with BDay freq instead.
  pd.Period('2020').asfreq(pd.offsets.BusinessDay())
Out[39]: Period('2020-12-31', 'B')

If BusinessDay is the only offset that this catches, then for what it's worth I don't think it's worth distinguishing them

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Oct 1, 2025

The issue is that BDay is not a DateOffset

Note that Day is also not assignable DateOffset, even though PeriodType[D] is perfectly valid (furthering my stance that they're not worth distinguishing them in stubs, but I'm happy to respect whatever decision is made here)

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Oct 1, 2025

If BusinessDay is the only offset that this catches, then for what it's worth I don't think it's worth distinguishing them

I agree. I think there are other offsets that aren't supported as well, e.g., BusinessMonthEnd. We probably could go through offsets.pyx and figure out which classes have _period_dtype_code set and then create a type for that to get the proper restrictions. Not needed for this PR.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Oct 1, 2025

If BusinessDay is the only offset that this catches, then for what it's worth I don't think it's worth distinguishing them

I agree. I think there are other offsets that aren't supported as well, e.g., BusinessMonthEnd. We probably could go through offsets.pyx and figure out which classes have _period_dtype_code set and then create a type for that to get the proper restrictions. Not needed for this PR.

Actually via some grep, here's the list of valid offsets that can be used as a Period frequency:

Day | Hour | Minute | Second | Milli | Micro | Nano | YearEnd | QuarterEnd | MonthEnd | Week 

@MarcoGorelli Can you create a type PeriodFrequency for that in _typing.pyi and use that instead?

Comment on lines 1684 to +1687
def to_timestamp(
self,
freq=...,
freq: PeriodFrequency | None = None,
Copy link
Member Author

@MarcoGorelli MarcoGorelli Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've always found bizarre that to_timestamp only accepts period frequencies

In [4]: pd.Period('2020').to_timestamp('MS')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 pd.Period('2020').to_timestamp('MS')

File ~/scratch/.39venv/lib/python3.9/site-packages/pandas/_libs/tslibs/period.pyx:1868, in pandas._libs.tslibs.period._Period.to_timestamp()

AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'

(and, on newer versions of pandas):

In [2]: pd.Period('2020').to_timestamp('MS')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File pandas/_libs/tslibs/offsets.pyx:5363, in pandas._libs.tslibs.offsets.to_offset()

File pandas/_libs/tslibs/offsets.pyx:5246, in pandas._libs.tslibs.offsets._validate_to_offset_alias()

ValueError: for Period, please use 'M' instead of 'MS'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 pd.Period('2020').to_timestamp('MS')

File pandas/_libs/tslibs/period.pyx:2041, in pandas._libs.tslibs.period._Period.to_timestamp()

File pandas/_libs/tslibs/period.pyx:1771, in pandas._libs.tslibs.period._Period._maybe_convert_freq()

File pandas/_libs/tslibs/offsets.pyx:5402, in pandas._libs.tslibs.offsets.to_offset()

ValueError: Invalid frequency: MS, failed to parse with error message: ValueError("for Period, please use 'M' instead of 'MS'")

but whatever, if that's how it is then it should at least be typed to respect that 🤷

Comment on lines -2227 to +2228
freq: DateOffset | dt.timedelta | _str | None = ...,
freq: Frequency | dt.timedelta | None = ...,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the typing here was too tight to begin with

In [6]: s = pd.Series([1,2,3], index=pd.date_range('2020', freq='MS', periods=3))

In [7]: s.pct_change(freq='MS')
Out[7]:
2020-01-01    NaN
2020-02-01    1.0
2020-03-01    0.5
Freq: MS, dtype: float64

def to_offset(freq: None, is_period: bool = ...) -> None: ...
@overload
def to_offset(freq: Frequency, is_period: bool = ...) -> DateOffset: ...
def to_offset(freq: Frequency, is_period: bool = ...) -> BaseOffset: ...
Copy link
Member Author

@MarcoGorelli MarcoGorelli Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is capable of returning frequencies beyond just the DateOffset ones

In [8]: pd.tseries.frequencies.to_offset('B')
Out[8]: <BusinessDay>

@MarcoGorelli MarcoGorelli marked this pull request as ready for review October 1, 2025 17:30
Comment on lines -133 to +135
def round(self, freq: str | BaseOffset) -> Self: ...
def floor(self, freq: str | BaseOffset) -> Self: ...
def ceil(self, freq: str | BaseOffset) -> Self: ...
def round(self, freq: Frequency) -> Self: ...
def floor(self, freq: Frequency) -> Self: ...
def ceil(self, freq: Frequency) -> Self: ...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs say str but offset objects are also accepted, so I think it's ok to keep the current annotation

In [11]: pd.Timedelta(minutes=37).round(pd.tseries.offsets.Hour())
Out[11]: Timedelta('0 days 01:00:00')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants