Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Currently .dt.isocalendar() returns UInt32 with pd.NA in the presence of NaT, whereas .dt.year returns float64 with np.nan. We've encountered this discrepancy over at xarray (pydata/xarray#7928).
import pandas as pd
s = pd.to_datetime(pd.Series(['2021-12-01', '2021-12-02', '2021-12-03', pd.NaT]))
print("ISOCALENDAR")
print(s.dt.isocalendar().year)
print("YEAR")
print(s.dt.year)
ISOCALENDAR
0 2021
1 2021
2 2021
3 <NA>
Name: year, dtype: UInt32
YEAR
0 2021.0
1 2021.0
2 2021.0
3 NaN
dtype: float64
We could align that at the respective xarray accessor, but it would make more sense to align it here.
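Until the two accessors agree, the dtypes can be aligned on the caller's side with an explicit cast. The following is a minimal sketch of that workaround, not a proposed pandas change; the variable names are illustrative only, and it assumes .dt.year yields float64 when NaT is present, as in the output above.

```python
import numpy as np
import pandas as pd

s = pd.to_datetime(pd.Series(["2021-12-01", "2021-12-02", "2021-12-03", pd.NaT]))

iso_year = s.dt.isocalendar().year  # nullable UInt32, missing value is pd.NA
cal_year = s.dt.year                # float64 here, missing value is np.nan

# Option A: cast the ISO year to float64 so that pd.NA becomes np.nan.
iso_year_float = iso_year.astype("float64")

# Option B: cast the calendar year to the nullable dtype so that np.nan becomes pd.NA.
cal_year_nullable = cal_year.astype("UInt32")

print(iso_year_float.dtype, cal_year_nullable.dtype)
```

Option A matches what .dt.year does today; Option B keeps integer values and follows the nullable-dtype convention that .dt.isocalendar() already uses.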
Feature Description
One solution would be to use the same functionality present in _field_accessor (_maybe_mask_results) to do the conversion to float64 in the presence of NaT. Please have a look at the code below.
def isocalendar(self) -> DataFrame:
    """
    Calculate year, week, and day according to the ISO 8601 standard.

    .. versionadded:: 1.1.0

    Returns
    -------
    DataFrame
        With columns year, week and day.

    See Also
    --------
    Timestamp.isocalendar : Function return a 3-tuple containing ISO year,
        week number, and weekday for the given Timestamp object.
    datetime.date.isocalendar : Return a named tuple object with
        three components: year, week and weekday.

    Examples
    --------
    >>> idx = pd.date_range(start='2019-12-29', freq='D', periods=4)
    >>> idx.isocalendar()
                year  week  day
    2019-12-29  2019    52    7
    2019-12-30  2020     1    1
    2019-12-31  2020     1    2
    2020-01-01  2020     1    3
    >>> idx.isocalendar().week
    2019-12-29    52
    2019-12-30     1
    2019-12-31     1
    2020-01-01     1
    Freq: D, Name: week, dtype: UInt32
    """
    from pandas import DataFrame

    values = self._local_timestamps()
    sarray = fields.build_isocalendar_sarray(values, reso=self._creso)
    dtype = np.dtype([('year', 'float64'), ('week', 'float64'), ('day', 'float64')])
    sarray = self._maybe_mask_results(
        sarray, fill_value=None, convert=dtype
    )
    dtype = None if sarray.dtype == dtype else "UInt32"
    iso_calendar_df = DataFrame(
        sarray, columns=["year", "week", "day"], dtype=dtype
    )
    if dtype != sarray.dtype:
        if self._hasna:
            iso_calendar_df.iloc[self._isnan] = None
    return iso_calendar_df
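The masking step the proposal relies on can be illustrated outside pandas. The sketch below is a standalone NumPy approximation of the idea (convert to float64 only when missing values are present, then write NaN at the masked positions). The helper name mask_to_float is hypothetical and this is not pandas' internal implementation.

```python
import numpy as np


def mask_to_float(values: np.ndarray, isnan: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return float64 with NaN where isnan is True.

    If nothing is masked, the input is returned unchanged so that the
    integer dtype is preserved.
    """
    if not isnan.any():
        return values
    result = values.astype("float64")  # astype copies, so the input is untouched
    np.putmask(result, isnan, np.nan)  # overwrite the masked slots with NaN
    return result


# The last entry stands in for the placeholder value computed for NaT.
years = np.array([2021, 2021, 2021, 0], dtype="uint32")
isnan = np.array([False, False, False, True])

out = mask_to_float(years, isnan)
print(out.dtype)
```

The dtype therefore depends on the data: integer output without missing values, float64 with them. That is the same data-dependent behaviour .dt.year shows today.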
I can move this into a pull request if there is interest. I'll also try to implement a workaround in xarray until a final solution has been settled on.
Alternative Solutions
No alternative solutions considered.
Additional Context
No response