ENH: let .dt.isocalendar() return float64 in presence of NaT #54657

@kmuehlbauer

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently, .dt.isocalendar() returns UInt32 with pd.NA in the presence of NaT, whereas .dt.year returns float64 with np.nan. We encountered this discrepancy in xarray (pydata/xarray#7928).

import pandas as pd
s = pd.to_datetime(pd.Series(['2021-12-01', '2021-12-02', '2021-12-03', pd.NaT]))
print("ISOCALENDAR")
print(s.dt.isocalendar().year)
print("YEAR")
print(s.dt.year)

ISOCALENDAR
0    2021
1    2021
2    2021
3    <NA>
Name: year, dtype: UInt32
YEAR
0    2021.0
1    2021.0
2    2021.0
3       NaN
dtype: float64

We could work around this in the respective xarray accessor, but it would make more sense to align the behavior here in pandas.
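For reference, a minimal sketch of how a caller can reconcile the two dtypes today, assuming only the public pandas API. The variable names are illustrative; the cast relies on nullable integer columns converting pd.NA to np.nan when cast to float64.

```python
import numpy as np
import pandas as pd

s = pd.to_datetime(pd.Series(["2021-12-01", "2021-12-02", "2021-12-03", pd.NaT]))

# isocalendar() yields a nullable UInt32 column with <NA> for the NaT row.
iso_year = s.dt.isocalendar().year

# Casting the nullable integer column to float64 turns <NA> into NaN,
# which matches the dtype that s.dt.year produces for the same input.
iso_year_float = iso_year.astype("float64")

print(iso_year.dtype)
print(iso_year_float.dtype)
```

This only papers over the difference on the caller's side; it does not change what isocalendar() itself returns.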

Feature Description

One solution would be to reuse the mechanism that _field_accessor already relies on (_maybe_mask_results) to convert to float64 in the presence of NaT. See the code below.

def isocalendar(self) -> DataFrame:
    """
    Calculate year, week, and day according to the ISO 8601 standard.

    .. versionadded:: 1.1.0

    Returns
    -------
    DataFrame
        With columns year, week and day.

    See Also
    --------
    Timestamp.isocalendar : Function return a 3-tuple containing ISO year,
        week number, and weekday for the given Timestamp object.
    datetime.date.isocalendar : Return a named tuple object with
        three components: year, week and weekday.

    Examples
    --------
    >>> idx = pd.date_range(start='2019-12-29', freq='D', periods=4)
    >>> idx.isocalendar()
                year  week  day
    2019-12-29  2019    52    7
    2019-12-30  2020     1    1
    2019-12-31  2020     1    2
    2020-01-01  2020     1    3
    >>> idx.isocalendar().week
    2019-12-29    52
    2019-12-30     1
    2019-12-31     1
    2020-01-01     1
    Freq: D, Name: week, dtype: UInt32
    """
    from pandas import DataFrame

    values = self._local_timestamps()
    sarray = fields.build_isocalendar_sarray(values, reso=self._creso)
    # Proposed change: with NaT present, convert the fields to float64 and
    # mask the missing entries, using the same helper as _field_accessor.
    dtype = np.dtype([('year', 'float64'), ('week', 'float64'), ('day', 'float64')])
    sarray = self._maybe_mask_results(
        sarray, fill_value=None, convert=dtype
    )
    # Keep float64 if the conversion happened; otherwise use UInt32 as before.
    dtype = None if sarray.dtype == dtype else "UInt32"
    iso_calendar_df = DataFrame(
        sarray, columns=["year", "week", "day"], dtype=dtype
    )
    if dtype != sarray.dtype:
        if self._hasna:
            iso_calendar_df.iloc[self._isnan] = None

    return iso_calendar_df

I can move this into a Pull Request if there is interest. I'll also try to implement a workaround in xarray until a final solution has been settled.
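Until then, the proposed behavior can be approximated outside pandas. The sketch below is not the pandas implementation: isocalendar_float_on_nat is a hypothetical helper name, and it relies only on the public .dt.isocalendar() accessor and DataFrame.astype. It returns float64 columns when NaT is present and leaves the UInt32 result untouched otherwise.

```python
import pandas as pd


def isocalendar_float_on_nat(values: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: ISO calendar fields, float64 when NaT is present."""
    iso = values.dt.isocalendar()
    if values.isna().any():
        # Nullable UInt32 -> float64; <NA> entries become NaN.
        return iso.astype("float64")
    # No missing values: keep the current UInt32 result.
    return iso


with_nat = pd.to_datetime(pd.Series(["2021-12-01", pd.NaT]))
without_nat = pd.to_datetime(pd.Series(["2021-12-01", "2021-12-02"]))

print(isocalendar_float_on_nat(with_nat).dtypes)
print(isocalendar_float_on_nat(without_nat).dtypes)
```

Switching the dtype based on the data mirrors what .dt.year does, at the cost of a result dtype that depends on whether any value is missing.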

Alternative Solutions

No alternative solutions considered.

Additional Context

No response

Metadata

Labels

Datetime, Dtype Conversions, Enhancement, Needs Discussion
