Closed
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -649,6 +649,7 @@ Conversion
- Bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
- Bug in :meth:`Series.astype` might modify read-only array inplace when casting to a string dtype (:issue:`57212`)
- Bug in :meth:`Series.reindex` not maintaining ``float32`` type when a ``reindex`` introduces a missing value (:issue:`45857`)
- Bug in :meth:`Series.convert_dtypes` stripping the timezone from an already timezone-aware pyarrow timestamp dtype (:issue:`60237`)

Strings
^^^^^^^
12 changes: 0 additions & 12 deletions pandas/core/dtypes/dtypes.py
@@ -2277,18 +2277,6 @@ def name(self) -> str: # type: ignore[override]
     @cache_readonly
     def numpy_dtype(self) -> np.dtype:
         """Return an instance of the related numpy dtype"""
-        if pa.types.is_timestamp(self.pyarrow_dtype):
-            # pa.timestamp(unit).to_pandas_dtype() returns ns units
-            # regardless of the pyarrow timestamp units.
-            # This can be removed if/when pyarrow addresses it:
-            # https://github.com/apache/arrow/issues/34462
-            return np.dtype(f"datetime64[{self.pyarrow_dtype.unit}]")
-        if pa.types.is_duration(self.pyarrow_dtype):
-            # pa.duration(unit).to_pandas_dtype() returns ns units
-            # regardless of the pyarrow duration units
-            # This can be removed if/when pyarrow addresses it:
-            # https://github.com/apache/arrow/issues/34462
-            return np.dtype(f"timedelta64[{self.pyarrow_dtype.unit}]")
Member

By removing this, it will go through np.dtype(self.pyarrow_dtype.to_pandas_dtype()), which I think will raise a TypeError (because to_pandas_dtype returns a DatetimeTZDtype, which np.dtype(...) does not recognize)? And so this will start to return object dtype instead of datetime64?


For a bit of context from my linked issue: in pandas 2.0, numpy_dtype for an ArrowDtype wrapping a pyarrow timestamp with a non-null timezone did return numpy object dtype.

The if statements removed in this pull request were added in pandas 2.1.0rc0 (#51800) to fix the other issue you pointed out, where the datetime unit was lost on older pyarrow versions (the fix for that landed in pyarrow 14, apache/arrow#35656). The pull request that added them did not mention any intent to change the semantics for tz-aware types as well, so I think it was an unintentional side effect that they started returning a numpy datetime64 dtype instead of object.

I noted in the issue that the pyarrow table in the "pandas arrays, scalars, and data types" section of the API docs seems to indicate that a pyarrow timestamp should map to a pandas DatetimeTZDtype and a numpy datetime64 dtype (which matches how pyarrow itself converts a tz-aware array to pandas and to numpy, respectively).

Quoting from my comment on the issue:

I would definitely defer to someone else's judgement on whether that is correct, or whether the linked table should distinguish between a pa.timestamp() type with and without a timezone.

You are pretty much exactly the person I had in mind as best suited to make that judgement call. As far as I can tell, a tz-aware pyarrow timestamp is the only entry in that table that can lose information when mapped to the numpy dtype shown.

         if pa.types.is_string(self.pyarrow_dtype) or pa.types.is_large_string(
             self.pyarrow_dtype
         ):
8 changes: 8 additions & 0 deletions pandas/tests/frame/methods/test_convert_dtypes.py
@@ -196,3 +196,11 @@ def test_convert_dtypes_from_arrow(self):
         result = df.convert_dtypes()
         expected = df.astype({"a": "string[python]"})
         tm.assert_frame_equal(result, expected)
+
+    def test_convert_dtypes_timezone_series(self):
+        # GH#60237
+        ser = pd.Series(pd.date_range(start='2020-01-01', periods=5, freq='h', tz='UTC'))
+        ser = ser.astype("timestamp[ns, tz=UTC][pyarrow]")
+        expected = ser
+        result = ser.convert_dtypes(dtype_backend="pyarrow")
+        tm.assert_series_equal(result, expected)