Merged
Changes from 7 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -611,6 +611,7 @@ Categorical
Datetimelike
^^^^^^^^^^^^
- Bug in :attr:`is_year_start` where a DateTimeIndex constructed via a date_range with frequency 'MS' wouldn't have the correct year or quarter start attributes (:issue:`57377`)
- Bug in :class:`DataFrame` raising ``ValueError`` when ``dtype`` is ``timedelta64`` and ``data`` is a list containing ``None`` (:issue:`60064`)
- Bug in :class:`Timestamp` constructor failing to raise when ``tz=None`` is explicitly specified in conjunction with timezone-aware ``tzinfo`` or data (:issue:`48688`)
- Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
- Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
3 changes: 3 additions & 0 deletions pandas/core/dtypes/cast.py
@@ -1225,6 +1225,9 @@ def maybe_cast_to_datetime(
    _ensure_nanosecond_dtype(dtype)

    if lib.is_np_dtype(dtype, "m"):
        if isinstance(value, np.ndarray) and value.ndim == 2 and value.shape[1] == 1:
Member

Why would e.g. [None, 1] be converted to a 2D array?

@yuanx749 (Contributor Author), Oct 30, 2024

Following the traceback, the input list is converted to a 2D array in ndarray_to_mgr, where _prep_ndarraylike returns _ensure_2d(values).

values = _prep_ndarraylike(values, copy=copy)

It seems _ensure_2d is also called for other input types.
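The reshape described above can be reproduced without pandas internals. This is a minimal sketch under stated assumptions: ensure_2d_sketch is a hypothetical helper that mimics what the comment says _ensure_2d does (promote 1D input to shape (N, 1)); it is not the pandas implementation. The public pd.to_timedelta stands in for a converter that only accepts 1D input; it is not the TimedeltaArray._from_sequence call from the traceback, and it raises TypeError rather than the ValueError reported in the issue.

```python
import numpy as np
import pandas as pd


def ensure_2d_sketch(values: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the private _ensure_2d: promote 1D input to (N, 1)."""
    if values.ndim == 1:
        return values.reshape((values.shape[0], 1))
    if values.ndim != 2:
        raise ValueError(f"expected 1D or 2D input, got shape {values.shape}")
    return values


data = np.array([None, 1], dtype=object)  # the list from GH#60064 as an object array
values_2d = ensure_2d_sketch(data)
print(values_2d.shape)  # (2, 1)

# A converter that assumes 1D input rejects the (N, 1) array...
try:
    pd.to_timedelta(values_2d)
    rejected_2d = False
except TypeError:
    rejected_2d = True

# ...but accepts the same values once they are flattened back to 1D.
flat_result = pd.to_timedelta(values_2d.ravel())
print(rejected_2d, len(flat_result))
```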

@mroeschke (Member), Oct 30, 2024

OK, I see. I think there would still be problems if a user passes a nested list (e.g. a "2x2" nested list) or passes dtype="datetime64[unit]".

In general, I think maybe_cast_to_datetime should assume the incoming value is 1D, since the _from_sequence calls also assume 1D values, so the DataFrame code should apply it column-wise.
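One way to read the column-wise suggestion: keep the converter strictly 1D and let the caller loop over columns. A minimal sketch, assuming a hypothetical helper named to_timedelta_columnwise and using the public pd.to_timedelta as the 1D converter; this is not how DataFrame.__init__ is actually implemented.

```python
import numpy as np
import pandas as pd


def to_timedelta_columnwise(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convert a 2D object array to timedelta64[ns] column by column.

    Each column is handed to pd.to_timedelta as a 1D array, so the converter
    never sees 2D input; the converted columns are then stacked side by side.
    """
    columns = [
        pd.to_timedelta(values[:, j]).to_numpy(dtype="timedelta64[ns]")
        for j in range(values.shape[1])
    ]
    return np.column_stack(columns)


# A "2x2" nested list, as in the review comment.
nested = np.array([[None, 1], [2, None]], dtype=object)
result = to_timedelta_columnwise(nested)
print(result.dtype, result.shape)  # timedelta64[ns] (2, 2)

# A single (N, 1) column needs no special case.
single = to_timedelta_columnwise(np.array([[None], [1]], dtype=object))
```

Looping over columns keeps the 1D assumption in one place, at the cost of a Python-level loop over columns.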

Contributor Author

For a nested list, nested_data_to_arrays in DataFrame.__init__ processes the data column-wise, so there is no problem.

There is no error for datetime64 because DatetimeArray._from_sequence happens to work with a 2D array:

import numpy as np
from pandas.core.arrays import DatetimeArray, TimedeltaArray
# the input below has shape (2, 1)
arr = DatetimeArray._from_sequence(np.array([[np.nan], [1]]), dtype="datetime64[ns]")

Contributor Author

I think it is better to move the ravel and reshape into _try_cast (below), similar to the Unicode string dtype elif branch, so that the input to maybe_cast_to_datetime is always 1D.

elif dtype.kind == "U":
    # TODO: test cases with arr.dtype.kind in "mM"
    if is_ndarray:
        arr = cast(np.ndarray, arr)
        shape = arr.shape
        if arr.ndim > 1:
            arr = arr.ravel()
    else:
        shape = (len(arr),)
    return lib.ensure_string_array(arr, convert_na_value=False, copy=copy).reshape(
        shape
    )
elif dtype.kind in "mM":
    return maybe_cast_to_datetime(arr, dtype)
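The proposal (flatten inside _try_cast, call a 1D converter, then restore the shape, as the "U" branch does) could look roughly like the sketch below. Assumptions, stated loudly: try_cast_datetimelike_sketch is a hypothetical name, the public pd.to_timedelta and pd.to_datetime stand in for maybe_cast_to_datetime, and only tz-naive NumPy datetime64/timedelta64 dtypes are handled. It is not the code that was merged.

```python
import numpy as np
import pandas as pd


def try_cast_datetimelike_sketch(arr: np.ndarray, dtype: str) -> np.ndarray:
    """Hypothetical _try_cast-style branch for dtype.kind in "mM".

    Mirrors the "U" branch: remember the shape, flatten n-D input, run a
    converter that only handles 1D input, then reshape the result back.
    """
    np_dtype = np.dtype(dtype)
    shape = arr.shape
    flat = arr.ravel() if arr.ndim > 1 else arr
    if np_dtype.kind == "m":
        converted = pd.to_timedelta(flat)
    elif np_dtype.kind == "M":
        converted = pd.to_datetime(flat)
    else:
        raise TypeError(f"expected a datetime-like dtype, got {np_dtype}")
    return converted.to_numpy(dtype=np_dtype).reshape(shape)


values_2d = np.array([[None], [1]], dtype=object)  # what _ensure_2d produces for [None, 1]
td = try_cast_datetimelike_sketch(values_2d, "m8[ns]")
dt = try_cast_datetimelike_sketch(values_2d, "M8[ns]")
one_d = try_cast_datetimelike_sketch(np.array([None, 1], dtype=object), "m8[ns]")
```

Because the flattening happens in the caller, the converter never needs its own 2D special case, and 1D input passes through unchanged.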

Member

cc @jbrockmendel if you have thoughts on this approach

            res = TimedeltaArray._from_sequence(value.ravel(), dtype=dtype)
            return res.reshape(value.shape)
        res = TimedeltaArray._from_sequence(value, dtype=dtype)
        return res
    else:
8 changes: 8 additions & 0 deletions pandas/tests/frame/test_constructors.py
@@ -2772,6 +2772,14 @@ def test_construction_datetime_resolution_inference(self, cons):
        res_dtype2 = tm.get_dtype(obj2)
        assert res_dtype2 == "M8[us, US/Pacific]", res_dtype2

    def test_construction_nan_value_timedelta64_dtype(self):
        # GH#60064
        result = DataFrame([None, 1], dtype="timedelta64[ns]")
        expected = DataFrame(
            ["NaT", "0 days 00:00:00.000000001"], dtype="timedelta64[ns]"
        )
        tm.assert_frame_equal(result, expected)


class TestDataFrameConstructorIndexInference:
    def test_frame_from_dict_of_series_overlapping_monthly_period_indexes(self):
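The expected frame in the new test encodes two conversions: None becomes NaT, and the integer 1 is read as one nanosecond, which pandas prints as "0 days 00:00:00.000000001". A quick check with the public pd.to_timedelta, assuming any recent pandas; note that this is a 1D call and does not exercise the DataFrame constructor path the fix touches.

```python
import pandas as pd

# None maps to NaT; a bare integer is interpreted in nanoseconds by default.
converted = pd.to_timedelta([None, 1])
print(converted[0])  # NaT
print(converted[1])  # 0 days 00:00:00.000000001

# The string used in the test's expected frame parses to the same value.
assert pd.Timedelta("0 days 00:00:00.000000001") == converted[1]
```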