Commit c11794c

jbrockmendel and mittal-aakriti authored and committed
API: to_datetime strings default to microsecond (pandas-dev#62801)
1 parent 75084ed commit c11794c


65 files changed: +406, -314 lines

doc/source/user_guide/timeseries.rst

Lines changed: 23 additions & 0 deletions
@@ -241,6 +241,19 @@ inferred frequency upon creation:

     pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-05"], freq="infer")

+In most cases, parsing strings to datetimes (with any of :func:`to_datetime`, :class:`DatetimeIndex`, or :class:`Timestamp`) will produce objects with microsecond ("us") unit. The exception to this rule is if your strings have nanosecond precision, in which case the result will have "ns" unit:
+
+.. ipython:: python
+
+    pd.to_datetime(["2016-01-01 02:03:04"]).unit
+    pd.to_datetime(["2016-01-01 02:03:04.123"]).unit
+    pd.to_datetime(["2016-01-01 02:03:04.123456"]).unit
+    pd.to_datetime(["2016-01-01 02:03:04.123456789"]).unit
+
+.. versionchanged:: 3.0.0
+
+    Previously, :func:`to_datetime` and :class:`DatetimeIndex` would always parse strings to "ns" unit. During pandas 2.x, :class:`Timestamp` could give any of "s", "ms", "us", or "ns" depending on the specificity of the input string.
+
 .. _timeseries.converting.format:

 Providing a format argument
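As an aside to the hunk above, the "unit" concept can be illustrated with plain numpy. This sketch is not part of the patch and uses only numpy; it mirrors the distinction the new docs draw between microsecond and nanosecond units.

```python
import numpy as np

# numpy-only illustration of datetime64 units (not part of the patch):
# the pandas change above concerns which unit string parsing picks,
# "us" by default, "ns" only when the string carries nanosecond digits.
micro = np.datetime64("2016-01-01T02:03:04.123456", "us")
nano = np.datetime64("2016-01-01T02:03:04.123456789", "ns")
print(np.datetime_data(micro.dtype)[0])  # us
print(np.datetime_data(nano.dtype)[0])   # ns
```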
@@ -379,6 +392,16 @@ We subtract the epoch (midnight at January 1, 1970 UTC) and then floor divide by

     (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

+Another common way to perform this conversion is to convert directly to an integer dtype. Note that the exact integers this produces will depend on the specific unit
+or resolution of the datetime64 dtype:
+
+.. ipython:: python
+
+    stamps.astype(np.int64)
+    stamps.astype("datetime64[s]").astype(np.int64)
+    stamps.astype("datetime64[ms]").astype(np.int64)
+
 .. _timeseries.origin:

 Using the ``origin`` parameter
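The hunk above notes that the integers produced by casting depend on the dtype's unit. A minimal numpy-only sketch (not part of the patch) of that scaling:

```python
import numpy as np

# One day after the epoch, viewed at three resolutions: the stored
# integer scales with the unit of the datetime64 dtype.
t = np.datetime64("1970-01-02T00:00:00")
print(t.astype("datetime64[s]").astype(np.int64))   # 86400
print(t.astype("datetime64[ms]").astype(np.int64))  # 86400000
print(t.astype("datetime64[us]").astype(np.int64))  # 86400000000
```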

doc/source/whatsnew/v3.0.0.rst

Lines changed: 5 additions & 1 deletion
@@ -358,7 +358,7 @@ When passing strings, the resolution will depend on the precision of the string,

     In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
     Out[5]: dtype('<M8[ns]')

-The inferred resolution now matches that of the input strings:
+The inferred resolution now matches that of the input strings for nanosecond-precision strings, otherwise defaulting to microseconds:

 .. ipython:: python

@@ -367,13 +367,17 @@ The inferred resolution now matches that of the input strings:
     In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
     In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype

+This is also a change for the :class:`Timestamp` constructor with a string input, which in version 2.x.y could give second or millisecond unit, which users generally disliked (:issue:`52653`)
+
 In cases with mixed-resolution inputs, the highest resolution is used:

 .. code-block:: ipython

     In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
     Out[2]: dtype('<M8[ns]')

+.. warning:: Many users will now get "M8[us]" dtype data in cases when they used to get "M8[ns]". For most use cases they should not notice a difference. One big exception is converting to integers, which will give integers 1000x smaller.
+
 .. _whatsnew_300.api_breaking.concat_datetime_sorting:

 :func:`concat` no longer ignores ``sort`` when all objects have a :class:`DatetimeIndex`
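The warning's "1000x smaller" caveat can be checked with a numpy-only sketch (an aside, not part of the patch):

```python
import numpy as np

# The same instant as "us" vs "ns" integers differs by a factor of 1000,
# which is the pitfall the warning above describes.
t_us = np.datetime64("2024-03-22T11:43:01", "us").astype(np.int64)
t_ns = np.datetime64("2024-03-22T11:43:01", "ns").astype(np.int64)
assert t_ns == t_us * 1000
print(t_ns // t_us)  # 1000
```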

pandas/_libs/tslibs/conversion.pyx

Lines changed: 4 additions & 0 deletions
@@ -623,6 +623,8 @@ cdef _TSObject convert_str_to_tsobject(str ts, tzinfo tz,
         )
     if not string_to_dts_failed:
         reso = get_supported_reso(out_bestunit)
+        if reso < NPY_FR_us:
+            reso = NPY_FR_us
         check_dts_bounds(&dts, reso)
         obj = _TSObject()
         obj.dts = dts
@@ -661,6 +663,8 @@ cdef _TSObject convert_str_to_tsobject(str ts, tzinfo tz,
         nanos=&nanos,
     )
     reso = get_supported_reso(out_bestunit)
+    if reso < NPY_FR_us:
+        reso = NPY_FR_us
     return convert_datetime_to_tsobject(dt, tz, nanos=nanos, reso=reso)
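Both hunks add the same clamp. A pure-Python sketch of its effect (the dict and helper below are hypothetical stand-ins for pandas internals; in the real NPY_DATETIMEUNIT enum, finer resolutions compare greater):

```python
# Hypothetical stand-in for the NPY_DATETIMEUNIT ordering used in the
# Cython hunks above: finer resolutions get larger values.
RESO_ORDER = {"s": 0, "ms": 1, "us": 2, "ns": 3}

def clamp_to_microseconds(inferred: str) -> str:
    # Mirrors `if reso < NPY_FR_us: reso = NPY_FR_us` from the patch:
    # anything coarser than microseconds is bumped up, "ns" passes through.
    return "us" if RESO_ORDER[inferred] < RESO_ORDER["us"] else inferred

for unit in ("s", "ms", "us", "ns"):
    print(unit, "->", clamp_to_microseconds(unit))
```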

pandas/_libs/tslibs/strptime.pyx

Lines changed: 4 additions & 0 deletions
@@ -466,6 +466,8 @@ def array_strptime(
                 # No error reported by string_to_dts, pick back up
                 # where we left off
                 item_reso = get_supported_reso(out_bestunit)
+                if item_reso < NPY_DATETIMEUNIT.NPY_FR_us:
+                    item_reso = NPY_DATETIMEUNIT.NPY_FR_us
                 state.update_creso(item_reso)
                 if infer_reso:
                     creso = state.creso
@@ -510,6 +512,8 @@ def array_strptime(
                 val, fmt, exact, format_regex, locale_time, &dts, &item_reso
             )

+            if item_reso < NPY_DATETIMEUNIT.NPY_FR_us:
+                item_reso = NPY_DATETIMEUNIT.NPY_FR_us
             state.update_creso(item_reso)
             if infer_reso:
                 creso = state.creso

pandas/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -935,7 +935,7 @@ def rand_series_with_duplicate_datetimeindex() -> Series:
     (Period("2012-01", freq="M"), "period[M]"),
     (Period("2012-02-01", freq="D"), "period[D]"),
     (
-        Timestamp("2011-01-01", tz="US/Eastern"),
+        Timestamp("2011-01-01", tz="US/Eastern").as_unit("s"),
         DatetimeTZDtype(unit="s", tz="US/Eastern"),
     ),
     (Timedelta(seconds=500), "timedelta64[ns]"),

pandas/core/algorithms.py

Lines changed: 1 addition & 1 deletion
@@ -370,7 +370,7 @@ def unique(values):
     array([2, 1])

     >>> pd.unique(pd.Series([pd.Timestamp("20160101"), pd.Timestamp("20160101")]))
-    array(['2016-01-01T00:00:00'], dtype='datetime64[s]')
+    array(['2016-01-01T00:00:00.000000'], dtype='datetime64[us]')

     >>> pd.unique(
     ...     pd.Series(

pandas/core/arrays/datetimelike.py

Lines changed: 6 additions & 6 deletions
@@ -1923,11 +1923,11 @@ def strftime(self, date_format: str) -> npt.NDArray[np.object_]:

     >>> rng_tz.floor("2h", ambiguous=False)
     DatetimeIndex(['2021-10-31 02:00:00+01:00'],
-                  dtype='datetime64[s, Europe/Amsterdam]', freq=None)
+                  dtype='datetime64[us, Europe/Amsterdam]', freq=None)

     >>> rng_tz.floor("2h", ambiguous=True)
     DatetimeIndex(['2021-10-31 02:00:00+02:00'],
-                  dtype='datetime64[s, Europe/Amsterdam]', freq=None)
+                  dtype='datetime64[us, Europe/Amsterdam]', freq=None)
     """

     _floor_example = """>>> rng.floor('h')
@@ -1950,11 +1950,11 @@ def strftime(self, date_format: str) -> npt.NDArray[np.object_]:

     >>> rng_tz.floor("2h", ambiguous=False)
     DatetimeIndex(['2021-10-31 02:00:00+01:00'],
-                  dtype='datetime64[s, Europe/Amsterdam]', freq=None)
+                  dtype='datetime64[us, Europe/Amsterdam]', freq=None)

     >>> rng_tz.floor("2h", ambiguous=True)
     DatetimeIndex(['2021-10-31 02:00:00+02:00'],
-                  dtype='datetime64[s, Europe/Amsterdam]', freq=None)
+                  dtype='datetime64[us, Europe/Amsterdam]', freq=None)
     """

     _ceil_example = """>>> rng.ceil('h')
@@ -1977,11 +1977,11 @@ def strftime(self, date_format: str) -> npt.NDArray[np.object_]:

     >>> rng_tz.ceil("h", ambiguous=False)
     DatetimeIndex(['2021-10-31 02:00:00+01:00'],
-                  dtype='datetime64[s, Europe/Amsterdam]', freq=None)
+                  dtype='datetime64[us, Europe/Amsterdam]', freq=None)

     >>> rng_tz.ceil("h", ambiguous=True)
     DatetimeIndex(['2021-10-31 02:00:00+02:00'],
-                  dtype='datetime64[s, Europe/Amsterdam]', freq=None)
+                  dtype='datetime64[us, Europe/Amsterdam]', freq=None)
     """

pandas/core/arrays/datetimes.py

Lines changed: 8 additions & 8 deletions
@@ -220,7 +220,7 @@ class DatetimeArray(dtl.TimelikeOps, dtl.DatelikeOps):
     ... )
     <DatetimeArray>
     ['2023-01-01 00:00:00', '2023-01-02 00:00:00']
-    Length: 2, dtype: datetime64[s]
+    Length: 2, dtype: datetime64[us]
     """

     __module__ = "pandas.arrays"
@@ -612,7 +612,7 @@ def tz(self) -> tzinfo | None:
     >>> s
     0   2020-01-01 10:00:00+00:00
     1   2020-02-01 11:00:00+00:00
-    dtype: datetime64[s, UTC]
+    dtype: datetime64[us, UTC]
     >>> s.dt.tz
     datetime.timezone.utc
@@ -1441,7 +1441,7 @@ def time(self) -> npt.NDArray[np.object_]:
     >>> s
     0   2020-01-01 10:00:00+00:00
     1   2020-02-01 11:00:00+00:00
-    dtype: datetime64[s, UTC]
+    dtype: datetime64[us, UTC]
     >>> s.dt.time
     0    10:00:00
     1    11:00:00
@@ -1484,7 +1484,7 @@ def timetz(self) -> npt.NDArray[np.object_]:
     >>> s
     0   2020-01-01 10:00:00+00:00
     1   2020-02-01 11:00:00+00:00
-    dtype: datetime64[s, UTC]
+    dtype: datetime64[us, UTC]
     >>> s.dt.timetz
     0    10:00:00+00:00
     1    11:00:00+00:00
@@ -1526,7 +1526,7 @@ def date(self) -> npt.NDArray[np.object_]:
     >>> s
     0   2020-01-01 10:00:00+00:00
     1   2020-02-01 11:00:00+00:00
-    dtype: datetime64[s, UTC]
+    dtype: datetime64[us, UTC]
     >>> s.dt.date
     0    2020-01-01
     1    2020-02-01
@@ -1875,7 +1875,7 @@ def isocalendar(self) -> DataFrame:
     >>> s
     0   2020-01-01 10:00:00+00:00
     1   2020-02-01 11:00:00+00:00
-    dtype: datetime64[s, UTC]
+    dtype: datetime64[us, UTC]
     >>> s.dt.dayofyear
     0    1
     1    32
@@ -1911,7 +1911,7 @@ def isocalendar(self) -> DataFrame:
     >>> s
     0   2020-01-01 10:00:00+00:00
     1   2020-04-01 11:00:00+00:00
-    dtype: datetime64[s, UTC]
+    dtype: datetime64[us, UTC]
     >>> s.dt.quarter
     0    1
     1    2
@@ -1947,7 +1947,7 @@ def isocalendar(self) -> DataFrame:
     >>> s
     0   2020-01-01 10:00:00+00:00
     1   2020-02-01 11:00:00+00:00
-    dtype: datetime64[s, UTC]
+    dtype: datetime64[us, UTC]
     >>> s.dt.daysinmonth
     0    31
     1    29

pandas/core/base.py

Lines changed: 1 addition & 1 deletion
@@ -1380,7 +1380,7 @@ def factorize(
     0   2000-03-11
     1   2000-03-12
     2   2000-03-13
-    dtype: datetime64[s]
+    dtype: datetime64[us]

     >>> ser.searchsorted('3/14/2000')
     np.int64(3)

pandas/core/dtypes/missing.py

Lines changed: 2 additions & 2 deletions
@@ -150,7 +150,7 @@ def isna(obj: object) -> bool | npt.NDArray[np.bool_] | NDFrame:
     >>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
     >>> index
     DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
-                  dtype='datetime64[s]', freq=None)
+                  dtype='datetime64[us]', freq=None)
     >>> pd.isna(index)
     array([False, False, True, False])
@@ -365,7 +365,7 @@ def notna(obj: object) -> bool | npt.NDArray[np.bool_] | NDFrame:
     >>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
     >>> index
     DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
-                  dtype='datetime64[s]', freq=None)
+                  dtype='datetime64[us]', freq=None)
     >>> pd.notna(index)
     array([ True,  True, False,  True])
