Merged
Changes from 10 commits
Commits
32 commits
bd4061c
Update datetimes.py
smarie Jul 12, 2021
66c725d
from code review: improved utc doc
Oct 4, 2021
74f6aa2
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
Oct 4, 2021
f5cbef8
Improved overall readability by
Oct 5, 2021
866bdcb
minor improvement
Oct 5, 2021
cd3ec35
Minor fix and improvement again
Oct 5, 2021
1be053a
Changed order of output description to match the global section doc
Oct 5, 2021
41a1e53
Removed the "type: ignore" since the return type hints are now fixed
Oct 5, 2021
95cfc54
Removed type hint-related mods (will move to a separate pr)
Oct 8, 2021
6569b1e
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
Oct 8, 2021
8ebd77e
Removed backslash characters from doctests as per code review
Oct 11, 2021
8baf7bf
As per code review: replaced all "tz-" with "timezone-"
Oct 11, 2021
bc26945
Code review: capitalized if
Oct 11, 2021
83ef850
Compressed output description as per code review.
Oct 28, 2021
0b21772
Moved the general summary to a notes section
Oct 28, 2021
83ddfe7
Update pandas/core/tools/datetimes.py
smarie Oct 28, 2021
8e1ebf0
As per code review: reduced the utc param description and added struc…
Dec 17, 2021
d8cbe8a
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
Dec 17, 2021
5310779
Minor edits
Dec 17, 2021
2b22544
Changed as per code review
Dec 17, 2021
2b63ea7
Changed as per code review
Dec 18, 2021
70a7c8f
what's new attempt
Dec 18, 2021
7739e12
Revert "what's new attempt"
Jan 3, 2022
4e87e39
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
Jan 3, 2022
de9fe69
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
Jan 4, 2022
5f4dbb8
Changed as per code review: added sphinx directives wherever possible…
Jan 4, 2022
a2fb1a1
Changed as per code review: added const role
Jan 4, 2022
04312bd
sphinx role
Jan 4, 2022
514b0c4
Changed as per code review: sphinx roles
Jan 4, 2022
fc2395d
minor change again
Jan 4, 2022
e0cf329
Last polishing round: sphinx roles and a few fixes
Jan 4, 2022
1421830
Fixed typo
Jan 4, 2022
177 changes: 157 additions & 20 deletions pandas/core/tools/datetimes.py
Expand Up @@ -689,6 +689,47 @@ def to_datetime(
"""
Convert argument to datetime.

This function converts a scalar, array-like, :class:`Series` or
:class:`DataFrame`/dict-like to a pandas datetime object.

- scalars can be int, float, str, datetime object (from stdlib datetime
module or numpy). They are converted to :class:`Timestamp` when possible,
otherwise they are converted to ``datetime.datetime``. None/NaN/null
scalars are converted to ``NaT``.

- array-like can contain int, float, str, datetime objects. They are
converted to :class:`DatetimeIndex` when possible, otherwise they are
converted to :class:`Index` with object dtype, containing
``datetime.datetime``. None/NaN/null entries are converted to ``NaT`` in
both cases.

- :class:`Series` are converted to :class:`Series` with datetime64 dtype
when possible, otherwise they are converted to :class:`Series` with
object dtype, containing ``datetime.datetime``. None/NaN/null entries
are converted to ``NaT`` in both cases.

- :class:`DataFrame`/dict-like are converted to :class:`Series` with
datetime64 dtype. For each row a datetime is created from assembling
the various dataframe columns. Column keys can be common abbreviations
like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or
plurals of the same.

The following causes are responsible for datetime.datetime objects being
returned (possibly inside an Index or a Series with object dtype) instead
of a proper pandas designated type (Timestamp, DatetimeIndex or Series
with datetime64 dtype):

- when any input element is before Timestamp.min or after Timestamp.max,
see `timestamp limitations
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
#timeseries-timestamp-limits>`_.

- when utc=False (default) and the input is an array-like or Series
containing mixed naive/aware datetime, or aware with mixed time offsets.
Note that this happens in the (quite frequent) situation when the
timezone has a daylight savings policy. In that case you may wish to
use utc=True.

Parameters
----------
arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
Expand Down Expand Up @@ -723,13 +764,40 @@ def to_datetime(
with year first.

utc : bool, default None
Return UTC DatetimeIndex if True (converting any tz-aware
datetime.datetime objects as well).
Control timezone-related parsing, localization and conversion.

- if True, returns a timezone-aware UTC-localized Timestamp, Series or
DatetimeIndex. Any tz-naive element will be *localized* as UTC.
Any already tz-aware input element (e.g. timezone-aware
datetime.datetime object, or datetime string with explicit timezone
offset) will be *converted* to UTC.

- If False (default), for scalar inputs, the result will be a
timezone-aware Timestamp if the scalar is timezone-aware, otherwise
it will be a timezone-naive Timestamp.
For multiple inputs (list, series):

- Tz-aware datetime.datetime inputs are not supported (raise
ValueError).
- The result will be a timezone-aware Series or DatetimeIndex
ONLY if all time offsets in string datetime inputs are
identical.
- If all inputs are timezone-naive, the result will be
timezone-naive.
- In other cases, for example if the time offset is
not identical in all string entries, the result will be an Index
of dtype object.

See pandas general documentation about `timezone conversion and
localization
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
#time-zone-handling>`_.

format : str, default None
The strftime to parse time, eg "%d/%m/%Y", note that "%f" will parse
all the way up to nanoseconds.
See strftime documentation for more information on choices:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.
all the way up to nanoseconds. See `strftime documentation
<https://docs.python.org/3/library/datetime.html
#strftime-and-strptime-behavior>`_ for more information on choices.
exact : bool, True by default
Behaves as:
- If True, require an exact format match.
Expand Down Expand Up @@ -771,16 +839,25 @@ def to_datetime(
If parsing succeeded.
Return type depends on input:

- list-like:
- DatetimeIndex, if timezone naive or aware with the same timezone
- Index of object dtype, if timezone aware with mixed time offsets
- Series: Series of datetime64 dtype
- scalar: Timestamp

In case when it is not possible to return designated types (e.g. when
any element of input is before Timestamp.min or after Timestamp.max)
return will have datetime.datetime type (or corresponding
array/Series).
- array-like: DatetimeIndex
- Series or DataFrame: Series of datetime64 dtype

Note: in some situations the return type can not be one of the above
and is rather datetime.datetime (scalar input) or Series with object
dtype containing datetime.datetime objects (array-like or Series
input). See above documentation for details, as well as examples
below.

Raises
------
ParserError
When parsing a date from string fails.
ValueError
When another datetime conversion error happens. For example when one
of 'year', 'month', 'day' is missing in a :class:`DataFrame`, or when
a Tz-aware datetime.datetime is found in an array-like of mixed time
offsets, and utc=False.

See Also
--------
Expand Down Expand Up @@ -850,16 +927,76 @@ def to_datetime(
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
dtype='datetime64[ns]', freq=None)

In case input is list-like and the elements of input are of mixed
timezones, return will have object type Index if utc=False.
.. warning:: By default (utc=False), all items in an input array must
either be all tz-naive, or all tz-aware with the same offset. Mixed
offsets result in datetime.datetime objects being returned instead,
see examples below.

Default (utc=False) and tz-naive returns tz-naive DatetimeIndex:

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00:15'])
DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'], \
dtype='datetime64[ns]', freq=None)

Default (utc=False) and tz-aware with constant offset returns tz-aware
DatetimeIndex:

>>> pd.to_datetime(['2018-10-26 12:00 -0500', '2018-10-26 13:00 -0500'])
DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'], \
dtype='datetime64[ns, pytz.FixedOffset(-300)]', freq=None)

Default (utc=False) and tz-aware with mixed offsets (for example from a
timezone with daylight savings) returns a simple Index containing
datetime.datetime objects:

>>> pd.to_datetime(['2020-10-25 02:00 +0200', '2020-10-25 04:00 +0100'])
Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00], \
dtype='object')

>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'])
Index([2018-10-26 12:00:00-05:30, 2018-10-26 12:00:00-05:00], dtype='object')
Default (utc=False) and a mix of tz-aware and tz-naive returns a tz-aware
DatetimeIndex if the tz-naive are datetime...

>>> from datetime import datetime
>>> pd.to_datetime(["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)])
DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'], \
dtype='datetime64[ns, pytz.FixedOffset(-60)]', freq=None)

...but does not if the tz-naive are strings
Member


Could you clean up this prose?


>>> pd.to_datetime(["2020-01-01 01:00 -01:00", "2020-01-01 03:00"])
Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object')

Special case: mixing tz-aware string and datetime fails when utc=False,
even if they have the same time offset.

>>> from datetime import datetime, timezone, timedelta
>>> d = datetime(2020, 1, 1, 18, tzinfo=timezone(-timedelta(hours=1)))
>>> d
datetime.datetime(2020, 1, 1, 18, 0, \
tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=82800)))
>>> pd.to_datetime(["2020-01-01 17:00 -0100", d])
Traceback (most recent call last):
...
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 \
unless utc=True

Setting utc=True solves most of the above issues, as tz-naive elements
will be localized to UTC, while tz-aware ones will simply be converted to
UTC (exact same datetime, but represented differently):

>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
... utc=True)
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
dtype='datetime64[ns, UTC]', freq=None)
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'], \
dtype='datetime64[ns, UTC]', freq=None)

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 12:00 -0530',
... datetime(2020, 1, 1, 18),
... datetime(2020, 1, 1, 18,
... tzinfo=timezone(-timedelta(hours=1)))],
... utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 17:30:00+00:00', \
'2020-01-01 18:00:00+00:00', '2020-01-01 19:00:00+00:00'], \
dtype='datetime64[ns, UTC]', freq=None)
"""
if arg is None:
return None
Expand Down
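
Editorial sketch (not part of the diff): the docstring above names two causes for falling back to ``datetime.datetime`` objects (out-of-range values and mixed offsets) and describes how ``utc=True`` localizes naive inputs and converts aware ones. The snippet below illustrates those three ideas using only the standard library. It is a minimal sketch, not pandas' implementation: the helper names ``approx_ns_bounds``, ``to_utc`` and ``distinct_offsets`` are hypothetical, and the bounds rest on the assumption that ``datetime64[ns]`` stores signed 64-bit nanoseconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

# Assumption: datetime64[ns] holds signed 64-bit nanoseconds since the epoch,
# which is what limits Timestamp.min / Timestamp.max.
EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1


def approx_ns_bounds():
    """Return the approximate representable range, truncated to microseconds."""
    span = timedelta(microseconds=MAX_NS // 1000)
    return EPOCH - span, EPOCH + span


def to_utc(value):
    """Hypothetical helper mirroring the documented utc=True rule for one datetime."""
    if value.tzinfo is None:
        # Localize: keep the wall-clock time and attach UTC.
        return value.replace(tzinfo=timezone.utc)
    # Convert: keep the instant and re-express it in UTC.
    return value.astimezone(timezone.utc)


def distinct_offsets(values):
    """Collect the UTC offsets present in a list of aware datetimes."""
    return {v.utcoffset() for v in values}


lower, upper = approx_ns_bounds()

naive = datetime(2020, 1, 1, 18)
aware = datetime(2020, 1, 1, 18, tzinfo=timezone(-timedelta(hours=1)))
localized = to_utc(naive)   # 18:00 stays 18:00, now tagged as UTC
converted = to_utc(aware)   # 18:00-01:00 becomes 19:00+00:00

# Two wall-clock times around a daylight-saving change carry different offsets,
# so no single fixed offset can describe both.
mixed = [
    datetime(2020, 10, 25, 2, tzinfo=timezone(timedelta(hours=2))),
    datetime(2020, 10, 25, 4, tzinfo=timezone(timedelta(hours=1))),
]
offsets = distinct_offsets(mixed)
```

Here ``localized`` and ``converted`` correspond to the last two entries of the final ``utc=True`` doctest in the diff, and ``offsets`` containing more than one value is the situation in which the docstring says an object-dtype ``Index`` is returned by default.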