Improved docstring and return type hints for to_datetime
#42494
Merged
Changes from 10 commits
32 commits
bd4061c
Update datetimes.py
smarie 66c725d
from code review: improved utc doc
74f6aa2
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
f5cbef8
Improved overall readability by
866bdcb
minor improvement
cd3ec35
Minor fix and improvement again
1be053a
Changed order of output description to match the global section doc
41a1e53
Removed the "type: ignore" since the return type hints are now fixed
95cfc54
Removed type hint-related mods (will move to a separate pr)
6569b1e
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
8ebd77e
Removed backslash characters from doctests as per code review
8baf7bf
As per code review: replaced all "tz-" with "timezone-"
bc26945
Code review: capitalized if
83ef850
Compressed output description as per code review.
0b21772
Moved the general summary to a notes section
83ddfe7
Update pandas/core/tools/datetimes.py
smarie 8e1ebf0
As per code review: reduced the utc param description and added struc…
d8cbe8a
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
5310779
Minor edits
2b22544
Changed as per code review
2b63ea7
Changed as per code review
70a7c8f
what's new attempt
7739e12
Revert "what's new attempt"
4e87e39
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
de9fe69
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
5f4dbb8
Changed as per code review: added sphinx directives wherever possible…
a2fb1a1
Changed as per code review: added const role
04312bd
sphinx role
514b0c4
Changed as per code review: sphinx roles
fc2395d
minor change again
e0cf329
Last polishing round: sphinx roles and a few fixes
1421830
Fixed typo
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -689,6 +689,47 @@ def to_datetime( | |
""" | ||
Convert argument to datetime. | ||
|
||
This function converts a scalar, array-like, :class:`Series` or | ||
:class:`DataFrame`/dict-like to a pandas datetime object. | ||
|
||
- scalars can be int, float, str, datetime object (from stdlib datetime | ||
module or numpy). They are converted to :class:`Timestamp` when possible, | ||
otherwise they are converted to ``datetime.datetime``. None/NaN/null | ||
scalars are converted to ``NaT``. | ||
|
||
- array-like can contain int, float, str, datetime objects. They are | ||
converted to :class:`DatetimeIndex` when possible, otherwise they are | ||
converted to :class:`Index` with object dtype, containing | ||
``datetime.datetime``. None/NaN/null entries are converted to ``NaT`` in | ||
both cases. | ||
|
||
- :class:`Series` are converted to :class:`Series` with datetime64 dtype | ||
when possible, otherwise they are converted to :class:`Series` with | ||
object dtype, containing ``datetime.datetime``. None/NaN/null entries | ||
are converted to ``NaT`` in both cases. | ||
|
||
- :class:`DataFrame`/dict-like are converted to :class:`Series` with | ||
datetime64 dtype. For each row a datetime is created from assembling | ||
the various dataframe columns. Column keys can be common abbreviations | ||
like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or | |
plurals of the same. | ||
|
||
The following causes are responsible for datetime.datetime objects being | ||
returned (possibly inside an Index or a Series with object dtype) instead | ||
of a proper pandas designated type (Timestamp, DatetimeIndex or Series | ||
with datetime64 dtype): | ||
|
||
- when any input element is before Timestamp.min or after Timestamp.max, | ||
see `timestamp limitations | ||
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html | ||
#timeseries-timestamp-limits>`_. | ||
|
||
- when utc=False (default) and the input is an array-like or Series | ||
containing mixed naive/aware datetime, or aware with mixed time offsets. | ||
Note that this happens in the (quite frequent) situation when the | ||
timezone has a daylight savings policy. In that case you may wish to | ||
use utc=True. | ||
|
||
Parameters | ||
---------- | ||
arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like | ||
|
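The second cause listed in this hunk (mixed offsets from a zone with daylight saving) can be illustrated without pandas. The following is a minimal sketch using only the standard library; the two strings are taken from the docstring's own example later in this diff, and the variable names are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Two readings from the docstring's daylight-saving example, taken on
# either side of a switch from +02:00 to +01:00.
stamps = ["2020-10-25 02:00 +0200", "2020-10-25 04:00 +0100"]
parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M %z") for s in stamps]

# The same zone yields two distinct UTC offsets, which is the "mixed
# offsets" situation the docstring describes.
offsets = {p.utcoffset() for p in parsed}

# Aware datetimes are subtracted on the UTC timeline, so the wall-clock
# gap of 2 hours is really 3 hours of elapsed time.
elapsed = parsed[1] - parsed[0]
print(sorted(offsets), elapsed)
```

Because no single fixed offset describes both values, a fixed-offset datetime64 dtype cannot hold them, which is why the docstring suggests `utc=True` in this case.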
@@ -723,13 +764,40 @@ def to_datetime( | |
with year first. | ||
|
||
utc : bool, default None | ||
Return UTC DatetimeIndex if True (converting any tz-aware | ||
datetime.datetime objects as well). | ||
Control timezone-related parsing, localization and conversion. | ||
|
||
|
||
- if True, returns a timezone-aware UTC-localized Timestamp, Series or | ||
|
||
DatetimeIndex. Any tz-naive element will be *localized* as UTC. | ||
Any already tz-aware input element (e.g. timezone-aware | ||
|
||
datetime.datetime object, or datetime string with explicit timezone | ||
offset) will be *converted* to UTC. | ||
|
||
- If False (default), for scalar inputs, the result will be a | ||
timezone-aware Timestamp if the scalar is timezone-aware, otherwise | ||
it will be a timezone-naive Timestamp. | ||
For multiple inputs (list, series): | ||
|
||
- Tz-aware datetime.datetime inputs are not supported (raise | ||
ValueError). | ||
- The result will be a timezone-aware Series or DatetimeIndex | ||
ONLY if all time offsets in string datetime inputs are | ||
identical. | ||
- If all inputs are timezone-naive, the result will be | ||
timezone-naive. | ||
- In other cases, for example if the time offset is | ||
not identical in all string entries, the result will be an Index | ||
of dtype object. | ||
|
||
See pandas general documentation about `timezone conversion and | ||
localization | ||
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html | ||
#time-zone-handling>`_. | ||
|
||
format : str, default None | ||
The strftime to parse time, eg "%d/%m/%Y", note that "%f" will parse | ||
all the way up to nanoseconds. | ||
See strftime documentation for more information on choices: | ||
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior. | ||
all the way up to nanoseconds. See `strftime documentation | ||
<https://docs.python.org/3/library/datetime.html | ||
#strftime-and-strptime-behavior>`_ for more information on choices. | ||
exact : bool, True by default | ||
Behaves as: | ||
- If True, require an exact format match. | ||
|
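The distinction the `utc` description draws between *localizing* a tz-naive value and *converting* a tz-aware one can be sketched with two scalar calls. This is an illustration only, assuming a pandas installation imported as `pd`; the input strings are borrowed from the docstring examples and the variable names are hypothetical.

```python
import pandas as pd

# tz-naive input: the wall-clock time is kept and simply labelled as UTC.
naive = pd.to_datetime("2018-10-26 12:00", utc=True)

# tz-aware input: the instant is preserved and re-expressed in UTC,
# so 12:00 at -05:30 becomes 17:30 UTC.
aware = pd.to_datetime("2018-10-26 12:00 -0530", utc=True)

print(naive)
print(aware)
```

Both results are timezone-aware `Timestamp` objects in UTC; only the second one had its clock reading shifted.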
@@ -771,16 +839,25 @@ def to_datetime( | |
If parsing succeeded. | ||
Return type depends on input: | ||
|
||
- list-like: | ||
- DatetimeIndex, if timezone naive or aware with the same timezone | ||
- Index of object dtype, if timezone aware with mixed time offsets | ||
- Series: Series of datetime64 dtype | ||
- scalar: Timestamp | ||
|
||
In case when it is not possible to return designated types (e.g. when | ||
any element of input is before Timestamp.min or after Timestamp.max) | ||
return will have datetime.datetime type (or corresponding | ||
array/Series). | ||
- array-like: DatetimeIndex | ||
- Series or DataFrame: Series of datetime64 dtype | ||
|
||
Note: in some situations the return type can not be one of the above | ||
|
||
and is rather datetime.datetime (scalar input) or Series with object | ||
dtype containing datetime.datetime objects (array-like or Series | ||
input). See above documentation for details, as well as examples | ||
below. | ||
|
||
Raises | ||
------ | ||
ParserError | ||
When parsing a date from string fails. | ||
ValueError | ||
When another datetime conversion error happens. For example when one | ||
of 'year', 'month', 'day' is missing in a :class:`DataFrame`, or when | |
a Tz-aware datetime.datetime is found in an array-like of mixed time | ||
offsets, and utc=False. | ||
|
||
See Also | ||
-------- | ||
|
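The `DataFrame` return type and the `ValueError` case described in this hunk can be tried together. The sketch below assumes pandas is available as `pd`; the frame contents and variable names are made up for illustration and are not part of the pull request.

```python
import pandas as pd

# Each row is assembled into one datetime from its year/month/day columns.
df = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})
assembled = pd.to_datetime(df)

# Dropping a required column should trigger the ValueError mentioned in
# the Raises section.
try:
    pd.to_datetime(df.drop(columns="day"))
except ValueError as exc:
    missing_error = str(exc)
else:
    missing_error = None

print(assembled)
print(missing_error)
```

The result of the first call is a `Series` with one datetime per input row, as the Returns section states for `DataFrame` input.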
@@ -850,16 +927,76 @@ def to_datetime( | |
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], | ||
dtype='datetime64[ns]', freq=None) | ||
|
||
In case input is list-like and the elements of input are of mixed | ||
timezones, return will have object type Index if utc=False. | ||
.. warning:: By default (utc=False), all items in an input array must | ||
either be all tz-naive, or all tz-aware with the same offset. Mixed | ||
offsets result in datetime.datetime objects being returned instead, | ||
see examples below. | ||
|
||
Default (utc=False) and tz-naive returns tz-naive DatetimeIndex: | ||
|
||
>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00:15']) | ||
DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'], \ | ||
|
||
dtype='datetime64[ns]', freq=None) | ||
|
||
Default (utc=False) and tz-aware with constant offset returns tz-aware | ||
DatetimeIndex: | ||
|
||
>>> pd.to_datetime(['2018-10-26 12:00 -0500', '2018-10-26 13:00 -0500']) | ||
DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'], \ | ||
dtype='datetime64[ns, pytz.FixedOffset(-300)]', freq=None) | ||
|
||
Default (utc=False) and tz-aware with mixed offsets (for example from a | ||
timezone with daylight savings) returns a simple Index containing | ||
datetime.datetime objects: | ||
|
||
>>> pd.to_datetime(['2020-10-25 02:00 +0200', '2020-10-25 04:00 +0100']) | ||
Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00], \ | ||
dtype='object') | ||
|
||
>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500']) | ||
Index([2018-10-26 12:00:00-05:30, 2018-10-26 12:00:00-05:00], dtype='object') | ||
Default (utc=False) and a mix of tz-aware and tz-naive returns a tz-aware | ||
DatetimeIndex if the tz-naive are datetime... | ||
|
||
>>> from datetime import datetime | ||
>>> pd.to_datetime(["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)]) | ||
DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'], \ | ||
dtype='datetime64[ns, pytz.FixedOffset(-60)]', freq=None) | ||
|
||
...but does not if the tz-naive are strings | ||
|
||
|
||
>>> pd.to_datetime(["2020-01-01 01:00 -01:00", "2020-01-01 03:00"]) | ||
|
||
Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object') | ||
|
||
Special case: mixing tz-aware string and datetime fails when utc=False, | ||
even if they have the same time offset. | ||
|
||
>>> from datetime import datetime, timezone, timedelta | ||
|
||
>>> d = datetime(2020, 1, 1, 18, tzinfo=timezone(-timedelta(hours=1))) | ||
>>> d | ||
datetime.datetime(2020, 1, 1, 18, 0, \ | ||
tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=82800))) | ||
>>> pd.to_datetime(["2020-01-01 17:00 -0100", d]) | ||
Traceback (most recent call last): | ||
... | ||
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 \ | ||
unless utc=True | ||
|
||
Setting utc=True solves most of the above issues, as tz-naive elements | ||
will be localized to UTC, while tz-aware ones will simply be converted to | ||
UTC (exact same datetime, but represented differently): | ||
|
||
>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'], | ||
... utc=True) | ||
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'], | ||
dtype='datetime64[ns, UTC]', freq=None) | ||
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'], \ | ||
dtype='datetime64[ns, UTC]', freq=None) | ||
|
||
>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 12:00 -0530', | ||
... datetime(2020, 1, 1, 18), | ||
... datetime(2020, 1, 1, 18, | ||
... tzinfo=timezone(-timedelta(hours=1)))], | ||
... utc=True) | ||
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 17:30:00+00:00', \ | ||
'2020-01-01 18:00:00+00:00', '2020-01-01 19:00:00+00:00'], \ | ||
dtype='datetime64[ns, UTC]', freq=None) | ||
""" | ||
if arg is None: | ||
return None | ||
|
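The remark in the examples that conversion to UTC yields the "exact same datetime, but represented differently" can be checked with the standard library alone. This is a small sketch reusing the `d` object from the doctest above; `d_utc` is a name introduced here for illustration.

```python
from datetime import datetime, timedelta, timezone

# Same object as in the doctest: 18:00 at a fixed offset of -01:00.
d = datetime(2020, 1, 1, 18, tzinfo=timezone(-timedelta(hours=1)))

# A negative offset is normalised to days=-1 plus a positive remainder,
# which explains the repr shown in the doctest.
offset = d.utcoffset()

# Converting to UTC changes the clock reading, not the instant.
d_utc = d.astimezone(timezone.utc)

print(offset == timedelta(days=-1, seconds=82800))
print(d_utc.isoformat(), d_utc == d)
```

The converted value reads 19:00 UTC, matching the last entry of the final `utc=True` example in the docstring.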