Skip to content

BUG: to_datetime drops decimal point and fractional seconds under certain conditions #55321

@DaveSprague

Description

@DaveSprague

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

When a time string has a whole number of seconds (the fractional part is zero) followed by a timezone specification such as "+00:00", for example,

"2021-09-17 22:07:15.000000+00:00",

the to_datetime function drops the decimal point and the fraction zeros which results in a DateTime object like this:

"2021-09-17 22:07:15+00:00",

If this DateTime object is then exported to a csv file and read back in, a subsequent to_datetime_ function will crash with an ValueError like this one:

ValueError: time data "2021-09-17 22:07:15+00:00" doesn't match format "%Y-%m-%d %H:%M:%S.%f%z"

Reproducible Example

Code:

import pandas as pd

df = pd.DataFrame({"value": range(3), "DateTime": [
    '2021-09-17 22:07:14.820200+00:00',
    '2021-09-17 22:07:15.000000+00:00',
    '2021-09-17 22:07:15.179962+00:00',
]})

print(df)

df["convertedDateTime"] = pd.to_datetime(df.DateTime)

print(df)

df.to_csv("test_to_datetime.csv")

df_in = pd.read_csv("test_to_datetime.csv")

df_in["twice_converted_datetime"] = pd.to_datetime(df_in["convertedDateTime"])

print(df_in)


Output:
'''
   value                          DateTime
0      0  2021-09-17 22:07:14.820200+00:00
1      1  2021-09-17 22:07:15.000000+00:00
2      2  2021-09-17 22:07:15.179962+00:00
   value                          DateTime                convertedDateTime
0      0  2021-09-17 22:07:14.820200+00:00 2021-09-17 22:07:14.820200+00:00
1      1  2021-09-17 22:07:15.000000+00:00        2021-09-17 22:07:15+00:00
2      2  2021-09-17 22:07:15.179962+00:00 2021-09-17 22:07:15.179962+00:00
Traceback (most recent call last):
  File "test_to_datetime.py", line 28, in <module>
    df_in["twice_converted_datetime"] = pd.to_datetime(df_in["convertedDateTime"])
  File "C:\Users\David\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 1112, in to_datetime
    values = convert_listlike(arg._values, format)
  File "C:\Users\David\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 488, in _convert_listlike_datetimes
    return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
  File "C:\Users\David\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 519, in _array_strptime_with_fallback
    result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
  File "strptime.pyx", line 534, in pandas._libs.tslibs.strptime.array_strptime
  File "strptime.pyx", line 355, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data "2021-09-17 22:07:15+00:00" doesn't match format "%Y-%m-%d %H:%M:%S.%f%z", at position 1. You might want to try: 
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
'''

Issue Description

``When a column that represents datetime as a string has an entry like this one:
'2021-09-17 22:07:15.000000+00:00'
where the fractional seconds are all zeros,
the pandas to_datetime function will convert it to:
2021-09-17 22:07:15+00:00

which drops the decimal point and fractional part of the number of seconds in the datetime representation.

If you write that converted dataframe to a csv file and read it back in again,
the to_datetime function will fail because

"2021-09-17 22:07:15+00:00" doesn't match format "%Y-%m-%d %H:%M:%S.%f%z"

Expected Behavior

if you remove the timezone coding at the end of the original strings (i.e. remove the "+00:00", then the above example code works correctly with the following output:

value DateTime
0 0 2021-09-17 22:07:14.820200
1 1 2021-09-17 22:07:15.000000
2 2 2021-09-17 22:07:15.179962
value DateTime convertedDateTime
0 0 2021-09-17 22:07:14.820200 2021-09-17 22:07:14.820200
1 1 2021-09-17 22:07:15.000000 2021-09-17 22:07:15.000000
2 2 2021-09-17 22:07:15.179962 2021-09-17 22:07:15.179962
Unnamed: 0 value DateTime convertedDateTime twice_converted_datetime
0 0 0 2021-09-17 22:07:14.820200 2021-09-17 22:07:14.820200 2021-09-17 22:07:14.820200
1 1 1 2021-09-17 22:07:15.000000 2021-09-17 22:07:15.000000 2021-09-17 22:07:15.000000
2 2 2 2021-09-17 22:07:15.179962 2021-09-17 22:07:15.179962 2021-09-17 22:07:15.179962

But the to_datetime function should work properly even when the original strings contain the "+00:00" suffix

Installed Versions

INSTALLED VERSIONS

commit : e86ed37
python : 3.10.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 167 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252

pandas : 2.1.1
numpy : 1.26.0
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.15.0
pandas_datareader : None
bs4 : None
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions