Merged
6 changes: 5 additions & 1 deletion pandas/io/sql.py
@@ -109,7 +109,11 @@ def _handle_date_column(col, utc=None, format=None):
               issubclass(col.dtype.type, np.integer)):
             # parse dates as timestamp
             format = 's' if format is None else format
-            return to_datetime(col, errors='coerce', unit=format, utc=utc)
+            if '%' in format:
Member

Add a comment as to why you are doing this logic branching (and reference issue number).

Member

On second thought, I think we can do this a bit more cleanly, like this:

if format is None and (issubclass(col.dtype.type, np.floating) or
        issubclass(col.dtype.type, np.integer)):
    format = 's'

if format in ['D', 'd', 'h', 'm', 's', 'ms', 'us', 'ns']:
    return to_datetime(col, errors='coerce', unit=format, utc=utc)
elif is_datetime64tz_dtype(col):
    ...
else:
    return to_datetime(col, errors='coerce', format=format, utc=utc)

So first handle the specific case of numeric values with no format: parse them as seconds. Then check the format arg against all valid values for unit. Once that check has run, we no longer need to test whether '%' is in format, because a string containing '%' can never be a valid unit and so always falls through to the format branch.
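To make the suggested branching concrete, here is a minimal, self-contained sketch of it. The function name `handle_date_column_sketch` and the `UNIT_SPECIFIERS` constant are hypothetical, the `utc=False` default is an assumption made for this sketch, and the `is_datetime64tz_dtype` branch is left out because its body is elided in the snippet above. It is an illustration of the idea, not the code in `pandas/io/sql.py`.

```python
import numpy as np
import pandas as pd

# Unit strings taken from the snippet above (hypothetical constant name).
UNIT_SPECIFIERS = ['D', 'd', 'h', 'm', 's', 'ms', 'us', 'ns']


def handle_date_column_sketch(col, utc=False, format=None):
    """Illustrative only: dispatch between `unit=` and `format=`."""
    # Numeric column with no format given: treat values as epoch seconds.
    if format is None and (issubclass(col.dtype.type, np.floating) or
                           issubclass(col.dtype.type, np.integer)):
        format = 's'

    # A recognized unit specifier is passed as `unit`.
    if format in UNIT_SPECIFIERS:
        return pd.to_datetime(col, errors='coerce', unit=format, utc=utc)

    # Anything else (e.g. '%Y%m%d', or None for non-numeric data)
    # is passed as a strftime-style `format`.
    return pd.to_datetime(col, errors='coerce', format=format, utc=utc)


epoch_seconds = handle_date_column_sketch(pd.Series([0, 86400]))
yyyymmdd = handle_date_column_sketch(pd.Series([20171121]), format='%Y%m%d')
```

With this ordering, no explicit `'%' in format` test is needed: a strftime pattern is simply not a member of the unit list.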

Contributor Author

@drorata drorata Nov 21, 2017

@jorisvandenbossche But what about the case where the column consists of integers of the format YYYYMMDD or something similar? This is not a valid unit and has to be formatted using % (e.g. %Y%m%d).

If the format string contains % it means that the user knows something about the data and this knowledge has to be used.
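A small illustration of the concern, using made-up sample values: the same integer column gives very different results depending on whether it is read as epoch seconds or parsed with a `%Y%m%d` pattern. This only calls the public `pd.to_datetime` function; it is not the code path in `pandas/io/sql.py`.

```python
import pandas as pd

# Hypothetical integer column holding dates encoded as YYYYMMDD.
col = pd.Series([20171121, 20171122])

# Interpreted as seconds since the epoch, the values land in 1970.
as_seconds = pd.to_datetime(col, errors='coerce', unit='s')

# Interpreted with an explicit strftime-style format, they become
# the intended calendar dates.
as_dates = pd.to_datetime(col, errors='coerce', format='%Y%m%d')
```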

Member

If you specify format="%Y%m%d", the column will be parsed with that format in the snippet above: only the specific recognized unit specifiers are passed to unit; otherwise the value is passed as format.

+                return to_datetime(
+                    col, errors='coerce', format=format, utc=utc)
+            else:
+                return to_datetime(col, errors='coerce', unit=format, utc=utc)
         elif is_datetime64tz_dtype(col):
             # coerce to UTC timezone
             # GH11216