ERR: HDF5 serialization of datelike-object dtypes should raise #8887

@cowpig

UPDATE:

In [157]: problem_date = old.dob.loc[4231354]

In [158]: problem_date
Out[158]: datetime.date(2939, 6, 2)

In [159]: test_series = pd.Series([problem_date])

In [160]: pd.to_datetime(test_series)
Out[160]: 
0    2939-06-02
dtype: object

This seems to be the source of the problem. I suspect other dates in my dataset are also breaking the to_datetime method.
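One way to check whether other rows share the problem is to scan the column for dates outside the range that datetime64[ns] can represent. The following is only a sketch: the helper name `out_of_bounds_dates`, the constants `NS_MIN` and `NS_MAX`, and the sample data are made up for illustration, and the bounds are the first and last whole days that fit in a nanosecond-resolution timestamp.

```python
import datetime

import pandas as pd

# Assumed bounds: first and last whole days representable by datetime64[ns].
NS_MIN = datetime.date(1677, 9, 22)
NS_MAX = datetime.date(2262, 4, 11)


def out_of_bounds_dates(series):
    """Return the entries of `series` whose date falls outside [NS_MIN, NS_MAX]."""

    def is_bad(value):
        # None / NaT are missing values, not bad dates.
        if pd.isna(value):
            return False
        # Anything that is not a date (e.g. a string) is ignored here.
        if not isinstance(value, datetime.date):
            return False
        # datetime.datetime is a subclass of datetime.date; compare by day.
        day = value.date() if isinstance(value, datetime.datetime) else value
        return not (NS_MIN <= day <= NS_MAX)

    return series[series.map(is_bad)]


# Hypothetical stand-in for the real `old.dob` column.
dob = pd.Series(
    [datetime.date(1980, 1, 15), None, datetime.date(2939, 6, 2)],
    index=[10, 11, 4231354],
    dtype=object,
)

bad = out_of_bounds_dates(dob)
print(bad)
```

On the real data this would be called as `out_of_bounds_dates(old.dob)`, and the returned index shows which rows need cleaning before the column can be converted.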

UPDATE 2:

Maybe the problem is that the date is later than 2900?

In [194]: pd.to_datetime(old.dob[8230866])                
Out[194]: datetime.date(2955, 8, 22)

In [195]: another_bad_date = old.dob.loc[8230866]         

In [196]: pd.to_datetime(pd.Series([another_bad_date]))
Out[196]: 
0    2955-08-22
dtype: object
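The cutoff is probably earlier than 2900. A datetime64[ns] value is a signed 64-bit count of nanoseconds since 1970-01-01, so the largest representable timestamp can be worked out with plain arithmetic. The sketch below uses only the standard library; the names `EPOCH`, `MAX_NS`, and `fits_in_ns` are illustrative, and the bounds are truncated to microseconds because Python datetimes do not carry nanoseconds.

```python
import datetime

# datetime64[ns] stores a signed 64-bit count of nanoseconds since the epoch.
EPOCH = datetime.datetime(1970, 1, 1)
MAX_NS = 2**63 - 1  # largest int64 value

# Python datetimes resolve microseconds only, so drop the last three digits.
# Truncating keeps both bounds slightly inside the true range.
upper = EPOCH + datetime.timedelta(microseconds=MAX_NS // 1000)
lower = EPOCH - datetime.timedelta(microseconds=MAX_NS // 1000)


def fits_in_ns(day):
    """True if midnight on the given datetime.date fits in datetime64[ns]."""
    midnight = datetime.datetime.combine(day, datetime.time())
    return lower <= midnight <= upper


print(upper)                                   # upper bound, in April 2262
print(fits_in_ns(datetime.date(2939, 6, 2)))   # first problem date
print(fits_in_ns(datetime.date(2955, 8, 22)))  # second problem date
```

Both problem dates fall well past the upper bound, which would explain why the values stay as Python objects instead of being converted.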

ORIGINAL ISSUE:

The column in question came from a read_sql query and holds datetimes. It consists solely of pandas datetime objects and NoneType objects; I have iterated over the Series to be sure. The column has 11 million rows.

I've tried casting with to_datetime (and the dtype remains object--shouldn't the dtype change after that call?), to no avail.

Here's what I get from poking around after inserting import pdb; pdb.set_trace() at line 3329 of pytables.py (just after except (NotImplementedError, ValueError, TypeError) as e:):

(Pdb) b
(Pdb) i
3
(Pdb) blocks[3]
ObjectBlock: [1, 2, 3, 4, 9, 12, 13, 14], 8 x 8255524, dtype: object
(Pdb) blk_items[3]
Index([u'dob', u'City', u'Region', u'Zip', u'lang', u'UnsubscribedDate', u'BadAddressDate', u'ISP'], dtype='object')
(Pdb) existing_col
(Pdb) col
name->values_block_3,cname->values_block_3,dtype->None,shape->None
(Pdb) b
(Pdb) type(b)
<class 'pandas.core.internals.ObjectBlock'>
(Pdb) block_items
*** NameError: name 'block_items' is not defined
(Pdb) b_items
Index([u'dob', u'City', u'Region', u'Zip', u'lang', u'UnsubscribedDate', u'BadAddressDate', u'ISP'], dtype='object')
(Pdb) existing_col
(Pdb) e
TypeError('Cannot serialize the column [dob] because\nits data contents are [mixed] object dtype',)
(Pdb) type(col)
<class 'pandas.io.pytables.DataCol'>
(Pdb) lib
<module 'pandas.lib' from '/home/mmccrea/anaconda/lib/python2.7/site-packages/pandas/lib.so'>

My debugging kind of hits a wall here, because infer_dtype seems to be throwing the error. It lives in lib.so, which is a compiled binary, and I'm not sure how to look into that to figure out what's going on. I would love a suggestion about how to deal with that in the future, in addition to some answers about what's going on in this case.
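Even without stepping into the compiled module, the inference can be probed from Python. Current pandas releases expose it as `pandas.api.types.infer_dtype`; that namespace is an assumption about the reader's version, since the pandas used in this report predates it. The sketch below calls it once per object column to see which label each one gets. The frame and its values are invented stand-ins for the real data.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for a few of the object columns in the real frame.
frame = pd.DataFrame(
    {
        "dob": pd.Series(
            [datetime.date(1980, 1, 15), None, datetime.date(2939, 6, 2)],
            dtype=object,
        ),
        "City": pd.Series(["Springfield", None, "Shelbyville"], dtype=object),
    }
)

# Ask pandas what it infers for each column, ignoring missing values.
kinds = {
    name: pd.api.types.infer_dtype(frame[name], skipna=True)
    for name in frame.columns
}

for name, kind in kinds.items():
    print(name, "->", kind)
```

Running the same loop over the columns listed in `blk_items[3]` would show which of them pandas sees as dates, strings, or mixed content, without a debugger.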

    Labels

    Closing Candidate: May be closeable, needs more eyeballs
    Datetime: Datetime data dtype
    Enhancement
    Error Reporting: Incorrect or improved errors from pandas
    IO HDF5: read_hdf, HDFStore
    Non-Nano: datetime64/timedelta64 with non-nanosecond resolution
