Description
UPDATE:
In [157]: problem_date = old.dob.loc[4231354]
In [158]: problem_date
Out[158]: datetime.date(2939, 6, 2)
In [159]: test_series = pd.Series([problem_date])
In [160]: pd.to_datetime(test_series)
Out[160]:
0 2939-06-02
dtype: object
This seems to be the source of the problem. I think there may be other dates in my dataset that also break the to_datetime method.
UPDATE 2:
Maybe the problem is that the date is later than the year 2900?
In [194]: pd.to_datetime(old.dob[8230866])
Out[194]: datetime.date(2955, 8, 22)
In [195]: another_bad_date = old.dob.loc[8230866]
In [196]: pd.to_datetime(pd.Series([another_bad_date]))
Out[196]:
0 2955-08-22
dtype: object
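For anyone hitting the same thing: the cutoff is probably not the year 2900 specifically. A nanosecond-resolution datetime64 value can only represent dates from late 1677 to early 2262, so both dates above are far outside that range. Here is a minimal sketch for finding such values in a column. The helper name `outside_ns_range`, the two constants, and the 1980 sample date are made up for illustration; only the 2939 and 2955 dates come from the sessions above.

```python
import datetime

# Calendar-day limits of a nanosecond-resolution datetime64 value.
# These are hard-coded here as an assumption-free illustration of the range;
# pd.Timestamp.min and pd.Timestamp.max report the exact bounds.
NS_MIN_DAY = datetime.date(1677, 9, 22)
NS_MAX_DAY = datetime.date(2262, 4, 11)


def outside_ns_range(value):
    """Return True if value is a date outside the nanosecond datetime64 range."""
    if value is None:
        return False
    # datetime.datetime cannot be compared with datetime.date directly,
    # so reduce it to its calendar day first.
    if isinstance(value, datetime.datetime):
        value = value.date()
    return not (NS_MIN_DAY <= value <= NS_MAX_DAY)


# Illustrative data: one ordinary date, the two problem dates, and a None.
sample = [
    datetime.date(1980, 1, 15),
    datetime.date(2939, 6, 2),
    datetime.date(2955, 8, 22),
    None,
]
suspects = [v for v in sample if outside_ns_range(v)]
print(suspects)
```

On the real column, something like old.dob[old.dob.map(outside_ns_range)] should list every offending row, assuming the column only holds date-like objects and None.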
original issue:
The column in question came from a read_sql query, and the column has datetimes. It consists solely of pandas datetime objects and NoneType objects. I have iterated over the Series to be sure. The column has 11 million rows.
I've tried casting with to_datetime, to no avail: the dtype remains object. Shouldn't the dtype change after that call?
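To illustrate the dtype question: when every value is convertible, to_datetime does return a datetime64 column. I believe older pandas releases defaulted to errors='ignore' and handed the input back unchanged when any value failed, which would match the object dtype seen here. The sketch below assumes a recent pandas and uses made-up sample data (only the 2939 date is from this issue). What happens to the out-of-range value under errors="coerce" may depend on the pandas version (NaT, or a coarser-resolution timestamp), so no output is shown for it.

```python
import datetime

import pandas as pd

# All values in range: the result should have a datetime64 dtype, with None -> NaT.
in_range = pd.Series([datetime.date(1980, 1, 15), None], dtype=object)
converted = pd.to_datetime(in_range)
print(converted.dtype)

# One value far in the future: errors="coerce" asks pandas not to raise and
# to replace anything it cannot represent with NaT.
with_outlier = pd.Series(
    [datetime.date(1980, 1, 15), datetime.date(2939, 6, 2), None],
    dtype=object,
)
coerced = pd.to_datetime(with_outlier, errors="coerce")
print(coerced.dtype)
print(coerced)
```

If losing the far-future dates to NaT is not acceptable, cleaning them up at the source (they look like data-entry errors) before conversion is probably the safer route.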
Here's some output from poking around after inserting import pdb; pdb.set_trace() at line 3329 of pytables.py (right after except (NotImplementedError, ValueError, TypeError) as e:):
(Pdb) b
(Pdb) i
3
(Pdb) blocks[3]
ObjectBlock: [1, 2, 3, 4, 9, 12, 13, 14], 8 x 8255524, dtype: object
(Pdb) blk_items[3]
Index([u'dob', u'City', u'Region', u'Zip', u'lang', u'UnsubscribedDate', u'BadAddressDate', u'ISP'], dtype='object')
(Pdb) existing_col
(Pdb) col
name->values_block_3,cname->values_block_3,dtype->None,shape->None
(Pdb) b
(Pdb) type(b)
<class 'pandas.core.internals.ObjectBlock'>
(Pdb) block_items
*** NameError: name 'block_items' is not defined
(Pdb) b_items
Index([u'dob', u'City', u'Region', u'Zip', u'lang', u'UnsubscribedDate', u'BadAddressDate', u'ISP'], dtype='object')
(Pdb) existing_col
(Pdb) e
TypeError('Cannot serialize the column [dob] because\nits data contents are [mixed] object dtype',)
(Pdb) type(col)
<class 'pandas.io.pytables.DataCol'>
(Pdb) lib
<module 'pandas.lib' from '/home/mmccrea/anaconda/lib/python2.7/site-packages/pandas/lib.so'>
My debugging kind of hits a wall here, because infer_dtype seems to be throwing the error, and it lives in lib.so, a compiled binary that I'm not sure how to inspect. I would love a suggestion about how to deal with that in the future, in addition to some answers about what's going on in this case.
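One way to investigate a "mixed" object column without stepping into compiled code is to count the Python types it actually contains. This is only a sketch: `type_counts` is a hypothetical helper, not part of pandas, and the sample values are illustrative.

```python
import datetime
from collections import Counter


def type_counts(values):
    """Count the Python type of every element in an iterable."""
    return Counter(type(v).__name__ for v in values)


# Illustrative column: a datetime, a plain date, and a missing value.
sample = [
    datetime.datetime(1980, 1, 15),
    datetime.date(2939, 6, 2),
    None,
]
counts = type_counts(sample)
print(counts)
```

Iterating over an object-dtype Series yields the underlying Python objects, so type_counts(old.dob) should work on the real column too. More than one non-None type in the result is a strong hint about why the column is reported as mixed. In recent pandas, the same inference routine is also exposed publicly as pd.api.types.infer_dtype, which can be called directly on a single column.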