
Loading of large pickled dataframes fails #2705

@daggre-gmu

Description


I tried pickling a very large DataFrame (20 GB or so). Writing it to disk succeeded, but when I try to read it back it fails with: ValueError: buffer size does not match array size

I did a bit of research and found the following:

http://stackoverflow.com/questions/12060932/unable-to-load-a-previously-dumped-pickle-file-of-large-size-in-python

http://bugs.python.org/issue13555

I suspect this is a numpy/Python issue rather than a pandas one, but it is a real pain when I want to back up a DataFrame that took a long time to join together and need all the dtypes preserved (namely which columns are datetimes). Perhaps a solution would be a CSV file that keeps the dtypes stored somewhere (otherwise I'll have to figure out which columns are serialized dates). Any workarounds would be appreciated.
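For reference, here is a rough sketch of the CSV-plus-dtypes workaround I have in mind, assuming the frame round-trips through CSV cleanly: write the dtypes to a small JSON sidecar file next to the CSV, then restore datetime columns via parse_dates on read. The helper names and file paths are only illustrative, not anything pandas provides.

```python
import json
import pandas as pd

def save_with_dtypes(df, csv_path, dtypes_path):
    """Write the frame to CSV and record column dtypes in a JSON sidecar."""
    df.to_csv(csv_path, index=False)
    with open(dtypes_path, "w") as fh:
        json.dump({col: str(dtype) for col, dtype in df.dtypes.items()}, fh)

def load_with_dtypes(csv_path, dtypes_path):
    """Read the CSV back, restoring datetime columns with parse_dates."""
    with open(dtypes_path) as fh:
        dtypes = json.load(fh)
    date_cols = [c for c, d in dtypes.items() if d.startswith("datetime")]
    # Non-datetime dtypes go through read_csv's dtype mapping; columns with
    # NaNs may still come back as float/object, so this is only approximate.
    other = {c: d for c, d in dtypes.items() if not d.startswith("datetime")}
    return pd.read_csv(csv_path, dtype=other, parse_dates=date_cols)

# Hypothetical usage:
# save_with_dtypes(df, "big_frame.csv", "big_frame.dtypes.json")
# df2 = load_with_dtypes("big_frame.csv", "big_frame.dtypes.json")
```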

Labels: Bug, IO Data
