forked from uchicago-cs/deepdish
Enable compression for pandas tables #7
Open
Description
Hi,
I have noticed that compression is not enabled for pandas DataFrames when they are stored with flammkuchen.
Below is a test example in which I store a pandas DataFrame and a numpy array. According to ddls, the numpy array ends up compressed (zlib, level 9), while the DataFrame's arrays are not compressed:
import numpy as np
import pandas as pd
import flammkuchen as fl
df = pd.DataFrame({'a':['B'] * 100000, 'b':np.repeat(1, 100000), 'c':np.repeat(1, 100000)})
fl.save('test.h5', {'df':df, 'npa':np.repeat(1, 100000)})
ddls -c --raw test.h5
/df dict
/df/axis0 array (3,) [bytes8] none
/df/axis0_variety 'regular' (7) [unicode]
/df/axis1 array (100000,) [int64] none
/df/axis1_variety 'regular' (7) [unicode]
/df/block0_items array (2,) [bytes8] none
/df/block0_items_variety 'regular' (7) [unicode]
/df/block0_values array (100000, 2) [int64] none
/df/block1_items array (1,) [bytes8] none
/df/block1_items_variety 'regular' (7) [unicode]
/df/block1_values pickled [object]
/df/encoding 'UTF-8' (5) [unicode]
/df/errors 'strict' (6) [unicode]
/df/nblocks 2 [int64]
/df/ndim 2 [int64]
/df/pandas_type 'frame' (5) [unicode]
/df/pandas_version '0.15.2' (6) [unicode]
/npa array (100000,) [int64] zlib lvl9
I believe this is an issue with
class _HDFStoreWithHandle(pd.io.pytables.HDFStore):
    def __init__(self, handle):
        self._path = None
        self._complevel = None
        self._complib = None
        self._fletcher32 = False
        self._filters = None
        self._handle = handle
I think pandas does not respect the handle's compression settings, and having complevel and complib set to None disables compression, as per the pandas documentation. I am not sure of the best way to extract the compression settings from the handle and apply them to this class.
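One possible direction, as a rough sketch only: a PyTables `tables.Filters` object exposes `complevel`, `complib` and `fletcher32` attributes, so a small helper could translate such an object into the private fields set in `__init__` above. The helper name `hdfstore_compression_fields` is hypothetical, and the `SimpleNamespace` object is a stand-in for a real `tables.Filters` so that the snippet runs without PyTables installed. Whether pandas actually honors these private fields when writing is an assumption that I have not verified and that may vary between pandas versions.

```python
from types import SimpleNamespace


def hdfstore_compression_fields(filters):
    """Translate a PyTables-style filters object into HDFStore-style fields.

    `filters` is expected to expose `complevel`, `complib` and `fletcher32`
    attributes, as `tables.Filters` does. The returned dict is keyed by the
    private attribute names used in `_HDFStoreWithHandle.__init__`.
    """
    complevel = getattr(filters, "complevel", 0) or 0
    if filters is None or complevel == 0:
        # No compression requested: keep the defaults shown in the issue.
        return {
            "_complevel": None,
            "_complib": None,
            "_fletcher32": False,
            "_filters": None,
        }
    return {
        "_complevel": complevel,
        "_complib": getattr(filters, "complib", None),
        "_fletcher32": bool(getattr(filters, "fletcher32", False)),
        "_filters": filters,
    }


# Stand-in for tables.Filters(complevel=9, complib="zlib"); hypothetical values.
example = SimpleNamespace(complevel=9, complib="zlib", fletcher32=False)
fields = hdfstore_compression_fields(example)
```

If this approach is viable, `__init__` could take an extra `filters` argument and apply the result with `setattr(self, name, value)` for each item in the dict, instead of hard-coding `None`.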
Thanks in advance