Enable compression for pandas tables #7

@GWW

Hi,

I have noticed that compression is not applied to pandas DataFrames when they are stored with flammkuchen.

Below is a test example that stores a pandas DataFrame and a NumPy array. According to ddls, the NumPy array is compressed (zlib, level 9), while the DataFrame's arrays are not (their compression is reported as none):

import numpy as np
import pandas as pd
import flammkuchen as fl
df = pd.DataFrame({'a':['B'] * 100000, 'b':np.repeat(1, 100000), 'c':np.repeat(1, 100000)})

fl.save('test.h5', {'df':df, 'npa':np.repeat(1, 100000)})

Running "ddls -c --raw test.h5" on the resulting file prints:

/df                       dict
/df/axis0                 array (3,) [bytes8] none
/df/axis0_variety         'regular' (7) [unicode]
/df/axis1                 array (100000,) [int64] none
/df/axis1_variety         'regular' (7) [unicode]
/df/block0_items          array (2,) [bytes8] none
/df/block0_items_variety  'regular' (7) [unicode]
/df/block0_values         array (100000, 2) [int64] none
/df/block1_items          array (1,) [bytes8] none
/df/block1_items_variety  'regular' (7) [unicode]
/df/block1_values         pickled [object]
/df/encoding              'UTF-8' (5) [unicode]
/df/errors                'strict' (6) [unicode]
/df/nblocks               2 [int64]
/df/ndim                  2 [int64]
/df/pandas_type           'frame' (5) [unicode]
/df/pandas_version        '0.15.2' (6) [unicode]
/npa                      array (100000,) [int64] zlib lvl9

I believe the cause is in this class:

    class _HDFStoreWithHandle(pd.io.pytables.HDFStore):
        def __init__(self, handle):
            self._path = None
            self._complevel = None
            self._complib = None
            self._fletcher32 = False
            self._filters = None

            self._handle = handle

I think pandas does not respect the handle's compression settings, and leaving complevel and complib set to None disables compression, as per the pandas documentation. I am not sure of the best way to extract the compression settings from the handle and apply them to this class.
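One possible direction, as an untested sketch rather than a proposed patch: a PyTables file object exposes a filters attribute (a tables.Filters instance with complevel, complib and fletcher32), so the values could be read from there. This assumes the file was opened with the desired filters as its defaults, which I have not verified for flammkuchen; if the filters are instead passed per array, the Filters object would have to be handed to the class explicitly. The helper name compression_kwargs is hypothetical, and a stand-in object is used in place of a real handle so the snippet runs without PyTables installed.

```python
from types import SimpleNamespace


def compression_kwargs(handle):
    """Read compression settings from a PyTables-style file handle.

    Hypothetical helper: assumes `handle.filters` carries `complevel`,
    `complib` and `fletcher32`, as a tables.Filters object does.
    Returns the "compression disabled" defaults when no filters are set.
    """
    filters = getattr(handle, "filters", None)
    if filters is None or not filters.complevel:
        return {"complevel": None, "complib": None, "fletcher32": False}
    return {
        "complevel": filters.complevel,
        "complib": filters.complib,
        "fletcher32": bool(filters.fletcher32),
    }


# Stand-in for a real handle opened with zlib level 9 as its default filters.
fake_handle = SimpleNamespace(
    filters=SimpleNamespace(complevel=9, complib="zlib", fletcher32=False)
)
settings = compression_kwargs(fake_handle)
```

In _HDFStoreWithHandle.__init__, these values could then replace the hard-coded None/False assignments to _complevel, _complib and _fletcher32, with _filters set to handle.filters. Whether pandas actually uses those private attributes when writing through an existing handle would need to be checked against the installed pandas version.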

Thanks in advance
