
provide rebuild option in appending to an HDFStore #824

@jreback

Description


Currently, if you try to append to an existing table in an HDFStore and the fields differ from the existing fields,
pandas raises an Exception("append items do not match existing").

The reason for this is the current structure of an HDFStore table: index (time column), column (a string), values (a float array).
The values are stored in the order of the fields property in _v_attrs; appending in a different order (or with different columns) would invalidate the read-back mechanism, so we don't allow it.
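To see why the field order matters, here is a minimal sketch (not the actual HDFStore internals) of the storage layout described above: values live in a flat float block whose columns are labeled positionally from the recorded field order, so a block written in a different order would be silently mislabeled on read-back.

```python
import numpy as np
import pandas as pd

# Field order as recorded in the table's attributes (stands in for _v_attrs).
fields = ["A", "B"]
block = np.array([[1.0, 2.0]])            # values stored in that order

# Read-back simply zips positions with the recorded field names.
df = pd.DataFrame(block, columns=fields)
assert df.loc[0, "A"] == 1.0

# If a later append wrote its values in B, A order without updating the
# recorded fields, read-back would mislabel the columns -- hence the
# exception instead of a silent corruption.
swapped = np.array([[2.0, 1.0]])          # a row written in B, A order
wrong = pd.DataFrame(swapped, columns=fields)
assert wrong.loc[0, "A"] == 2.0           # B's value ends up under label A
```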

Rebuilding is simple: read in the existing data, concatenate it with the new data (which automatically reindexes), rename the existing file, and write the data back. (You could also write to a new file, then rename the existing file, and rename the new file to the original file name; that's more atomic.)
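The "more atomic" variant above can be sketched with the standard library alone; `atomic_rewrite` is a hypothetical helper (not part of pandas), and `write_fn` stands in for whatever writes the rebuilt store:

```python
import os
import tempfile

def atomic_rewrite(path, write_fn):
    """Write a replacement file next to `path`, then atomically swap it in.

    Sketch of the write-new-then-rename scheme: the temp file is created
    in the same directory so the final rename stays on one filesystem.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            write_fn(fh)                 # write the rebuilt contents
        os.replace(tmp, path)            # atomic on POSIX; clobbers the original
    except Exception:
        os.remove(tmp)                   # clean up the partial file on failure
        raise
```

Readers either see the old file or the complete new one, never a half-written rebuild.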

Here is code that does this externally to HDFStore:

self.store is the HDFStore object (the open/close methods are pretty simple, so I didn't show them)

    def append(self, data, is_verbose=False, is_rebuild=True):
        """ store a data object to a file """
        if data is None:
            return None
        if not self.open(is_writer=True):
            return None
        try:
            self.store.append(self.label, data)
        except Exception as detail:
            # a schema mismatch is recoverable via a rebuild; anything else propagates
            if is_rebuild and str(detail).startswith("appended items do not match existing items"):
                self.rebuild(data)
            else:
                raise
        finally:
            self.close()
            if self.is_verbose or is_verbose:
                self.show_pretty('append', data=data)

    def rebuild(self, data):
        """ append to existing data with a rebuild """
        self.lo("rebuilding datafile -> [file->%s]" % self.file)

        try:
            # read in the existing data and concatenate the new data onto it
            self.close()
            new_data = apandas.Panel.append_many([self.select(), data])

            # keep the original file around as a backup
            import os
            os.rename(self.file, self.file + '.hdf_append_bak')

            # write the combined data to a fresh file
            self.open(is_writer=True)
            self.store.append(self.label, new_data)

            self.lo("successful in append adjustment -> [file->%s]" % self.file)

        except Exception as detail:
            self.lo("error in rebuild -> [file->%s] %s -> bailing out!" % (self.file, detail))
            raise

As you can see, right now I have to catch and match a specific error message; ideally this would be done internally in HDFStore._write_table.
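The internal behavior being proposed could look roughly like the sketch below. `append_with_rebuild` and `combine` are hypothetical names, and `store` is anything HDFStore-like with append/select/remove methods; the point is only that the mismatch is detected and the node rebuilt in one place, rather than by string-matching in every caller:

```python
def append_with_rebuild(store, key, data, combine):
    """Append `data` under `key`; on a schema mismatch, rebuild the node.

    `combine` is a callback that concatenates and reindexes the old and
    new data (e.g. pandas.concat for DataFrames).
    """
    try:
        store.append(key, data)
    except ValueError as detail:
        # only a schema mismatch triggers a rebuild; anything else propagates
        if "do not match existing" not in str(detail):
            raise
        rebuilt = combine(store.select(key), data)
        store.remove(key)
        store.append(key, rebuilt)
```

With this inside the library, callers would just pass a rebuild flag instead of parsing exception text.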

One thing I don't handle is recompressing the rebuilt file with the filters that were present on the original.
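For what it's worth, PyTables ships a ptrepack command-line tool that can rewrite an HDF5 file with a chosen set of filters; a command sketch (the file names and filter settings here are placeholders, not values from the code above):

```shell
# Recompress the rebuilt file; pick complevel/complib to match the
# filters the original store was created with.
ptrepack --complevel=9 --complib=blosc store.h5 store_recompressed.h5
mv store_recompressed.h5 store.h5
```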

Jeff
