Skip to content

Conversation

jreadey
Copy link
Member

@jreadey jreadey commented Mar 25, 2025

hdf5db now supports interaces for reader and/or writer objects.
Currently json and hdf5 (using h5py) are supported.

Copy link

@mattjala mattjala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows 3.11/3.12 CI test failure:

FAIL: testSimple (__main__.H5pyWriterTest.testSimple)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\hdf5-json\hdf5-json\test\unit\h5py_writer_test.py", line 155, in testSimple
    self.assertEqual(len(g1.attrs), 1)
AssertionError: 2 != 1

# Copyright by The HDF Group. #
# All rights reserved. #
# #
# This file is part of H5Serv (HDF5 REST Server) Service, Libraries and #
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

H5Serv -> HSDS (or something else, if we want the reader/writer to be independent)

pass


class H5NullReader(H5Reader):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation to have this as a default reader instead of something like filesystem reading? If no reader is specified and all the reads are no-ops, I imagine the user would want to be notified about that as an error, not have all the operations seem to go well but not do anything - assuming I understand this correctly

# Copyright by The HDF Group. #
# All rights reserved. #
# #
# This file is part of H5Serv (HDF5 REST Server) Service, Libraries and #
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

H5Serv -> HSDS again - it looks like this occurs in a lot of files and can be fixed with a quick find and replace.

return self._reader

@reader.setter
def reader(self, value: H5Reader):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be preferable to distinguish the two reader() functions as get_reader() and set_reader(), rather than leaving it up to the arguments. Same for writer().

@property
def root_id(self):
""" return root uuid """
return self._root_id
Copy link

@mattjala mattjala Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is one hdf5db instance associated directly with a single top-level file? This seems a little unusual, but perhaps it makes sense since each file may need a unique reader/writer. Perhaps 'hdf5db' as a name for this object doesn't communicate it's exact purpose. Would something like "hdf5IOmanager" make more sense?

# storing db in the file itself, so we can link to the object directly
col[id] = obj.ref # save attribute ref to object
type_json = ctype_json["type"].copy()
type_json["id"] = ctype_id
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this step of setting the id necessary? It seems like the id field in the stored type json should reflect the constructed ctype id, since that id is what we used to find this json in the first place.

# assume attr_type is a uuid of a named datatype
is_committed_type = True
# TBD: if dtype is a committed ref type, fetch it first
# TBD: also, check special case for complex types
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these TBDs still need to be addressed?

raise KeyError(f"ctype: {ctype_id} not found")
ctype_json = self.getObjectById(ctype_id)
type_json = ctype_json["type"].copy()
type_json["id"] = ctype_id
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above about the necessity of setting this field.

raise IOError(errno.EINVAL, msg)
if "fillValue" in cpl:
fillValue = cpl["fillValue"]
# TBD: fix for compound types
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this TBD still need to be addressed?

stop = start + sel_inter.count[dim]
slices.append(slice(start, stop, 1))
slices = tuple(slices)
# TBD: needs updating to work in the general case!
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this TBD been addressed?

raise IOError(errno.EINVAL, msg)
return link_json

def _addLink(self, grp_id, name, link_json):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If _addLink is meant to be an internal function, should it be moved to utils?


for obj_id in self.db:
# skip deleted objects
if self.db[obj_id] is not None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it will still count objects that have been marked as deleted with "DELETED" and marked dirty, since e.g. deleteLink doesn't appear to remove the link from the database.

try:
linkObj = parent.get(link_name, None, False, True)
linkClass = linkObj.__class__.__name__
except TypeError:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no way to distinguish a UD link from other potential causes of TypeError here and in _getLink?

self._flush_time = 0.0
self._f = None # h5py file handle

def _copy_element(self, val, src_dt, tgt_dt, fout=None):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the _copy_* routines here and in h5py_reader should be renamed to indicate that they're also performing a conversion, and the direction of that conversion. Something like _element_json_to_h5py_rep()/_element_h5py_to_json_rep() would prevent confusion between the two identically named functions that perform opposite operations.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same goes for _copy_array.

for title in titles:
link_json = links_json[title]
link_class = link_json["class"]
if "DELETED" in link_json:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this also handles deletion, would it be more accurate to call _createObjects() something like _syncObjects()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants