Abstract reader/writer design #80

jreadey · 2025-03-25T17:54:01Z

hdf5db now supports interfaces for reader and/or writer objects.
Currently JSON and HDF5 (using h5py) are supported.

mattjala

CI test failure on Windows (Python 3.11/3.12):

FAIL: testSimple (__main__.H5pyWriterTest.testSimple)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\hdf5-json\hdf5-json\test\unit\h5py_writer_test.py", line 155, in testSimple
    self.assertEqual(len(g1.attrs), 1)
AssertionError: 2 != 1

mattjala · 2025-10-03T17:57:37Z

src/h5json/h5reader.py

+# Copyright by The HDF Group.                                                #
+# All rights reserved.                                                       #
+#                                                                            #
+# This file is part of H5Serv (HDF5 REST Server) Service, Libraries and      #


H5Serv -> HSDS (or something else, if we want the reader/writer to be independent)

mattjala · 2025-10-03T18:02:35Z

src/h5json/h5reader.py

+        pass
+
+
+class H5NullReader(H5Reader):


What's the motivation for making this the default reader, rather than something like a filesystem reader? If no reader is specified and every read is a no-op, I would expect the user to want an error, rather than having every operation appear to succeed while doing nothing (assuming I understand this correctly).
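A minimal sketch of the fail-loudly behavior I have in mind. All names here (`NoReaderConfiguredError`, `H5StrictNullReader`, `get_object`, `read_or_report`) are hypothetical and not taken from this PR; the real `H5Reader` interface may look different.

```python
class NoReaderConfiguredError(RuntimeError):
    """Hypothetical error raised when a read is attempted with no reader set."""


class H5StrictNullReader:
    """Hypothetical default reader that refuses to read instead of no-op'ing."""

    def get_object(self, obj_id):
        # Fail loudly so the caller learns that nothing was actually read.
        raise NoReaderConfiguredError(
            f"cannot read {obj_id!r}: no reader has been configured"
        )


def read_or_report(reader, obj_id):
    """Return the object, or a readable message if no reader is configured."""
    try:
        return reader.get_object(obj_id)
    except NoReaderConfiguredError as exc:
        return f"error: {exc}"


message = read_or_report(H5StrictNullReader(), "g-1234")
```

With this shape, a missing reader surfaces on the first read rather than as silently empty results later on.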

mattjala · 2025-10-03T18:03:23Z

src/h5json/h5writer.py

+# Copyright by The HDF Group.                                                #
+# All rights reserved.                                                       #
+#                                                                            #
+# This file is part of H5Serv (HDF5 REST Server) Service, Libraries and      #


H5Serv -> HSDS again - it looks like this occurs in a lot of files and can be fixed with a quick find and replace.

mattjala · 2025-10-03T18:05:33Z

src/h5json/hdf5db.py

+        return self._reader
+
+    @reader.setter
+    def reader(self, value: H5Reader):


I think it would be preferable to distinguish the two reader() functions as get_reader() and set_reader(), rather than leaving it up to the arguments. Same for writer().
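For reference, the two styles side by side. `Db` is a hypothetical stand-in for the class in the diff, and the string values only mark which accessor ran. Note that with `@property` the accessor is chosen by attribute read versus assignment, not by call arguments.

```python
class Db:
    """Hypothetical container, only to contrast the two API styles."""

    def __init__(self):
        self._reader = None

    # Style used in the diff: one attribute-like name with a getter and a setter.
    @property
    def reader(self):
        return self._reader

    @reader.setter
    def reader(self, value):
        self._reader = value

    # Style suggested here: the direction is part of the method name.
    def get_reader(self):
        return self._reader

    def set_reader(self, value):
        self._reader = value


db = Db()
db.reader = "json-reader"        # attribute assignment runs the property setter
via_property = db.reader         # attribute read runs the property getter
db.set_reader("h5py-reader")     # explicit setter call
via_method = db.get_reader()     # explicit getter call
```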

mattjala · 2025-10-03T18:08:23Z

src/h5json/hdf5db.py

+    @property
+    def root_id(self):
+        """ return root uuid """
+        return self._root_id


Is one hdf5db instance associated directly with a single top-level file? This seems a little unusual, but perhaps it makes sense since each file may need a unique reader/writer. Perhaps 'hdf5db' as a name for this object doesn't communicate its exact purpose. Would something like "hdf5IOmanager" make more sense?

mattjala · 2025-10-03T18:20:43Z

src/h5json/hdf5db.py

-            # storing db in the file itself, so we can link to the object directly
-            col[id] = obj.ref  # save attribute ref to object
+            type_json = ctype_json["type"].copy()
+            type_json["id"] = ctype_id


Why is this step of setting the id necessary? It seems like the id field in the stored type json should reflect the constructed ctype id, since that id is what we used to find this json in the first place.

mattjala · 2025-10-03T18:22:02Z

src/h5json/hdf5db.py

-            # assume attr_type is a uuid of a named datatype
-            is_committed_type = True
+        # TBD: if dtype is a committed ref type, fetch it first
+        # TBD: also, check special case for complex types


Do these TBDs still need to be addressed?

mattjala · 2025-10-03T18:23:00Z

src/h5json/hdf5db.py

+                raise KeyError(f"ctype: {ctype_id} not found")
+            ctype_json = self.getObjectById(ctype_id)
+            type_json = ctype_json["type"].copy()
+            type_json["id"] = ctype_id


Same question as above about the necessity of setting this field.

mattjala · 2025-10-03T18:27:20Z

src/h5json/hdf5db.py

-                raise IOError(errno.EINVAL, msg)
+            if "fillValue" in cpl:
+                fillValue = cpl["fillValue"]
+                # TBD: fix for compound types


Does this TBD still need to be addressed?

mattjala · 2025-10-03T18:37:39Z

src/h5json/hdf5db.py

+                    stop = start + sel_inter.count[dim]
+                    slices.append(slice(start, stop, 1))
+                slices = tuple(slices)
+                # TBD: needs updating to work in the general case!


Has this TBD been addressed?

mattjala · 2025-10-03T18:39:59Z

src/h5json/hdf5db.py

-                raise IOError(errno.EINVAL, msg)
+        return link_json
+
+    def _addLink(self, grp_id, name, link_json):


If _addLink is meant to be an internal function, should it be moved to utils?

mattjala · 2025-10-03T18:43:09Z

src/h5json/hdf5db.py

-
+        for obj_id in self.db:
+            # skip deleted objects
+            if self.db[obj_id] is not None:


This looks like it will still count objects that have been marked as deleted with "DELETED" and marked dirty, since e.g. deleteLink doesn't appear to remove the link from the database.
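A sketch of the stricter check, assuming the db is a dict mapping ids to either `None` or a JSON dict, and that a deleted entry carries a `"DELETED"` key the way link JSON does elsewhere in this PR. `is_live` and `count_live_objects` are hypothetical helper names.

```python
def is_live(obj_json):
    """True if the entry is neither removed (None) nor flagged as deleted."""
    return obj_json is not None and "DELETED" not in obj_json


def count_live_objects(db):
    """Count entries that are still present and not marked for deletion."""
    return sum(1 for obj_json in db.values() if is_live(obj_json))


# Assumed layout: one live group, one removed entry, one entry flagged as deleted.
db = {
    "g-1": {"links": {}},
    "d-1": None,
    "d-2": {"DELETED": 1700000000.0},
}
live = count_live_objects(db)
```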

mattjala · 2025-10-03T19:03:03Z

src/h5json/h5pystore/h5py_reader.py

+            try:
+                linkObj = parent.get(link_name, None, False, True)
+                linkClass = linkObj.__class__.__name__
+            except TypeError:


Is there no way to distinguish a user-defined (UD) link from other potential causes of TypeError here and in _getLink?
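One partial option is to guard only the lookup itself, so that a TypeError from later lines is not mistaken for a UD link. The sketch below uses a hypothetical `FakeParent` in place of an h5py group so that it runs standalone; the keyword names mirror `Group.get(name, default, getclass, getlink)`, and the `"UserDefinedLink"` label is made up for illustration.

```python
class FakeParent:
    """Hypothetical stand-in for an h5py group, used only to make this runnable."""

    def __init__(self, links):
        self._links = links

    def get(self, name, default=None, getclass=False, getlink=False):
        link = self._links.get(name, default)
        if link == "UD":
            # Simulates the lookup failing on a link type it cannot represent.
            raise TypeError("simulated unsupported link type")
        return link


def link_class_name(parent, link_name):
    """Return the link object's class name, or a label for unsupported links."""
    try:
        # Only the lookup is guarded; nothing else can raise inside this block.
        link_obj = parent.get(link_name, None, getclass=False, getlink=True)
    except TypeError:
        return "UserDefinedLink"  # hypothetical label
    if link_obj is None:
        return None
    return link_obj.__class__.__name__


parent = FakeParent({"ud": "UD", "soft": 3})
names = [link_class_name(parent, n) for n in ("ud", "soft", "missing")]
```

This does not prove the link is user-defined, but it narrows the possible sources of the exception to the lookup call.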

mattjala · 2025-10-03T19:12:49Z

src/h5json/h5pystore/h5py_writer.py

+        self._flush_time = 0.0
+        self._f = None  # h5py file handle
+
+    def _copy_element(self, val, src_dt, tgt_dt, fout=None):


I think the _copy_* routines here and in h5py_reader should be renamed to indicate that they're also performing a conversion, and the direction of that conversion. Something like _element_json_to_h5py_rep()/_element_h5py_to_json_rep() would prevent confusion between the two identically named functions that perform opposite operations.

The same goes for _copy_array.

mattjala · 2025-10-03T19:18:14Z

src/h5json/h5pystore/h5py_writer.py

+        for title in titles:
+            link_json = links_json[title]
+            link_class = link_json["class"]
+            if "DELETED" in link_json:


Given that this also handles deletion, would it be more accurate to call _createObjects() something like _syncObjects()?
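A rough sketch of why "sync" reads better to me: the loop both creates and removes. Here `target` is a plain dict standing in for the group being written, and `sync_links` is a hypothetical name; only the `"class"` and `"DELETED"` keys come from the diff.

```python
def sync_links(links_json, target):
    """Apply pending link changes to `target`: create new links, drop deleted ones."""
    for title in sorted(links_json):
        link_json = links_json[title]
        if "DELETED" in link_json:
            # Deletion path: remove the link if it exists in the target.
            target.pop(title, None)
        else:
            # Creation path: record the link under its title.
            target[title] = link_json["class"]
    return target


target = {"old": "H5L_TYPE_HARD"}
links_json = {
    "old": {"class": "H5L_TYPE_HARD", "DELETED": 1700000000.0},
    "new": {"class": "H5L_TYPE_SOFT"},
}
result = sync_links(links_json, target)
```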

jreadey added 30 commits February 4, 2025 22:48

added objid functions

be22083

fix flake8 errors

28dcfc6

merge hsds hdf5dtype changes

54c83d5

patch flake8 error

3a2b084

patch flake8 error

133e962

keep backward compatibility for enum members key

856ee65

first pass at abstrct db class

eec4efc

first pass at h5py reader

2f546b9

added h5json_writer

4b9cb68

create reader and writer packages

bad4012

basic dataset read/write methods added

c5c28a4

update h5tojson script

c0a6cc3

added h5json read

48d43e4

added h5py writer

06b5a6f

added filters.py

8fceb5f

updates for h5py_writer to write dataset values

af4d46a

revert to using members for dtype enums

7c393b6

add support for reference types

825fc89

support for h5py and json readers and writers

88fa1eb

fix for vlen encoding

541b966

fix for reference types

398e2d3

fix flake8 errors

9978c45

fix flake8 error

436d921

update testall script

51063f6

fix flake8 error

d14599a

make tmp dir in testall

e4be33c

fix for h5json writer on windows

8af6508

require python >= 3.9

d519d8b

remove redundant stripId function

4169d5c

add test for incremental updates

7840ca4

jreadey added 14 commits September 9, 2025 16:28

revert h5py_util.py

51f2a9b

use uuid as representation of Reference type

e7452ca

fix len ref in hsds_reader

5b6f33d

fix for reading unpersisted dataset values

8e6d14a

fix for created and lastModified keys

5561767

fix for scalar datasets

924ee00

move hsds plugins to h5pyd

1f90429

moved hsds reader/writer tests to h5pyd

c60e1c9

fix for getDatasetValues

29ae237

fix for datasets with fillvalue

b904ea5

added dset_util functions

65b94c1

added filter functions

eb138bc

added more dset utility functions

fec0a43

added shape_util.py

5ab9b65

mattjala requested changes Oct 3, 2025

mattjala reviewed Oct 3, 2025
