Description
Although the V3 Group does not inherit from Mapping or MutableMapping, it has methods that are mapping-like, e.g. __contains__, __getitem__, __iter__, __delitem__ etc. I think this makes sense in terms of continuity with zarr v2, where Group explicitly implemented MutableMapping, but I'm not sure we ultimately want to do this.
Here I will argue against making Group dict-like.
First, an argument from fairness: conceptually, a Group can be expressed as the intersection of "a thing that contains sub-groups and sub-arrays" and "a thing that contains JSON-like attributes". Both "contains sub-groups and sub-arrays" and "contains JSON" can be implemented with the MutableMapping API. So a Group is two mutable mappings, not one. But if Group is itself a MutableMapping, then its mapping methods have to pick one of the two possible sub-mappings to target. I don't feel confident in stating that the members of a group are more or less "important" than the attributes of the group, so I don't feel confident in picking one of these two mappings as the target for MutableMapping methods on Group. Thus I think that using MutableMapping for the members of a Group, while relegating the same API for attributes to a separate .attrs property, is unfair (it's an arbitrary and unnecessary decision). It was probably a mistake in V2, and also a mistake in h5py (where I think this API originated). There's a simple solution here: treat Group.members and Group.attributes each as their own MutableMapping, and don't pretend that Group is a dict, because it's not. If anything, it's two dicts.
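To make the "two dicts" idea concrete, here is a minimal sketch. The `Group` class below is a hypothetical stand-in, not the zarr-python class, and it uses plain in-memory dicts where a real implementation would talk to a store. The attribute names `members` and `attrs` follow the names used in this issue.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Group:
    """Hypothetical stand-in for a group; not the zarr-python class."""

    # Child arrays and groups, keyed by name (plain dict for illustration).
    members: dict[str, Any] = field(default_factory=dict)
    # JSON-like attributes, keyed by name (plain dict for illustration).
    attrs: dict[str, Any] = field(default_factory=dict)


group = Group()
group.members["subarray"] = "placeholder for an array"
group.attrs["units"] = "metres"

# Each mapping is addressed explicitly, so neither one is privileged.
assert "subarray" in group.members
assert "units" in group.attrs
assert "subarray" not in group.attrs
```

Because this sketch defines no `__getitem__` on `Group` itself, `group["subarray"]` raises `TypeError`, which forces callers to say which of the two mappings they mean.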
Second, an argument from ergonomics. We want to support something like x = Group["subarray"], or x = Group.members["subarray"]. This expands to x = (access the metadata for "subarray", then return an in-memory Array instance from it). Creating that in-memory Array instance will almost certainly take runtime parameters, e.g. write_empty_chunks, caching control, etc. But __getitem__ is hard to parametrize. If we rely on Group["subarray"], we are forced to choose from a few unattractive options:
- we can define those Array parameters in a global configuration object, or as properties of the Group instance
- we can ask users to modify the Array instance after assigning it.
- we can remove those runtime parameters from the Array constructor and push them down to the array methods (which honestly I kind of like)
But if we leave behind the dict API, then we just have regular functions that can be parametrized as needed, which gives us a lot of flexibility. So I would be curious to know what we get from the dict API that we would stand to lose here, besides continuity with v2, which I don't value too much at this point.
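As a rough illustration of the "regular functions" alternative, here is a sketch of a parametrized accessor. Everything in it is hypothetical: the `get_array` method name, the `_metadata` dict standing in for stored metadata, the toy `Array` class, and the default value chosen for `write_empty_chunks` are assumptions for illustration, not a proposed or existing zarr-python API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Array:
    """Hypothetical in-memory array handle; not the zarr-python class."""

    metadata: dict[str, Any]
    # Runtime parameter that is not part of the stored metadata.
    write_empty_chunks: bool = True


@dataclass
class Group:
    """Hypothetical group holding per-member metadata in a plain dict."""

    _metadata: dict[str, dict[str, Any]] = field(default_factory=dict)

    def get_array(self, name: str, *, write_empty_chunks: bool = True) -> Array:
        # Look up the stored metadata, then build the in-memory handle
        # with whatever runtime parameters the caller supplied.
        return Array(self._metadata[name], write_empty_chunks=write_empty_chunks)


group = Group({"subarray": {"shape": [10]}})
arr = group.get_array("subarray", write_empty_chunks=False)
assert arr.write_empty_chunks is False
```

The square-bracket form `group["subarray"]` has no natural place for the `write_empty_chunks` keyword, whereas an ordinary method can grow new keyword-only parameters without changing existing call sites.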
So, my conclusion is that we should not implement Group.__getitem__, or any other Mapping-like method on Group. There might be a good argument for implementing the MutableMapping API for Group.attrs and Group.members, but maybe not. Curious to hear whether anyone has strong objections to this direction.