Double counting groups in V3 #1228

@rabernat

Zarr version

2.13.3

Numcodecs version

n/a

Python Version

3.10

Operating System

Mac

Installation

poetry

Description

I have discovered that certain zarr methods double count groups in V3. The methods Group.group_keys() and Group.groups() return the group twice when the group contains children.

Instead, each group should only be returned once.

Steps to reproduce

The following test should pass, but currently fails:

import zarr

def test_double_counting_group_v3():
    store = zarr.MemoryStoreV3()
    root_group = zarr.group(store=store, zarr_version=3)
    sub_group = root_group.create_group("foo")
    array = sub_group.create("bar", shape=10, dtype="i4")
    # this works and is in spec (although the spec says it should be "foo/")
    assert store.listdir("meta/root") == ['foo', 'foo.group.json']
    # these should work but don't: instead we get foo twice
    assert list(root_group.group_keys()) == ['foo']
    assert list(root_group.groups()) == [("foo", sub_group)]

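What is happening is that the listdir call in the code below returns both foo and foo.group.json. Once the .group.json suffix is stripped, both entries reduce to the same key, foo. Because contains_group succeeds for that path on each iteration, the group is yielded twice.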

for key in sorted(listdir(self._store, dir_name)):
    if key.endswith(group_sfx):
        key = key[:-len(group_sfx)]
    path = self._key_prefix + key
    if path.endswith(".array" + self._metadata_key_suffix):
        # skip array keys
        continue
    if contains_group(self._store, path, explicit_only=False):
        yield key

If others agree this is a bug, I will submit a PR to fix it.
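
One possible shape for such a fix, as a minimal sketch rather than the actual zarr-python change: track the keys already yielded so that the directory entry and its metadata document collapse into a single result. The function below is standalone and does not import zarr. `unique_group_keys`, `listing`, and `is_group` are hypothetical stand-ins for the loop above, the `listdir` result, and `contains_group(..., explicit_only=False)`, and the two suffix constants are assumed from the keys shown in the test.

```python
GROUP_SFX = ".group.json"  # assumed V3 group metadata suffix
ARRAY_SFX = ".array.json"  # assumed V3 array metadata suffix


def unique_group_keys(listing, is_group):
    """Yield each child group name once.

    `listing` stands in for listdir(store, dir_name) and `is_group` for
    contains_group(..., explicit_only=False); both are hypothetical.
    """
    seen = set()
    for key in sorted(listing):
        if key.endswith(ARRAY_SFX):
            continue  # skip array metadata keys
        if key.endswith(GROUP_SFX):
            key = key[: -len(GROUP_SFX)]
        if key in seen:
            continue  # already yielded via the other entry for this group
        if is_group(key):
            seen.add(key)
            yield key


# The listing from the failing test, plus an array entry for contrast.
listing = ["foo", "foo.group.json", "baz.array.json"]
result = list(unique_group_keys(listing, is_group=lambda name: name == "foo"))
print(result)
```

With the `seen` set in place, a group that appears both as `foo` and as `foo.group.json` is reported only once. A group that is only implied by its children and has no metadata document is still found through the bare `foo` entry.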

Additional output

No response

Labels

bug (Potential issues with the zarr-python library)
