-
-
Notifications
You must be signed in to change notification settings - Fork 364
chore/handle numcodecs codecs #3376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore/handle numcodecs codecs #3376
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3376 +/- ##
==========================================
- Coverage 60.60% 60.53% -0.08%
==========================================
Files 79 81 +2
Lines 9506 9694 +188
==========================================
+ Hits 5761 5868 +107
- Misses 3745 3826 +81
🚀 New features to boost your workflow:
|
|
||
def _encode(self, chunk_data: Buffer, prototype: BufferPrototype) -> Buffer: | ||
encoded = self._codec.encode(chunk_data.as_array_like()) | ||
if isinstance(encoded, np.ndarray): # Required for checksum codecs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we know statically which are checksum codecs without the isinstance check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n.b., this was copy + pasted from numcodecs, but I think the answer is "no"
codec_name: str | ||
codec_config: dict[str, JSON] | ||
|
||
def __init_subclass__(cls, *, codec_name: str | None = None, **kwargs: Any) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would a codec definition look like without this magic? I'd be fine with repeating a few things if it meant we could avoid this (and IIUC some of the complexity in __repr__
and __init__
would go away too?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that I have rebased #3332 off of this PR, I am 100% going to demagic these codecs in that effort.
> Can you explain the different cases here? Spinning this question out into the main thread -- from me, the general answer to questions like this will be "no", since I am only copy+pasting stuff from numcodecs. I haven't spent too much time figuring out what this code is doing. I do think @normanrz and @TomNicholas might be able to answer some of these questions though. |
Ah, I didn't realize this was mostly from numcodecs. I think that moots most of my comments aside from where in the public API we put these. |
yeah I should have made more clear that this is nearly all directly copy + pasted from |
…v-b/zarr-python into chore/handle-numcodecs-codecs
…ore/handle-numcodecs-codecs
I think this is ready to go in (and it's necessary for #3332) |
and important recent addition: I moved all of the invocations of |
@d-v-b have you tested this with different numcodecs versions to make sure there's no unexpected issues with clobbering of codec registration? |
I'm don't think I expect any behavior to depend on numcodecs versions, happy to be corrected though. My understanding is that the codecs in we have a test that checks our compatibility with these codecs, defined as dicts. In main these tests will pick up the codec class from numcodecs, but in this PR the version of the codec defined in zarr python is used instead. Is that kind of thing you are worried about? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic, great work Davis!
@dataclass(frozen=True) | ||
class _NumcodecsCodec(Metadata): | ||
codec_name: str | ||
codec_config: dict[str, JSON] | ||
|
||
def __init_subclass__(cls, *, codec_name: str | None = None, **kwargs: Any) -> None: | ||
"""To be used only when creating the actual public-facing codec class.""" | ||
super().__init_subclass__(**kwargs) | ||
if codec_name is not None: | ||
namespace = codec_name | ||
|
||
cls_name = f"{CODEC_PREFIX}{namespace}.{cls.__name__}" | ||
cls.codec_name = f"{CODEC_PREFIX}{namespace}" | ||
cls.__doc__ = f""" | ||
See :class:`{cls_name}` for more details and parameters. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vaguely remember a discussion a few months back about classes initialized this way having challenges with serialization...but I can't track down the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those issues would have been resolved by zarr-developers/numcodecs#745, and this PR uses the changes from that PR
def test_generic_compressor(codec_class: type[_numcodecs._NumcodecsBytesBytesCodec]) -> None: | ||
data = np.arange(0, 256, dtype="uint16").reshape((16, 16)) | ||
|
||
with pytest.warns(ZarrUserWarning, match=EXPECTED_WARNING_STR): | ||
a = create_array( | ||
{}, | ||
shape=data.shape, | ||
chunks=(16, 16), | ||
dtype=data.dtype, | ||
fill_value=0, | ||
compressors=[codec_class()], | ||
) | ||
|
||
a[:, :] = data.copy() | ||
np.testing.assert_array_equal(data, a[:, :]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this test.
This PR brings in all the codecs defined in
numcodecs.zarr3
. After this PR is merged, we can safely replace thenumcodecs.zarr3
module with reexports from zarr python, or removenumcodecs.zarr3
entirely, thereby fixing our circular dependency problem.This PR also changes the default config to ensure that the locally-defined codecs take priority over the same codec found in the numcodecs registry.