Refactor for compatibility with upcoming Codecs Release#2
tpietzsch merged 12 commits into tpietzsch:readdata from
Conversation
I think this doesn't really help to explore the options much, because it is just a static method. It would be more interesting to have an actual factory instance, to see how factories compose and how to get one into DatasetAttributes.
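A minimal sketch of what such a factory instance could look like (all names here are hypothetical, not the actual N5 API): composition becomes a method on the factory itself, and the composed factory is what a `DatasetAttributes`-like holder would then own.

```java
import java.util.function.UnaryOperator;

// Illustrative stand-in for a block codec; not the real N5 interface.
interface DataBlockCodec {
	byte[] encode(byte[] block);
	byte[] decode(byte[] data);
}

// A factory *instance* instead of a static method: it can be composed
// before being handed to DatasetAttributes.
interface DataBlockCodecFactory {
	DataBlockCodec create(String dataType);

	// Composition: wrap the codec produced by this factory in another stage.
	default DataBlockCodecFactory andThen(UnaryOperator<DataBlockCodec> wrapper) {
		return dataType -> wrapper.apply(this.create(dataType));
	}
}
```

With this shape, a compression stage or a header-writing stage is just another `andThen(...)` call, and the final factory travels as a single value.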
I've largely incorporated all your feedback. Highlights (mostly reiterating the commit messages):
On the last point, let me know if you have opinions about naming. It feels weird to tie the name to Zarr in N5 core, but if that's what it is, it is probably the best name. We could move it to Zarr, but if we want to re-use the [DataCodec] hierarchy, we'd need to expose the constructors, at least to be
Regarding this: I'm not sure from a performance standpoint, but in practice I found it was not functionally useful in the way I wanted it to be. Ideally, I had hoped that I could, for example, tell the ReadData the target result size, say the output size of a decompressed gzip stream. In practice this was not possible: gzip decompression only reports the length of the decompressed data after it is done reading, so the `decodedLength` could not actually be known up front. It may be that this is not good behavior on the gzip decompressor's part, but it is the behavior I experienced. That was the only place I found it used, so I removed it for now, to avoid a false sense of security. But if it is useful more broadly without leading us astray, I'm happy to add it back in.
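The behavior described above is easy to reproduce with plain `java.util.zip`, independent of any N5 types. A small sketch: nothing on a `GZIPInputStream` tells you the decompressed size before you have read the stream to its end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLengthDemo {

	public static byte[] gzip(byte[] data) throws Exception {
		ByteArrayOutputStream bos = new ByteArrayOutputStream();
		try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
			out.write(data);
		}
		return bos.toByteArray();
	}

	public static int decompressedLength(byte[] gzipped) throws Exception {
		try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
			byte[] buf = new byte[1024];
			int total = 0, n;
			// The only "length" available is this running total; the final
			// size is known only once read() returns -1 at end-of-stream.
			while ((n = in.read(buf)) != -1)
				total += n;
			return total;
		}
	}
}
```

So any up-front `decodedLength` hint for a gzip-backed stream would have to come from metadata, not from the decompressor itself.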
I removed the ByteOrder argument. Zarr uses numpy for the file format. That fixes the byte order of the int32 values that encode string lengths to LITTLE_ENDIAN. The encoded strings themselves are UTF-8 byte streams, so independent of endianness.
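A minimal sketch of that layout (illustrative only, not the actual Zarr/N5 implementation): each string is written as a LITTLE_ENDIAN int32 byte length followed by its UTF-8 bytes, which carry no endianness of their own.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class StringEncodingSketch {

	public static byte[] encode(String s) {
		byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
		ByteBuffer buf = ByteBuffer.allocate(4 + utf8.length).order(ByteOrder.LITTLE_ENDIAN);
		buf.putInt(utf8.length); // numpy-compatible little-endian length prefix
		buf.put(utf8);           // UTF-8 payload, endianness-independent
		return buf.array();
	}

	public static String decode(byte[] data) {
		ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
		int len = buf.getInt();
		byte[] utf8 = new byte[len];
		buf.get(utf8);
		return new String(utf8, StandardCharsets.UTF_8);
	}
}
```

Since the byte order is fixed by the format, there is indeed nothing for a ByteOrder parameter to vary.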
* refactor: don't pass `decodedLength` to ReadData
* refactor: move DataBlockFactory/DataBlockCodecFactory to N5Codecs
* feat: add StringDataCodec, ObjectDataCodec, StringDataBlockCodec, ObjectDataBlockCodec. DataCodecs now create the access object and return it during `deserialize`
* refactor: add AbstractDataBlock to extract shared logic between Default/String/Object blocks
* refactor: DatasetAttributes responsible for DataBlockCodec creation. N5BlockCodec uses dataType (and potentially other DatasetAttributes) to wrap the desired DataBlockCodec
* revert: add back createDataBlock logic in DataType
* doc: retain javadoc from before refactor
* revert: keep protected constructor with DataBlockCodec parameter; refactor: inline createDataBlockCodec from constructor params
* refactor: remove currently unused N5BlockCodec. Something like this may be needed when multiple codecs are supported
* refactor: don't expose N5Codecs internals
* refactor: rename encodeBlockHeader -> createBlockHeader
* feat: add ZarrStringDataCodec support
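The `deserialize` change in the commits above could be sketched like this (names are hypothetical, not the actual N5 API): the codec creates and returns the access object itself, instead of filling one allocated by the caller.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Illustrative codec interface, parameterized on the access type.
interface DataCodec<A> {

	void serialize(A access, DataOutput out) throws IOException;

	// Creates and returns the access object rather than mutating a
	// caller-supplied one.
	A deserialize(DataInput in, int numElements) throws IOException;
}

// Example instance for a flat int[] access.
class IntDataCodec implements DataCodec<int[]> {

	@Override
	public void serialize(int[] access, DataOutput out) throws IOException {
		for (int v : access)
			out.writeInt(v);
	}

	@Override
	public int[] deserialize(DataInput in, int numElements) throws IOException {
		int[] access = new int[numElements]; // the codec allocates the access object
		for (int i = 0; i < numElements; i++)
			access[i] = in.readInt();
		return access;
	}
}
```

Returning the access from `deserialize` lets String/Object codecs pick the concrete access type themselves, which is what makes the shared AbstractDataBlock logic possible.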
Some modifications I found useful when using the ReadData structure to handle multiple codecs. Brief overview:
`decodedLength`