32 commits
0345a1c
Add dimension separator as a type parameter
mkitti Mar 11, 2025
61786e3
Fix ZipStore constructor
mkitti Mar 12, 2025
cbb23ce
Fix ConsolidatedStore
mkitti Mar 12, 2025
e4630a9
Fix S3Store constructor
mkitti Mar 12, 2025
b9e175f
Add version as a type parameter
mkitti Mar 13, 2025
3624376
Check metadata for dimension_separator and zarr_format
mkitti Mar 18, 2025
2b3bbb2
Implement VersionStorage wrapper rather than modifying AbstractStorage
mkitti Mar 26, 2025
5f35ebf
Fix ConslidatedStore wrapper around HTTP
mkitti Mar 31, 2025
c685387
Add getproperty forwarding from VersionedStorage
mkitti Mar 31, 2025
8d5606d
Add some tests for propertynames
mkitti Mar 31, 2025
a6fcc2b
Add Storage/versionstore.jl
mkitti Mar 31, 2025
f6883f8
Add VersionedStorage param change constructors
mkitti Apr 1, 2025
3cf746d
Add V2 chunk encoding support
mkitti May 6, 2025
d218dc2
Fix Base.UInt8 constructor for ASCIIChar
mkitti May 7, 2025
6f722b5
Add ZstdCompressor
nhz2 Mar 7, 2025
865dac7
fix typo
nhz2 Mar 7, 2025
6d7dc21
Prototype Zarr v3 support
mkitti Apr 1, 2025
b394457
Modify tutorial to match current storage display
mkitti Apr 1, 2025
8e71a33
Ensure configuration key exists
mkitti May 8, 2025
08288fd
Change VersionedStore to FormattedStore
mkitti May 19, 2025
5bb7358
Merge pull request #1 from mkitti/mkitti-formatted-store
mkitti May 23, 2025
020b3dd
Merge branch 'mkitti-dimension-separator-type-parameter' into mkitti-…
mkitti May 30, 2025
0046e14
Add {get,write}attrs for FormattedStore{3}
mkitti Jun 2, 2025
34afb27
Add separator function for V2ChunkKeyEncoding
mkitti Jun 2, 2025
514ba87
Fix formattedstore, add writemetadata
mkitti Jun 2, 2025
4ce5895
Attempt to allow for Zarr v3 array creation
mkitti Jun 2, 2025
3298a5c
Fix Zarr v3 array creation
mkitti Jun 2, 2025
646ba9c
Implement CRC32c Zarr v3 codec
mkitti Jun 2, 2025
d4217fb
Merge branch 'master' into mkitti-v3-prototype
mkitti Jun 4, 2025
07352f3
Fix spelling of Evaluate in comment
mkitti Jun 4, 2025
42b2519
Fix default chunk_key_encoding
mkitti Aug 27, 2025
9722a1a
Merge branch 'master' into mkitti-v3-prototype
mkitti Oct 12, 2025
2 changes: 2 additions & 0 deletions Project.toml
@@ -6,6 +6,7 @@ version = "0.9.4"
[deps]
AWSS3 = "1c724243-ef5b-51ab-93f4-b0a88ac62a95"
Blosc = "a74b3585-a348-5f62-a45c-50e91977d574"
+CRC32c = "8bf52ea8-c179-5cab-976a-9e18b702a9bc"
ChunkCodecCore = "0b6fb165-00bc-4d37-ab8b-79f91016dbe1"
ChunkCodecLibZstd = "55437552-ac27-4d47-9aa3-63184e8fd398"
CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
@@ -24,6 +25,7 @@ ZipArchives = "49080126-0e18-4c2a-b176-c102e4b3760c"
[compat]
AWSS3 = "0.10, 0.11"
Blosc = "0.5, 0.6, 0.7"
+CRC32c = "1.10, 1.11"
ChunkCodecCore = "0.4.2, 0.5"
ChunkCodecLibZstd = "0.1.2, 0.2"
CodecZlib = "0.6, 0.7"
2 changes: 1 addition & 1 deletion docs/src/tutorial.md
@@ -187,7 +187,7 @@

When using a compressor, it can be useful to get some diagnostics on the compression ratio. `ZArrays` provide a `zinfo` function which can be used to print some diagnostics, e.g.:

Check failure on line 190 in docs/src/tutorial.md (GitHub Actions / Documentation): doctest failure in docs/src/tutorial.md:190-205. The evaluated output reports `Store type : Zarr.FormattedStore{2, '.', Zarr.DictStore}(Dictionary Storage)`, while the expected output still reads `Store type : Zarr.VersionedStore{2, '.', Zarr.DictStore}(Dictionary Storage)`. All other lines of the `zinfo(z)` output match.

```jldoctest compress
julia> zinfo(z)
Type : ZArray
Data type : Int32
@@ -197,7 +197,7 @@
Read-Only : false
Compressor : Zarr.BloscCompressor(0, 3, "zstd", 1)
Filters : nothing
-Store type : Dictionary Storage
+Store type : Zarr.VersionedStore{2, '.', Zarr.DictStore}(Dictionary Storage)
No. bytes : 400000000
No. bytes stored : 2412289
Storage ratio : 165.81761140559857
49 changes: 49 additions & 0 deletions src/Codecs/Codecs.jl
@@ -0,0 +1,49 @@
module Codecs

using JSON: JSON

"""
    abstract type Codec

The abstract supertype for all Zarr codecs.

## Interface

All subtypes of `Codec` SHALL implement the following methods:

- `zencode(a, c::Codec)`: encode the array `a` using the codec `c`.
- `zdecode(a, c::Codec, T)`: decode the array `a` using the codec `c`
  and return an array of type `T`.
- `JSON.lower(c::Codec)`: return a JSON representation of the codec `c` that
  follows the Zarr specification for that codec.
- `getCodec(::Type{<:Codec}, d::Dict)`: return a codec object from a given
  dictionary `d` that contains the codec's parameters according to the Zarr spec.

Subtypes of `Codec` MAY also implement the following methods:

- `zencode!(encoded, data, c::Codec)`: encode the array `data` using the
  codec `c` and store the result in the array `encoded`.
- `zdecode!(data, encoded, c::Codec)`: decode the array `encoded`
  using the codec `c` and store the result in the array `data`.

Finally, an entry MUST be added to the `VN.codectypes` dictionary for each codec type, where N is the
Zarr format version (for example, `V3Codecs.codectypes` for Zarr v3).
The key must be the Zarr specification's name for that codec, and the value is the
codec type (e.g. `BloscCodec` or `NoCodec`).

For example, the Blosc codec is named "blosc" in the Zarr spec, so the entry for [`BloscCodec`](@ref)
must be added to `codectypes` as `codectypes["blosc"] = BloscCodec`.
"""

abstract type Codec end

zencode(a, c::Codec) = error("Unimplemented")
zencode!(encoded, data, c::Codec) = error("Unimplemented")
zdecode(a, c::Codec, T::Type) = error("Unimplemented")
zdecode!(data, encoded, c::Codec) = error("Unimplemented")
JSON.lower(c::Codec) = error("Unimplemented")
getCodec(::Type{<:Codec}, d::Dict) = error("Unimplemented")

include("V3/V3.jl")

end
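
To illustrate the interface described in the `Codec` docstring, here is a minimal, self-contained sketch. It assumes a stand-in `Codec` type rather than importing `Zarr.Codecs`. `XorCodec`, its `mask` field, and the `"xor"` registry key are invented for illustration and exist neither in this PR nor in the Zarr specification. `JSON.lower` is left out to avoid depending on JSON.jl.

```julia
# Stand-in for the abstract type from this file, so the sketch runs on its own.
abstract type Codec end

# Hypothetical codec (not part of the PR): XORs every byte with a fixed mask.
struct XorCodec <: Codec
    mask::UInt8
end

# Required methods from the interface list.
zencode(a::AbstractVector{UInt8}, c::XorCodec) = a .⊻ c.mask

function zdecode(a::AbstractVector{UInt8}, c::XorCodec, ::Type{T}) where {T}
    bytes = a .⊻ c.mask                 # XOR is its own inverse
    return collect(reinterpret(T, bytes))
end

getCodec(::Type{XorCodec}, d::Dict) = XorCodec(UInt8(d["mask"]))

# Registry keyed by the codec name and holding the codec *type*, as the docstring requires.
const codectypes = Dict{String, Type{<:Codec}}()
codectypes["xor"] = XorCodec            # "xor" is an invented name

# Look the codec up by name, build it from its parameters, and round-trip some data.
codec = getCodec(codectypes["xor"], Dict("mask" => 0x5a))
raw = collect(reinterpret(UInt8, Int32[1, 2, 3]))
roundtrip = zdecode(zencode(raw, codec), codec, Int32)
```

The registry stores types rather than instances so that `getCodec` can dispatch on the looked-up type and construct the codec from the metadata dictionary.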
103 changes: 103 additions & 0 deletions src/Codecs/V3/V3.jl
@@ -0,0 +1,103 @@
module V3Codecs

import ..Codecs: zencode, zdecode, zencode!, zdecode!
using CRC32c: CRC32c

abstract type V3Codec{In,Out} end
# Maps the Zarr v3 codec name to the codec type (not to an instance)
const codectypes = Dict{String, Type{<:V3Codec}}()

@enum BloscCompressor begin
lz4
lz4hc
blosclz
zstd
snappy
zlib
end

@enum BloscShuffle begin
noshuffle
shuffle
bitshuffle
end

struct BloscCodec <: V3Codec{:bytes, :bytes}
cname::BloscCompressor
clevel::Int64
shuffle::BloscShuffle
typesize::UInt8
blocksize::UInt
end
name(::BloscCodec) = "blosc"

struct BytesCodec <: V3Codec{:array, :bytes}
end
name(::BytesCodec) = "bytes"

struct CRC32cCodec <: V3Codec{:bytes, :bytes}
end
name(::CRC32cCodec) = "crc32c"

struct GzipCodec <: V3Codec{:bytes, :bytes}
end
name(::GzipCodec) = "gzip"


#=
zencode(a, c::Codec) = error("Unimplemented")
zencode!(encoded, data, c::Codec) = error("Unimplemented")
zdecode(a, c::Codec, T::Type) = error("Unimplemented")
zdecode!(data, encoded, c::Codec) = error("Unimplemented")
=#

function crc32c_stream!(output::IO, input::IO; buffer = Vector{UInt8}(undef, 1024*32))
hash::UInt32 = 0x00000000
while(bytesavailable(input) > 0)
sized_buffer = @view(buffer[1:min(length(buffer), bytesavailable(input))])
read!(input, sized_buffer)
write(output, sized_buffer)
hash = CRC32c.crc32c(sized_buffer, hash)
end
return hash
end
function zencode!(encoded::Vector{UInt8}, data::Vector{UInt8}, c::CRC32cCodec)
output = IOBuffer(encoded, read=false, write=true)
input = IOBuffer(data, read=true, write=false)
zencode!(output, input, c)
return take!(output)
end
function zencode!(output::IO, input::IO, c::CRC32cCodec)
hash = crc32c_stream!(output, input)
write(output, hash)
return output
end
function zdecode!(data::Vector{UInt8}, encoded::Vector{UInt8}, c::CRC32cCodec)
# Argument names follow the interface: decode `encoded` and store the result in `data`.
output = IOBuffer(data, read=false, write=true)
input = IOBuffer(encoded, read=true, write=true)
zdecode!(output, input, c)
return take!(output)
end
function zdecode!(output::IOBuffer, input::IOBuffer, c::CRC32cCodec)
input_vec = take!(input)
# The last four bytes hold the stored checksum; everything before them is the payload.
truncated_input = IOBuffer(@view(input_vec[1:end-4]); read=true, write=false)
hash = crc32c_stream!(output, truncated_input)
if input_vec[end-3:end] != reinterpret(UInt8, [hash])
# `IOError` is not in scope in this module, and `Base.IOError` also requires an error code.
error("CRC32c hash does not match")
end
return output
end

struct ShardingCodec{N} <: V3Codec{:array, :bytes}
chunk_shape::NTuple{N,Int}
codecs::Vector{V3Codec}
index_codecs::Vector{V3Codec}
index_location::Symbol
end
name(::ShardingCodec) = "sharding_indexed"

struct TransposeCodec <: V3Codec{:array, :array}
end
name(::TransposeCodec) = "transpose"


end
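
The `CRC32cCodec` methods above append a four-byte checksum on encode and verify it on decode. The following is a minimal sketch of that idea using only the `CRC32c` standard library. The helper names `append_crc32c` and `verify_crc32c` are hypothetical and not part of this PR, and the sketch works on whole vectors instead of streaming through a buffer as `crc32c_stream!` does. It also forces a little-endian trailer with `htol`/`ltoh`, which is an assumption on top of the PR code (the PR writes the hash in host byte order).

```julia
using CRC32c: crc32c

# Append the CRC32C of `data` as a 4-byte little-endian trailer.
function append_crc32c(data::Vector{UInt8})
    checksum = crc32c(data)
    trailer = collect(reinterpret(UInt8, [htol(checksum)]))
    return vcat(data, trailer)
end

# Split off the trailer, recompute the checksum, and return the payload if it matches.
function verify_crc32c(encoded::Vector{UInt8})
    length(encoded) >= 4 || error("input too short to hold a CRC32C trailer")
    payload = encoded[1:end-4]
    stored = ltoh(only(reinterpret(UInt32, encoded[end-3:end])))
    crc32c(payload) == stored || error("CRC32c hash does not match")
    return payload
end

data = UInt8[0x01, 0x02, 0x03, 0x04, 0x05]
encoded = append_crc32c(data)
decoded = verify_crc32c(encoded)
```

Checking the length before slicing avoids a `BoundsError` on inputs shorter than the trailer, a case the `end-4` indexing in the PR does not guard against.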
7 changes: 5 additions & 2 deletions src/Compressors/Compressors.jl
@@ -49,10 +49,13 @@ const compressortypes = Dict{Union{String,Nothing}, Type{<: Compressor}}()
include("blosc.jl")
include("zlib.jl")
include("zstd.jl")
+include("v3.jl")

# ## Fallback definitions for the compressor interface
# Define fallbacks and generic methods for the compressor interface
-getCompressor(compdict::Dict) = getCompressor(compressortypes[compdict["id"]],compdict)
+getCompressor(compdict::Dict) = haskey(compdict, "id") ?
+    getCompressor(compressortypes[compdict["id"]], compdict) :
+    getCompressor(compressortypes[compdict["name"]], compdict["configuration"])
getCompressor(::Nothing) = NoCompressor()

# Compression when no filter is given
@@ -104,4 +107,4 @@ end

JSON.lower(::NoCompressor) = nothing

-compressortypes[nothing] = NoCompressor
+compressortypes[nothing] = NoCompressor
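
The new `getCompressor(compdict::Dict)` branches on the metadata layout: Zarr v2 stores the compressor name under `"id"` with its parameters alongside, while Zarr v3 uses `"name"` plus a nested `"configuration"`. A minimal sketch of that branch in isolation follows. The helper `split_compressor_metadata` is hypothetical and not part of this PR, and falling back to an empty dictionary when `"configuration"` is absent is an assumption of this sketch, not what the PR code does (the PR indexes `compdict["configuration"]` directly).

```julia
# Hypothetical helper: return the compressor name and the dictionary holding its parameters.
function split_compressor_metadata(compdict::Dict)
    if haskey(compdict, "id")
        # Zarr v2: parameters live in the same dictionary as "id"
        return compdict["id"], compdict
    else
        # Zarr v3: parameters live under "configuration" (assumed optional here)
        return compdict["name"], get(compdict, "configuration", Dict{String,Any}())
    end
end

v2 = Dict{String,Any}("id" => "zlib", "level" => 5)
v3 = Dict{String,Any}("name" => "gzip", "configuration" => Dict{String,Any}("level" => 5))

name2, params2 = split_compressor_metadata(v2)
name3, params3 = split_compressor_metadata(v3)
```

Note that the two layouts name the same compressor differently (`"zlib"` in this v2 example, `"gzip"` in v3), so a shared lookup table needs an entry for each name it is expected to resolve.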
58 changes: 58 additions & 0 deletions src/Compressors/v3.jl
@@ -0,0 +1,58 @@
"""
    Compressor_v3{C <: Compressor} <: Compressor

Wrapper marking the compressor `C` as belonging to a Zarr v3 array, so that it is serialized in the v3 `name`/`configuration` form.
"""
struct Compressor_v3{C} <: Compressor
parent::C
end
Base.parent(c::Compressor_v3) = c.parent

function zuncompress(a, z::Compressor_v3, T)
zuncompress(a, parent(z), T)
end

function zuncompress!(data::DenseArray, compressed, z::Compressor_v3)
zuncompress!(data, compressed, parent(z))
end

function zcompress(a, z::Compressor_v3)
zcompress(a, parent(z))
end


function JSON.lower(c::Compressor_v3{BloscCompressor})
p = parent(c)
return Dict(
"name" => "blosc",
"configuration" => Dict(
"cname" => p.cname,
"clevel" => p.clevel,
"shuffle" => p.shuffle,
# TODO: Evaluate if we can encode typesize
# "typesize" => p.typesize,
"blocksize" => p.blocksize
)
)
end

function JSON.lower(c::Compressor_v3{ZlibCompressor})
p = parent(c)
return Dict(
"name" => "gzip",
"configuration" => Dict(
"level" => p.clevel
)
)
end

function JSON.lower(c::Compressor_v3{ZstdCompressor})
p = parent(c)
return Dict(
"name" => "zstd",
"configuration" => Dict(
# Field name matches the one used in src/Compressors/zstd.jl (`compressionLevel`)
"level" => p.config.compressionLevel,
"checksum" => p.config.checksum
)
)
end
3 changes: 2 additions & 1 deletion src/Compressors/zstd.jl
@@ -4,6 +4,7 @@
This file implements a Zstd compressor via ChunkCodecLibZstd.jl.

=#

using ChunkCodecLibZstd: ZstdEncodeOptions
using ChunkCodecCore: encode, decode, decode!

@@ -42,4 +43,4 @@ end

JSON.lower(z::ZstdCompressor) = Dict("id"=>"zstd", "level" => z.config.compressionLevel, "checksum" => z.config.checksum)

-Zarr.compressortypes["zstd"] = ZstdCompressor
+Zarr.compressortypes["zstd"] = ZstdCompressor
9 changes: 7 additions & 2 deletions src/Storage/Storage.jl
@@ -108,8 +108,12 @@ function writeattrs(s::AbstractStore, p, att::Dict; indent_json::Bool= false)
att
end

-is_zgroup(s::AbstractStore, p) = isinitialized(s,_concatpath(p,".zgroup"))
-is_zarray(s::AbstractStore, p) = isinitialized(s,_concatpath(p,".zarray"))
+is_zarr3(s::AbstractStore, p) = isinitialized(s,_concatpath(p,"zarr.json"))
+is_zarr2(s::AbstractStore, p) = is_z2array(s, p) || is_z2group(s,p)
+is_zgroup(s::AbstractStore, p) = is_z2group(s,p)
+is_zarray(s::AbstractStore, p) = is_z2array(s,p)
+is_z2group(s::AbstractStore, p) = isinitialized(s,_concatpath(p,".zgroup"))
+is_z2array(s::AbstractStore, p) = isinitialized(s,_concatpath(p,".zarray"))

isinitialized(s::AbstractStore, p, i::CartesianIndex)=isinitialized(s,p,citostring(i))
isinitialized(s::AbstractStore, p, i) = isinitialized(s,_concatpath(p,i))
@@ -197,6 +201,7 @@ isemptysub(s::AbstractStore, p) = isempty(subkeys(s,p)) && isempty(subdirs(s,p))
#during auto-check of storage format when doing zopen
storageregexlist = Pair[]

+include("formattedstore.jl")
include("directorystore.jl")
include("dictstore.jl")
include("s3store.jl")