32 commits
0345a1c
Add dimension separator as a type parameter
mkitti Mar 11, 2025
61786e3
Fix ZipStore constructor
mkitti Mar 12, 2025
cbb23ce
Fix ConsolidatedStore
mkitti Mar 12, 2025
e4630a9
Fix S3Store constructor
mkitti Mar 12, 2025
b9e175f
Add version as a type parameter
mkitti Mar 13, 2025
3624376
Check metadata for dimension_separator and zarr_format
mkitti Mar 18, 2025
2b3bbb2
Implement VersionStorage wrapper rather than modifying AbstractStorage
mkitti Mar 26, 2025
5f35ebf
Fix ConslidatedStore wrapper around HTTP
mkitti Mar 31, 2025
c685387
Add getproperty forwarding from VersionedStorage
mkitti Mar 31, 2025
8d5606d
Add some tests for propertynames
mkitti Mar 31, 2025
a6fcc2b
Add Storage/versionstore.jl
mkitti Mar 31, 2025
f6883f8
Add VersionedStorage param change constructors
mkitti Apr 1, 2025
3cf746d
Add V2 chunk encoding support
mkitti May 6, 2025
d218dc2
Fix Base.UInt8 constructor for ASCIIChar
mkitti May 7, 2025
6f722b5
Add ZstdCompressor
nhz2 Mar 7, 2025
865dac7
fix typo
nhz2 Mar 7, 2025
6d7dc21
Prototype Zarr v3 support
mkitti Apr 1, 2025
b394457
Modify tutorial to match current storage display
mkitti Apr 1, 2025
8e71a33
Ensure configuration key exists
mkitti May 8, 2025
08288fd
Change VersionedStore to FormattedStore
mkitti May 19, 2025
5bb7358
Merge pull request #1 from mkitti/mkitti-formatted-store
mkitti May 23, 2025
020b3dd
Merge branch 'mkitti-dimension-separator-type-parameter' into mkitti-…
mkitti May 30, 2025
0046e14
Add {get,write}attrs for FormattedStore{3}
mkitti Jun 2, 2025
34afb27
Add separator function for V2ChunkKeyEncoding
mkitti Jun 2, 2025
514ba87
Fix formattedstore, add writemetadata
mkitti Jun 2, 2025
4ce5895
Attempt to allow for Zarr v3 array creation
mkitti Jun 2, 2025
3298a5c
Fix Zarr v3 array creation
mkitti Jun 2, 2025
646ba9c
Implement CRC32c Zarr v3 codec
mkitti Jun 2, 2025
d4217fb
Merge branch 'master' into mkitti-v3-prototype
mkitti Jun 4, 2025
07352f3
Fix spelling of Evaluate in comment
mkitti Jun 4, 2025
42b2519
Fix default chunk_key_encoding
mkitti Aug 27, 2025
9722a1a
Merge branch 'master' into mkitti-v3-prototype
mkitti Oct 12, 2025
2 changes: 2 additions & 0 deletions Project.toml
@@ -6,6 +6,7 @@ version = "0.9.5"
[deps]
AWSS3 = "1c724243-ef5b-51ab-93f4-b0a88ac62a95"
Blosc = "a74b3585-a348-5f62-a45c-50e91977d574"
+CRC32c = "8bf52ea8-c179-5cab-976a-9e18b702a9bc"
ChunkCodecCore = "0b6fb165-00bc-4d37-ab8b-79f91016dbe1"
ChunkCodecLibZlib = "4c0bbee4-addc-4d73-81a0-b6caacae83c8"
ChunkCodecLibZstd = "55437552-ac27-4d47-9aa3-63184e8fd398"
@@ -27,6 +28,7 @@ Blosc = "0.5, 0.6, 0.7"
ChunkCodecCore = "1"
ChunkCodecLibZlib = "1"
ChunkCodecLibZstd = "1"
+CRC32c = "1.10, 1.11"
DataStructures = "0.17, 0.18, 0.19"
DateTimes64 = "1"
DiskArrays = "0.4.2"
2 changes: 1 addition & 1 deletion docs/src/tutorial.md
@@ -187,7 +187,7 @@

When using a compressor, it can be useful to get some diagnostics on the compression ratio. `ZArrays` provide a `zinfo` function which can be used to print some diagnostics, e.g.:

```jldoctest compress

Check failure on line 190 in docs/src/tutorial.md (GitHub Actions / Documentation)

doctest failure in docs/src/tutorial.md:190-205
Subexpression: zinfo(z)
Evaluated output, Store type : Zarr.FormattedStore{2, '.', Zarr.DictStore}(Dictionary Storage)
Expected output, Store type : Zarr.VersionedStore{2, '.', Zarr.DictStore}(Dictionary Storage)
All other lines of the evaluated and expected output are identical.

julia> zinfo(z)
Type : ZArray
Data type : Int32
@@ -197,7 +197,7 @@
Read-Only : false
Compressor : Zarr.BloscCompressor(0, 3, "zstd", 1)
Filters : nothing
-Store type : Dictionary Storage
+Store type : Zarr.VersionedStore{2, '.', Zarr.DictStore}(Dictionary Storage)
No. bytes : 400000000
No. bytes stored : 2412289
Storage ratio : 165.81761140559857
49 changes: 49 additions & 0 deletions src/Codecs/Codecs.jl
@@ -0,0 +1,49 @@
module Codecs

using JSON: JSON

"""
abstract type Codec

The abstract supertype for all Zarr codecs

## Interface

All subtypes of `Codec` SHALL implement the following methods:

- `zencode(a, c::Codec)`: encode the array `a` using the codec `c`.
- `zdecode(a, c::Codec, T)`: decode the array `a` using the codec `c`
and return an array of type `T`.
- `JSON.lower(c::Codec)`: return a JSON representation of the codec `c`, which
follows the Zarr specification for that codec.
- `getCodec(::Type{<:Codec}, d::Dict)`: return a codec object from a given
dictionary `d` which contains the codec's parameters according to the Zarr spec.

Subtypes of `Codec` MAY also implement the following methods:

- `zencode!(encoded, data, c::Codec)`: encode the array `data` using the
codec `c` and store the result in the array `encoded`.
- `zdecode!(data, encoded, c::Codec)`: decode the array `encoded`
using the codec `c` and store the result in the array `data`.

Finally, an entry MUST be added to the `VN.codectypes` dictionary for each codec type, where `N` is the
Zarr format version.
The key is the codec's name as given in the Zarr specification, and the value is the codec type
(e.g. `BloscCodec` or `NoCodec`).

For example, the Blosc codec is named "blosc" in the Zarr spec, so the entry for [`BloscCodec`](@ref)
must be added to `codectypes` as `codectypes["blosc"] = BloscCodec`.
"""

abstract type Codec end

zencode(a, c::Codec) = error("Unimplemented")
zencode!(encoded, data, c::Codec) = error("Unimplemented")
zdecode(a, c::Codec, T::Type) = error("Unimplemented")
zdecode!(data, encoded, c::Codec) = error("Unimplemented")
JSON.lower(c::Codec) = error("Unimplemented")
getCodec(::Type{<:Codec}, d::Dict) = error("Unimplemented")

include("V3/V3.jl")

end
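
A minimal sketch of what satisfying the interface in the docstring above could look like. It is not part of the PR: `XorCodec`, its `mask` field, and the local `codectypes` registry are hypothetical stand-ins, and the abstract type and generic functions are redefined locally so the snippet runs on its own without `JSON`.

```julia
# Local stand-ins for the abstract type and generic functions in src/Codecs/Codecs.jl.
abstract type Codec end
zencode(a, c::Codec) = error("Unimplemented")
zdecode(a, c::Codec, T::Type) = error("Unimplemented")

# Hypothetical codec used only for illustration: XORs every byte with a fixed mask.
struct XorCodec <: Codec
    mask::UInt8
end

# Encode: view the array as raw bytes and XOR each byte with the mask.
zencode(a::AbstractArray, c::XorCodec) = reinterpret(UInt8, vec(a)) .⊻ c.mask

# Decode: undo the XOR, then reinterpret the bytes as elements of type T.
function zdecode(a::AbstractVector{UInt8}, c::XorCodec, ::Type{T}) where {T}
    return collect(reinterpret(T, a .⊻ c.mask))
end

# Hypothetical registry mirroring the `codectypes` dictionary described in the docstring:
# the key is the codec name, the value is the codec type.
const codectypes = Dict{String, Type{<:Codec}}()
codectypes["xor"] = XorCodec

data = Int32[1, 2, 3, 4]
codec = codectypes["xor"](0x5a)
encoded = zencode(data, codec)
decoded = zdecode(encoded, codec, Int32)
```

Storing types rather than instances in the registry lets the metadata parser look up a codec by name and then construct it from its configuration.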
103 changes: 103 additions & 0 deletions src/Codecs/V3/V3.jl
@@ -0,0 +1,103 @@
module V3Codecs

import ..Codecs: zencode, zdecode, zencode!, zdecode!
using CRC32c: CRC32c

abstract type V3Codec{In,Out} end
const codectypes = Dict{String, Type{<:V3Codec}}()

@enum BloscCompressor begin
    lz4
    lz4hc
    blosclz
    zstd
    snappy
    zlib
end

@enum BloscShuffle begin
    noshuffle
    shuffle
    bitshuffle
end

struct BloscCodec <: V3Codec{:bytes, :bytes}
    cname::BloscCompressor
    clevel::Int64
    shuffle::BloscShuffle
    typesize::UInt8
    blocksize::UInt
end
name(::BloscCodec) = "blosc"

struct BytesCodec <: V3Codec{:array, :bytes}
end
name(::BytesCodec) = "bytes"

struct CRC32cCodec <: V3Codec{:bytes, :bytes}
end
name(::CRC32cCodec) = "crc32c"

struct GzipCodec <: V3Codec{:bytes, :bytes}
end
name(::GzipCodec) = "gzip"


#=
zencode(a, c::Codec) = error("Unimplemented")
zencode!(encoded, data, c::Codec) = error("Unimplemented")
zdecode(a, c::Codec, T::Type) = error("Unimplemented")
zdecode!(data, encoded, c::Codec) = error("Unimplemented")
=#

# Copy `input` to `output` in buffer-sized pieces and return the CRC32c of the copied bytes.
function crc32c_stream!(output::IO, input::IO; buffer = Vector{UInt8}(undef, 1024*32))
    hash::UInt32 = 0x00000000
    while bytesavailable(input) > 0
        sized_buffer = @view(buffer[1:min(length(buffer), bytesavailable(input))])
        read!(input, sized_buffer)
        write(output, sized_buffer)
        hash = CRC32c.crc32c(sized_buffer, hash)
    end
    return hash
end
function zencode!(encoded::Vector{UInt8}, data::Vector{UInt8}, c::CRC32cCodec)
    output = IOBuffer(encoded, read=false, write=true)
    input = IOBuffer(data, read=true, write=false)
    zencode!(output, input, c)
    return take!(output)
end
# Write the payload followed by its 4-byte checksum.
function zencode!(output::IO, input::IO, c::CRC32cCodec)
    hash = crc32c_stream!(output, input)
    write(output, hash)
    return output
end
function zdecode!(encoded::Vector{UInt8}, data::Vector{UInt8}, c::CRC32cCodec)
    output = IOBuffer(encoded, read=false, write=true)
    input = IOBuffer(data, read=true, write=true)
    zdecode!(output, input, c)
    return take!(output)
end
# Copy everything except the trailing 4 bytes, then compare them with the recomputed checksum.
function zdecode!(output::IOBuffer, input::IOBuffer, c::CRC32cCodec)
    input_vec = take!(input)
    truncated_input = IOBuffer(@view(input_vec[1:end-4]); read=true, write=false)
    hash = crc32c_stream!(output, truncated_input)
    if input_vec[end-3:end] != reinterpret(UInt8, [hash])
        error("CRC32c hash does not match")
    end
    return output
end

struct ShardingCodec{N} <: V3Codec{:array, :bytes}
    chunk_shape::NTuple{N,Int}
    codecs::Vector{V3Codec}
    index_codecs::Vector{V3Codec}
    index_location::Symbol
end
name(::ShardingCodec) = "sharding_indexed"

struct TransposeCodec <: V3Codec{:array, :array}
end
name(::TransposeCodec) = "transpose"


end
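
The CRC32c codec in this file appends a 4-byte checksum on encode and verifies and strips it on decode. A rough standalone sketch of that round trip on plain byte vectors follows. The helper names `append_crc32c` and `strip_crc32c` are hypothetical and not part of the PR, and the explicit little-endian conversion with `htol`/`ltoh` is an assumption about the on-disk byte order; the PR code itself writes the hash in host byte order.

```julia
using CRC32c: crc32c

# Hypothetical helper: append the CRC32c of `data` as four little-endian bytes.
function append_crc32c(data::Vector{UInt8})
    checksum = crc32c(data)                          # UInt32
    trailer = reinterpret(UInt8, [htol(checksum)])   # 4 bytes, little-endian (assumed)
    return vcat(data, trailer)
end

# Hypothetical helper: verify and strip the trailing checksum; throws on mismatch.
function strip_crc32c(encoded::Vector{UInt8})
    length(encoded) >= 4 || error("input is shorter than a CRC32c checksum")
    payload = encoded[1:end-4]
    stored = ltoh(only(reinterpret(UInt32, encoded[end-3:end])))
    stored == crc32c(payload) || error("CRC32c hash does not match")
    return payload
end

chunk = UInt8[0x01, 0x02, 0x03, 0x04]
encoded = append_crc32c(chunk)
decoded = strip_crc32c(encoded)
```

Converting the checksum explicitly keeps the stored bytes identical on little-endian and big-endian hosts, which the `reinterpret(UInt8, [hash])` comparison alone does not guarantee.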
7 changes: 5 additions & 2 deletions src/Compressors/Compressors.jl
@@ -49,10 +49,13 @@ const compressortypes = Dict{Union{String,Nothing}, Type{<: Compressor}}()
include("blosc.jl")
include("zlib.jl")
include("zstd.jl")
+include("v3.jl")

# ## Fallback definitions for the compressor interface
# Define fallbacks and generic methods for the compressor interface
-getCompressor(compdict::Dict) = getCompressor(compressortypes[compdict["id"]],compdict)
+getCompressor(compdict::Dict) = haskey(compdict, "id") ?
+    getCompressor(compressortypes[compdict["id"]], compdict) :
+    getCompressor(compressortypes[compdict["name"]], compdict["configuration"])
getCompressor(::Nothing) = NoCompressor()

# Compression when no filter is given
@@ -104,4 +107,4 @@ end

JSON.lower(::NoCompressor) = nothing

-compressortypes[nothing] = NoCompressor
+compressortypes[nothing] = NoCompressor
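
The new `getCompressor(compdict::Dict)` method distinguishes Zarr v2 metadata (an `"id"` key with parameters alongside it) from Zarr v3 metadata (a `"name"` key with parameters nested under `"configuration"`). The sketch below illustrates the two shapes with a hypothetical helper, `codec_name_and_config`, which is not part of the PR. Falling back to an empty dictionary when `"configuration"` is absent is an assumption made here for illustration.

```julia
# Hypothetical helper: normalise v2 and v3 codec metadata to (name, parameters).
function codec_name_and_config(meta::Dict)
    if haskey(meta, "id")
        # Zarr v2 shape: parameters sit next to the "id" key.
        return meta["id"], filter(kv -> first(kv) != "id", meta)
    else
        # Zarr v3 shape: parameters are nested under "configuration" (assumed optional here).
        return meta["name"], get(meta, "configuration", Dict{String,Any}())
    end
end

v2 = Dict{String,Any}("id" => "zlib", "level" => 6)
v3 = Dict{String,Any}("name" => "gzip", "configuration" => Dict{String,Any}("level" => 6))

name2, config2 = codec_name_and_config(v2)
name3, config3 = codec_name_and_config(v3)
```

Both calls yield the same parameter dictionary, with the codec name differing only because v2 and v3 metadata label the same algorithm differently.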
58 changes: 58 additions & 0 deletions src/Compressors/v3.jl
@@ -0,0 +1,58 @@
"""
    Compressor_v3{C <: Compressor} <: Compressor

Wrapper indicating that the wrapped compressor is serialized using Zarr v3 metadata
"""
struct Compressor_v3{C} <: Compressor
    parent::C
end
Base.parent(c::Compressor_v3) = c.parent

function zuncompress(a, z::Compressor_v3, T)
    zuncompress(a, parent(z), T)
end

function zuncompress!(data::DenseArray, compressed, z::Compressor_v3)
    zuncompress!(data, compressed, parent(z))
end

function zcompress(a, z::Compressor_v3)
    zcompress(a, parent(z))
end


function JSON.lower(c::Compressor_v3{BloscCompressor})
    p = parent(c)
    return Dict(
        "name" => "blosc",
        "configuration" => Dict(
            "cname" => p.cname,
            "clevel" => p.clevel,
            "shuffle" => p.shuffle,
            # TODO: Evaluate if we can encode typesize
            # "typesize" => p.typesize,
            "blocksize" => p.blocksize
        )
    )
end

function JSON.lower(c::Compressor_v3{ZlibCompressor})
    p = parent(c)
    return Dict(
        "name" => "gzip",
        "configuration" => Dict(
            "level" => p.clevel
        )
    )
end

function JSON.lower(c::Compressor_v3{ZstdCompressor})
    p = parent(c)
    return Dict(
        "name" => "zstd",
        "configuration" => Dict(
            "level" => p.config.compressionlevel,
            "checksum" => p.config.checksum
        )
    )
end
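
The wrapper pattern in this file can be sketched without the rest of the package: the wrapper forwards the actual compression work to its parent and only changes how the metadata is written. In the sketch below every name (`GzipLike`, `V3Wrapper`, `v2_metadata`, `v3_metadata`) is a hypothetical stand-in rather than the PR's API, and plain dictionaries take the place of `JSON.lower`.

```julia
abstract type Compressor end

# Hypothetical stand-in for a v2 compressor such as ZlibCompressor.
struct GzipLike <: Compressor
    clevel::Int
end

# Hypothetical stand-in for Compressor_v3: wraps a compressor without changing it.
struct V3Wrapper{C<:Compressor} <: Compressor
    parent::C
end
Base.parent(c::V3Wrapper) = c.parent

# v2-style metadata: flat dictionary keyed by "id".
v2_metadata(c::GzipLike) = Dict("id" => "zlib", "level" => c.clevel)

# v3-style metadata: dispatch on the wrapped type, nest parameters under "configuration".
v3_metadata(c::V3Wrapper{GzipLike}) = Dict(
    "name" => "gzip",
    "configuration" => Dict("level" => parent(c).clevel),
)

plain = GzipLike(6)
wrapped = V3Wrapper(plain)
meta2 = v2_metadata(plain)
meta3 = v3_metadata(wrapped)
```

Dispatching on the type parameter of the wrapper keeps the v2 serialization of each compressor untouched while adding a separate v3 form per compressor.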
3 changes: 2 additions & 1 deletion src/Compressors/zstd.jl
@@ -4,6 +4,7 @@
This file implements a Zstd compressor via ChunkCodecLibZstd.jl.

=#

using ChunkCodecLibZstd: ZstdEncodeOptions
using ChunkCodecCore: encode, decode, decode!

@@ -51,4 +52,4 @@ function JSON.lower(z::ZstdCompressor)
end
end

-Zarr.compressortypes["zstd"] = ZstdCompressor
+Zarr.compressortypes["zstd"] = ZstdCompressor
9 changes: 7 additions & 2 deletions src/Storage/Storage.jl
@@ -108,8 +108,12 @@ function writeattrs(s::AbstractStore, p, att::Dict; indent_json::Bool= false)
att
end

-is_zgroup(s::AbstractStore, p) = isinitialized(s,_concatpath(p,".zgroup"))
-is_zarray(s::AbstractStore, p) = isinitialized(s,_concatpath(p,".zarray"))
+is_zarr3(s::AbstractStore, p) = isinitialized(s,_concatpath(p,"zarr.json"))
+is_zarr2(s::AbstractStore, p) = is_z2array(s, p) || is_z2group(s,p)
+is_zgroup(s::AbstractStore, p) = is_z2group(s,p)
+is_zarray(s::AbstractStore, p) = is_z2array(s,p)
+is_z2group(s::AbstractStore, p) = isinitialized(s,_concatpath(p,".zgroup"))
+is_z2array(s::AbstractStore, p) = isinitialized(s,_concatpath(p,".zarray"))

isinitialized(s::AbstractStore, p, i::CartesianIndex)=isinitialized(s,p,citostring(i))
isinitialized(s::AbstractStore, p, i) = isinitialized(s,_concatpath(p,i))
@@ -197,6 +201,7 @@ isemptysub(s::AbstractStore, p) = isempty(subkeys(s,p)) && isempty(subdirs(s,p))
#during auto-check of storage format when doing zopen
storageregexlist = Pair[]

+include("formattedstore.jl")
include("directorystore.jl")
include("dictstore.jl")
include("s3store.jl")
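
The predicates added to this file tell the two formats apart by which metadata key exists: `.zarray` or `.zgroup` for Zarr v2 and `zarr.json` for Zarr v3. A small sketch of that check against an in-memory dictionary follows. The predicate names mirror the PR, but the `Dict`-based store, the `has_key` helper, and the `zarr_format` function (including its choice to test for v3 first) are assumptions made for illustration.

```julia
# Hypothetical in-memory store mapping keys to raw bytes.
store = Dict{String,Vector{UInt8}}(
    "v2array/.zarray" => UInt8[],
    "v2group/.zgroup" => UInt8[],
    "v3node/zarr.json" => UInt8[],
)

# Hypothetical helper: does `name` exist directly under `path`?
has_key(store, path, name) = haskey(store, isempty(path) ? name : string(path, "/", name))

is_z2array(store, path) = has_key(store, path, ".zarray")
is_z2group(store, path) = has_key(store, path, ".zgroup")
is_zarr2(store, path) = is_z2array(store, path) || is_z2group(store, path)
is_zarr3(store, path) = has_key(store, path, "zarr.json")

# Hypothetical helper: return 3, 2, or nothing depending on which metadata key is found.
function zarr_format(store, path)
    is_zarr3(store, path) && return 3
    is_zarr2(store, path) && return 2
    return nothing
end

formats = [zarr_format(store, p) for p in ("v2array", "v2group", "v3node", "missing")]
```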