LibBlosc2: New codec #54
base: main
Changes from 5 commits
7a80e71
50b0173
11d64b1
22b219d
173f629
bc40657
d757e5c
9a80e1c
f947489
1b68df1
08e8e6d
fcc30a9
10744cb
a5e4872
6866d40
1a4b310
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Release Notes | ||
|
||
All notable changes to this package will be documented in this file. | ||
|
||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). | ||
|
||
## Unreleased | ||
|
||
### Added | ||
|
||
- Initial release |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2025 Erik Schnetter | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
name = "ChunkCodecLibBlosc2" | ||
uuid = "59b5581c-e2bc-42b3-a6f1-80e88eec7b70" | ||
authors = ["Erik Schnetter <[email protected]>"] | ||
version = "0.1.0" | ||
|
||
[deps] | ||
Accessors = "7d9f7c33-5ae7-4f3b-8dc6-eff91059b697" | ||
Blosc2_jll = "d43303dc-dd0e-56c6-b0a8-331f4c8c9bfb" | ||
ChunkCodecCore = "0b6fb165-00bc-4d37-ab8b-79f91016dbe1" | ||
|
||
[compat] | ||
Accessors = "0.1.42" | ||
Blosc2_jll = "201.1700.100" | ||
ChunkCodecCore = "0.5.0" | ||
julia = "1.10" | ||
|
||
[workspace] | ||
projects = ["test"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# ChunkCodecLibBlosc2 | ||
|
||
## Warning: ChunkCodecLibBlosc2 is currently a WIP and its API may drastically change at any time. | ||
|
||
This package implements the ChunkCodec interface for the following encoders and decoders | ||
using the c-blosc2 library <https://github.com/Blosc/c-blosc2> | ||
|
||
1. `Blosc2Codec`, `Blosc2EncodeOptions`, `Blosc2DecodeOptions` | ||
|
||
|
||
## Example | ||
|
||
```julia-repl | ||
julia> using ChunkCodecLibBlosc2 | ||
|
||
julia> data = [0x00, 0x01, 0x02, 0x03]; | ||
|
||
|
||
julia> compressed_data = encode(Blosc2EncodeOptions(), data); | ||
|
||
julia> decompressed_data = decode(Blosc2Codec(), compressed_data; max_size=length(data), size_hint=length(data)); | ||
|
||
julia> data == decompressed_data | ||
true | ||
``` | ||
|
||
The low level interface is defined in the `ChunkCodecCore` package. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
module ChunkCodecLibBlosc2 | ||
|
||
using Base.Libc: free | ||
|
||
using Accessors | ||
|
||
using Blosc2_jll: libblosc2 | ||
|
||
using ChunkCodecCore: | ||
Codec, | ||
EncodeOptions, | ||
DecodeOptions, | ||
check_in_range, | ||
check_contiguous, | ||
DecodingError | ||
import ChunkCodecCore: | ||
decode_options, | ||
try_decode!, | ||
try_encode!, | ||
encode_bound, | ||
try_find_decoded_size, | ||
decoded_size_range | ||
|
||
export Blosc2Codec, | ||
Blosc2EncodeOptions, | ||
Blosc2DecodeOptions, | ||
Blosc2DecodingError | ||
|
||
if VERSION >= v"1.11.0-DEV.469" | ||
eval(Meta.parse("public is_compressor_valid, compcode, compname")) | ||
end | ||
|
||
# reexport ChunkCodecCore | ||
using ChunkCodecCore: ChunkCodecCore, encode, decode | ||
export ChunkCodecCore, encode, decode | ||
|
||
include("libblosc2.jl") | ||
|
||
""" | ||
struct Blosc2Codec <: Codec | ||
Blosc2Codec() | ||
|
||
Blosc2 compression using c-blosc2 library: https://github.com/Blosc/c-blosc2 | ||
|
||
Decoding does not accept any extra data appended to the compressed block. | ||
Decoding also does not accept truncated data, or multiple compressed blocks concatenated together. | ||
|
||
[`Blosc2EncodeOptions`](@ref) and [`Blosc2DecodeOptions`](@ref) | ||
can be used to set encoding and decoding options. | ||
""" | ||
struct Blosc2Codec <: Codec end | ||
decode_options(::Blosc2Codec) = Blosc2DecodeOptions() | ||
|
||
include("encode.jl") | ||
include("decode.jl") | ||
|
||
# Initialize the Blosc2 library. This function is idempotent, i.e. it | ||
# can be called multiple times without harm. | ||
__init__() = @ccall libblosc2.blosc2_init()::Cvoid | ||
|
||
|
||
end # module ChunkCodecLibBlosc2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
""" | ||
Blosc2DecodingError() | ||
|
||
Error for data that cannot be decoded. | ||
""" | ||
struct Blosc2DecodingError <: DecodingError | ||
end | ||
|
||
function Base.showerror(io::IO, err::Blosc2DecodingError) | ||
print(io, "Blosc2DecodingError: blosc2 compressed buffer cannot be decoded") | ||
return nothing | ||
end | ||
|
||
""" | ||
struct Blosc2DecodeOptions <: DecodeOptions | ||
Blosc2DecodeOptions(; kwargs...) | ||
|
||
Blosc2 decompression using c-blosc2 library: https://github.com/Blosc/c-blosc2 | ||
|
||
# Keyword Arguments | ||
|
||
- `codec::Blosc2Codec=Blosc2Codec()` | ||
""" | ||
struct Blosc2DecodeOptions <: DecodeOptions | ||
codec::Blosc2Codec | ||
end | ||
Blosc2DecodeOptions(; codec::Blosc2Codec=Blosc2Codec(), kwargs...) = Blosc2DecodeOptions(codec) | ||
|
||
function try_find_decoded_size(::Blosc2DecodeOptions, src::AbstractVector{UInt8})::Int64 | ||
check_contiguous(src) | ||
|
||
copy_cframe = false | ||
schunk = @ccall libblosc2.blosc2_schunk_from_buffer(src::Ptr{UInt8}, length(src)::Int64, copy_cframe::UInt8)::Ptr{Blosc2SChunk} | ||
if schunk == C_NULL | ||
# This is not valid Blosc2-encoded data | ||
throw(Blosc2DecodingError()) | ||
end | ||
@ccall libblosc2.blosc2_schunk_avoid_cframe_free(schunk::Ptr{Blosc2SChunk}, true::UInt8)::Cvoid | ||
|
||
total_nbytes = unsafe_load(schunk).nbytes | ||
|
||
success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Cvoid})::Cint | ||
@assert success == 0 | ||
|
||
return total_nbytes::Int64 | ||
end | ||
|
||
#TODO: implement `try_resize_decode!` | ||
|
||
function try_decode!(d::Blosc2DecodeOptions, dst::AbstractVector{UInt8}, src::AbstractVector{UInt8}; | ||
kwargs...)::Union{Nothing,Int64} | ||
check_contiguous(dst) | ||
check_contiguous(src) | ||
|
||
copy_cframe = false | ||
schunk = @ccall libblosc2.blosc2_schunk_from_buffer(src::Ptr{UInt8}, length(src)::Int64, copy_cframe::UInt8)::Ptr{Blosc2SChunk} | ||
if schunk == C_NULL | ||
# This is not valid Blosc2-encoded data | ||
throw(Blosc2DecodingError()) | ||
end | ||
@ccall libblosc2.blosc2_schunk_avoid_cframe_free(schunk::Ptr{Blosc2SChunk}, true::UInt8)::Cvoid | ||
|
||
total_nbytes = unsafe_load(schunk).nbytes | ||
if total_nbytes > length(dst) | ||
# There is not enough space to decode the data | ||
success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Cvoid})::Cint | ||
@assert success == 0 | ||
|
||
return nothing | ||
end | ||
|
||
dst_position = Int64(0) | ||
|
||
nchunks = unsafe_load(schunk).nchunks | ||
for nchunk in 0:(nchunks - 1) | ||
nbytes_left = clamp(total_nbytes - dst_position, Int32) | ||
nbytes = @ccall libblosc2.blosc2_schunk_decompress_chunk(schunk::Ptr{Blosc2SChunk}, nchunk::Int64, | ||
pointer(dst, dst_position+1)::Ptr{Cvoid}, nbytes_left::Int32)::Cint | ||
@assert nbytes > 0 | ||
|
||
|
||
dst_position += nbytes | ||
end | ||
@assert dst_position == total_nbytes | ||
|
||
|
||
success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Cvoid})::Cint | ||
@assert success == 0 | ||
|
||
return total_nbytes::Int64 | ||
end |
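The position bookkeeping in `try_decode!` above can be tried without linking against c-blosc2. The following is a minimal sketch, not the PR's implementation: `decode_chunks_sketch!` is a hypothetical name, and each "chunk" is a plain byte vector that is copied rather than decompressed. It mirrors the early `nothing` return when `dst` is too small, the `clamp(..., Int32)` on the remaining space, and the final consistency check.

```julia
# Hypothetical stand-in for the decode loop: chunks are copied, not decompressed.
function decode_chunks_sketch!(dst::Vector{UInt8}, chunks::Vector{Vector{UInt8}})
    total_nbytes = sum(length, chunks; init=0)
    # Mirrors the early return when there is not enough space in `dst`.
    total_nbytes > length(dst) && return nothing

    dst_position = 0
    for chunk in chunks
        # The C API takes an Int32 size, so the remaining space is clamped.
        nbytes_left = clamp(total_nbytes - dst_position, Int32)
        nbytes = length(chunk)
        @assert nbytes <= nbytes_left
        copyto!(dst, dst_position + 1, chunk, 1, nbytes)
        dst_position += nbytes
    end
    @assert dst_position == total_nbytes
    return total_nbytes
end

dst = zeros(UInt8, 8)
n = decode_chunks_sketch!(dst, [UInt8[1, 2, 3], UInt8[4, 5]])
println(n, " bytes written: ", dst[1:n])
```

Note that `clamp(x, Int32)` saturates rather than throws, so an `Int64` remainder larger than `typemax(Int32)` is passed as `typemax(Int32)`.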
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,146 @@ | ||||||||||||||
""" | ||||||||||||||
struct Blosc2EncodeOptions <: EncodeOptions | ||||||||||||||
Blosc2EncodeOptions(; kwargs...) | ||||||||||||||
|
||||||||||||||
Blosc2 compression using c-blosc2 library: https://github.com/Blosc/c-blosc2 | ||||||||||||||
|
||||||||||||||
# Keyword Arguments | ||||||||||||||
|
||||||||||||||
- `codec::Blosc2Codec=Blosc2Codec()` | ||||||||||||||
- `clevel::Integer=5`: The compression level, between 0 (no compression) and 9 (maximum compression) | ||||||||||||||
- `doshuffle::Integer=1`: Whether to use the shuffle filter. | ||||||||||||||
|
||||||||||||||
0 means not applying it, 1 means applying it at a byte level, | ||||||||||||||
and 2 means at a bit level (slower but may achieve better entropy alignment). | ||||||||||||||
- `typesize::Integer=1`: The element size to use when shuffling. | ||||||||||||||
|
||||||||||||||
For implementation reasons, only `typesize` in `1:$(BLOSC_MAX_TYPESIZE)` will allow the | ||||||||||||||
shuffle filter to work. When `typesize` is not in this range, shuffle | ||||||||||||||
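will be silently disabled. | ||||||||||||||
- `chunksize::Integer=1024^3`: The maximum number of uncompressed bytes per chunk. | ||||||||||||||
|
||||||||||||||
Values are clamped to the range from 1024 bytes to 1 GiB. | ||||||||||||||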
- `compressor::AbstractString="lz4"`: The string representing the type of compressor to use. | ||||||||||||||
|
||||||||||||||
For example, "blosclz", "lz4", "lz4hc", "zlib", or "zstd". | ||||||||||||||
Use `is_compressor_valid` to check if a compressor is supported. | ||||||||||||||
""" | ||||||||||||||
struct Blosc2EncodeOptions <: EncodeOptions | ||||||||||||||
codec::Blosc2Codec | ||||||||||||||
clevel::Int32 | ||||||||||||||
doshuffle::Int32 | ||||||||||||||
typesize::Int64 | ||||||||||||||
chunksize::Int64 | ||||||||||||||
compressor::String | ||||||||||||||
end | ||||||||||||||
function Blosc2EncodeOptions(; | ||||||||||||||
codec::Blosc2Codec=Blosc2Codec(), | ||||||||||||||
clevel::Integer=5, | ||||||||||||||
doshuffle::Integer=1, | ||||||||||||||
typesize::Integer=1, | ||||||||||||||
chunksize::Integer=Int64(1024)^3, # 1 GByte | ||||||||||||||
compressor::AbstractString="lz4", | ||||||||||||||
kwargs...) | ||||||||||||||
_clevel = Int32(clamp(clevel, 0, 9)) | ||||||||||||||
check_in_range(0:2; doshuffle) | ||||||||||||||
_typesize = if typesize ∈ 2:BLOSC_MAX_TYPESIZE | ||||||||||||||
Int64(typesize) | ||||||||||||||
else | ||||||||||||||
Int64(1) | ||||||||||||||
end | ||||||||||||||
_chunksize = Int64(clamp(chunksize, 1024, Int64(1024)^3)) # 1 GByte | ||||||||||||||
is_compressor_valid(compressor) || | ||||||||||||||
throw(ArgumentError("is_compressor_valid(compressor) must hold. Got\ncompressor => $(repr(compressor))")) | ||||||||||||||
return Blosc2EncodeOptions(codec, _clevel, doshuffle, _typesize, _chunksize, compressor) | ||||||||||||||
end | ||||||||||||||
|
||||||||||||||
# The maximum overhead for the schunk | ||||||||||||||
const MAX_SCHUNK_OVERHEAD = 172 # apparently undocumented -- just a guess | ||||||||||||||
If the overhead is undocumented, one option is to create the output in Julia with a known overhead. This is what I do for brotli: ChunkCodecs.jl/LibBrotli/src/libbrotli.jl Lines 48 to 53 in db0c871
|
||||||||||||||
|
||||||||||||||
# We just punt with the upper bound. typemax(Int64) is a huge number anyway. | ||||||||||||||
decoded_size_range(e::Blosc2EncodeOptions) = Int64(0):Int64(e.typesize):(typemax(Int64) ÷ 2) | ||||||||||||||
|
||||||||||||||
function encode_bound(e::Blosc2EncodeOptions, src_size::Int64)::Int64 | ||||||||||||||
return clamp(widen(src_size) + cld(src_size, e.chunksize) * BLOSC2_MAX_OVERHEAD + MAX_SCHUNK_OVERHEAD, Int64) | ||||||||||||||
end | ||||||||||||||
|
||||||||||||||
function try_encode!(e::Blosc2EncodeOptions, dst::AbstractVector{UInt8}, src::AbstractVector{UInt8}; | ||||||||||||||
kwargs...)::Union{Nothing,Int64} | ||||||||||||||
check_contiguous(dst) | ||||||||||||||
check_contiguous(src) | ||||||||||||||
src_size::Int64 = length(src) | ||||||||||||||
dst_size::Int64 = length(dst) | ||||||||||||||
check_in_range(decoded_size_range(e); src_size) | ||||||||||||||
|
||||||||||||||
ccode = compcode(e.compressor) | ||||||||||||||
@assert ccode >= 0 | ||||||||||||||
numinternalthreads = 1 | ||||||||||||||
|
||||||||||||||
# Create a super-chunk container | ||||||||||||||
cparams = Blosc2CParams() | ||||||||||||||
@reset cparams.typesize = e.typesize | ||||||||||||||
@reset cparams.compcode = ccode | ||||||||||||||
@reset cparams.clevel = e.clevel | ||||||||||||||
@reset cparams.nthreads = numinternalthreads | ||||||||||||||
@reset cparams.filters[BLOSC2_MAX_FILTERS] = e.doshuffle | ||||||||||||||
cparams_obj = [cparams] | ||||||||||||||
|
||||||||||||||
dparams = Blosc2DParams() | ||||||||||||||
@reset dparams.nthreads = numinternalthreads | ||||||||||||||
dparams_obj = [dparams] | ||||||||||||||
|
||||||||||||||
io = Blosc2IO() | ||||||||||||||
io_obj = [io] | ||||||||||||||
|
||||||||||||||
storage = Blosc2Storage() | ||||||||||||||
@reset storage.cparams = pointer(cparams_obj) | ||||||||||||||
@reset storage.dparams = pointer(dparams_obj) | ||||||||||||||
@reset storage.io = pointer(io_obj) | ||||||||||||||
storage_obj = [storage] | ||||||||||||||
|
||||||||||||||
there_was_an_error = false | ||||||||||||||
|
||||||||||||||
GC.@preserve cparams_obj dparams_obj io_obj storage_obj begin | ||||||||||||||
schunk = @ccall libblosc2.blosc2_schunk_new(storage_obj::Ptr{Blosc2Storage})::Ptr{Blosc2SChunk} | ||||||||||||||
@assert schunk != C_NULL | ||||||||||||||
|
||||||||||||||
# Break input into chunks | ||||||||||||||
for pos in 1:e.chunksize:src_size | ||||||||||||||
endpos = min(src_size, pos + e.chunksize - 1) | ||||||||||||||
srcview = @view src[pos:endpos] | ||||||||||||||
nbytes = length(srcview) | ||||||||||||||
nchunks = @ccall libblosc2.blosc2_schunk_append_buffer(schunk::Ptr{Blosc2SChunk}, srcview::Ptr{Cvoid}, | ||||||||||||||
nbytes::Int32)::Int64 | ||||||||||||||
@assert nchunks >= 0 | ||||||||||||||
Does |
||||||||||||||
@assert nchunks == (pos-1) ÷ e.chunksize + 1 | ||||||||||||||
end | ||||||||||||||
|
||||||||||||||
cframe = Ref{Ptr{UInt8}}() | ||||||||||||||
needs_free = Ref{UInt8}() # bool | ||||||||||||||
compressed_size = @ccall libblosc2.blosc2_schunk_to_buffer(schunk::Ptr{Blosc2SChunk}, cframe::Ref{Ptr{UInt8}}, | ||||||||||||||
needs_free::Ref{UInt8})::Int64 | ||||||||||||||
@assert compressed_size >= 0 | ||||||||||||||
cframe = cframe[] | ||||||||||||||
needs_free = Bool(needs_free[]) | ||||||||||||||
|
||||||||||||||
if compressed_size <= length(dst) | ||||||||||||||
# We should try to encode directly into `dst`. (This may | ||||||||||||||
# not be possible with the Blosc2 API.) | ||||||||||||||
unsafe_copyto!(pointer(dst), cframe, compressed_size) | ||||||||||||||
else | ||||||||||||||
# Insufficient space to store compressed data. | ||||||||||||||
# We should detect this earlier, already in the loop above. | ||||||||||||||
there_was_an_error = true | ||||||||||||||
end | ||||||||||||||
|
||||||||||||||
success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Blosc2SChunk})::Cint | ||||||||||||||
@assert success == 0 | ||||||||||||||
|
||||||||||||||
if needs_free | ||||||||||||||
Libc.free(cframe) | ||||||||||||||
end | ||||||||||||||
end | ||||||||||||||
|
||||||||||||||
if there_was_an_error | ||||||||||||||
return nothing | ||||||||||||||
end | ||||||||||||||
|
||||||||||||||
return compressed_size::Int64 | ||||||||||||||
end |
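The chunking loop and the `encode_bound` arithmetic in this file can be checked in isolation. The sketch below assumes `BLOSC2_MAX_OVERHEAD` is 32 bytes (the constant is defined in `libblosc2.jl`, which is not shown in this diff, so that value is an assumption) and reuses the guessed `MAX_SCHUNK_OVERHEAD = 172` from above. The names `encode_bound_sketch` and `chunk_ranges` are hypothetical helpers for illustration only.

```julia
# Assumed value; the real constant lives in libblosc2.jl (not shown in this diff).
const BLOSC2_MAX_OVERHEAD = 32
# The guessed value from the diff above.
const MAX_SCHUNK_OVERHEAD = 172

# Same arithmetic as `encode_bound`, with `chunksize` passed explicitly.
function encode_bound_sketch(src_size::Int64, chunksize::Int64)::Int64
    return clamp(widen(src_size) + cld(src_size, chunksize) * BLOSC2_MAX_OVERHEAD + MAX_SCHUNK_OVERHEAD, Int64)
end

# Index ranges visited by the loop `for pos in 1:e.chunksize:src_size`.
function chunk_ranges(src_size::Int64, chunksize::Int64)
    return [pos:min(src_size, pos + chunksize - 1) for pos in 1:chunksize:src_size]
end

println(chunk_ranges(Int64(2500), Int64(1024)))
println(encode_bound_sketch(Int64(2500), Int64(1024)))
```

With 2500 input bytes and 1024-byte chunks, the loop visits three ranges, so the bound is 2500 + 3 × 32 + 172 = 2768 bytes under the assumed constants. An empty input produces no chunks and a bound of just the super-chunk overhead.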