LibBlosc2: New codec #54

Status: Open. Wants to merge 16 commits into base: main. Showing changes from 5 commits.

5 changes: 5 additions & 0 deletions .github/workflows/CI.yml
@@ -37,6 +37,11 @@ jobs:
- ChunkCodecCore/**
- ChunkCodecTests/**
- LibBlosc/**
LibBlosc2:
- .github/**
- ChunkCodecCore/**
- ChunkCodecTests/**
- LibBlosc2/**
LibBrotli:
- .github/**
- ChunkCodecCore/**
11 changes: 11 additions & 0 deletions LibBlosc2/CHANGELOG.md
@@ -0,0 +1,11 @@
# Release Notes

All notable changes to this package will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## Unreleased

### Added

- Initial release
21 changes: 21 additions & 0 deletions LibBlosc2/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Erik Schnetter

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
18 changes: 18 additions & 0 deletions LibBlosc2/Project.toml
@@ -0,0 +1,18 @@
name = "ChunkCodecLibBlosc2"
uuid = "59b5581c-e2bc-42b3-a6f1-80e88eec7b70"
authors = ["Erik Schnetter <[email protected]>"]
version = "0.1.0"

[deps]
Accessors = "7d9f7c33-5ae7-4f3b-8dc6-eff91059b697"
Blosc2_jll = "d43303dc-dd0e-56c6-b0a8-331f4c8c9bfb"
ChunkCodecCore = "0b6fb165-00bc-4d37-ab8b-79f91016dbe1"

[compat]
Accessors = "0.1.42"
Blosc2_jll = "201.1700.100"
ChunkCodecCore = "0.5.0"
julia = "1.10"

[workspace]
projects = ["test"]
26 changes: 26 additions & 0 deletions LibBlosc2/README.md
@@ -0,0 +1,26 @@
# ChunkCodecLibBlosc2

## Warning: ChunkCodecLibBlosc2 is currently a WIP and its API may drastically change at any time.

This package implements the ChunkCodec interface for the following encoders and decoders
using the c-blosc2 library <https://github.com/Blosc/c-blosc2>:

1. `Blosc2Codec`, `Blosc2EncodeOptions`, `Blosc2DecodeOptions`

## Example

```julia-repl
julia> using ChunkCodecLibBlosc2

julia> data = [0x00, 0x01, 0x02, 0x03];

julia> compressed_data = encode(Blosc2EncodeOptions(), data);

julia> decompressed_data = decode(Blosc2Codec(), compressed_data; max_size=length(data), size_hint=length(data));

julia> data == decompressed_data
true
```

The low level interface is defined in the `ChunkCodecCore` package.

61 changes: 61 additions & 0 deletions LibBlosc2/src/ChunkCodecLibBlosc2.jl
@@ -0,0 +1,61 @@
module ChunkCodecLibBlosc2

using Base.Libc: free

using Accessors

using Blosc2_jll: libblosc2

using ChunkCodecCore:
Codec,
EncodeOptions,
DecodeOptions,
check_in_range,
check_contiguous,
DecodingError
import ChunkCodecCore:
decode_options,
try_decode!,
try_encode!,
encode_bound,
try_find_decoded_size,
decoded_size_range

export Blosc2Codec,
Blosc2EncodeOptions,
Blosc2DecodeOptions,
Blosc2DecodingError

if VERSION >= v"1.11.0-DEV.469"
eval(Meta.parse("public is_compressor_valid, compcode, compname"))
end

# reexport ChunkCodecCore
using ChunkCodecCore: ChunkCodecCore, encode, decode
export ChunkCodecCore, encode, decode

include("libblosc2.jl")

"""
struct Blosc2Codec <: Codec
Blosc2Codec()

Blosc2 compression using c-blosc2 library: https://github.com/Blosc/c-blosc2

Decoding does not accept any extra data appended to the compressed block.
Decoding also does not accept truncated data, or multiple compressed blocks concatenated together.

[`Blosc2EncodeOptions`](@ref) and [`Blosc2DecodeOptions`](@ref)
can be used to set encoding and decoding options.
"""
struct Blosc2Codec <: Codec end
decode_options(::Blosc2Codec) = Blosc2DecodeOptions()

include("encode.jl")
include("decode.jl")

# Initialize the Blosc2 library. This function is idempotent, i.e. it
# can be called multiple times without harm.
__init__() = @ccall libblosc2.blosc2_init()::Cvoid

end # module ChunkCodecLibBlosc2
89 changes: 89 additions & 0 deletions LibBlosc2/src/decode.jl
@@ -0,0 +1,89 @@
"""
Blosc2DecodingError()

Error for data that cannot be decoded.
"""
struct Blosc2DecodingError <: DecodingError
end

function Base.showerror(io::IO, err::Blosc2DecodingError)
print(io, "Blosc2DecodingError: blosc2 compressed buffer cannot be decoded")
return nothing
end

"""
struct Blosc2DecodeOptions <: DecodeOptions
Blosc2DecodeOptions(; kwargs...)

Blosc2 decompression using c-blosc2 library: https://github.com/Blosc/c-blosc2

# Keyword Arguments

- `codec::Blosc2Codec=Blosc2Codec()`
"""
struct Blosc2DecodeOptions <: DecodeOptions
codec::Blosc2Codec
end
Blosc2DecodeOptions(; codec::Blosc2Codec=Blosc2Codec(), kwargs...) = Blosc2DecodeOptions(codec)

function try_find_decoded_size(::Blosc2DecodeOptions, src::AbstractVector{UInt8})::Int64
check_contiguous(src)

copy_cframe = false
schunk = @ccall libblosc2.blosc2_schunk_from_buffer(src::Ptr{UInt8}, length(src)::Int64, copy_cframe::UInt8)::Ptr{Blosc2SChunk}
if schunk == C_NULL
# This is not valid blosc2-encoded data
throw(Blosc2DecodingError())
end
@ccall libblosc2.blosc2_schunk_avoid_cframe_free(schunk::Ptr{Blosc2SChunk}, true::UInt8)::Cvoid

total_nbytes = unsafe_load(schunk).nbytes

success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Cvoid})::Cint
@assert success == 0

return total_nbytes::Int64
end

#TODO: implement `try_resize_decode!`

function try_decode!(d::Blosc2DecodeOptions, dst::AbstractVector{UInt8}, src::AbstractVector{UInt8};
kwargs...)::Union{Nothing,Int64}
check_contiguous(dst)
check_contiguous(src)

copy_cframe = false
schunk = @ccall libblosc2.blosc2_schunk_from_buffer(src::Ptr{UInt8}, length(src)::Int64, copy_cframe::UInt8)::Ptr{Blosc2SChunk}
if schunk == C_NULL
# This is not valid blosc2-encoded data
throw(Blosc2DecodingError())

end
@ccall libblosc2.blosc2_schunk_avoid_cframe_free(schunk::Ptr{Blosc2SChunk}, true::UInt8)::Cvoid

total_nbytes = unsafe_load(schunk).nbytes
if total_nbytes > length(dst)
# There is not enough space to decode the data
success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Cvoid})::Cint
@assert success == 0

return nothing
end

dst_position = Int64(0)

nchunks = unsafe_load(schunk).nchunks
for nchunk in 0:(nchunks - 1)
nbytes_left = clamp(total_nbytes - dst_position, Int32)
nbytes = @ccall libblosc2.blosc2_schunk_decompress_chunk(schunk::Ptr{Blosc2SChunk}, nchunk::Int64,
pointer(dst, dst_position+1)::Ptr{Cvoid}, nbytes_left::Int32)::Cint
@assert nbytes > 0

dst_position += nbytes
end
@assert dst_position == total_nbytes

success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Cvoid})::Cint
@assert success == 0

return total_nbytes::Int64
end
146 changes: 146 additions & 0 deletions LibBlosc2/src/encode.jl
@@ -0,0 +1,146 @@
"""
struct Blosc2EncodeOptions <: EncodeOptions
Blosc2EncodeOptions(; kwargs...)

Blosc2 compression using c-blosc2 library: https://github.com/Blosc/c-blosc2

# Keyword Arguments

- `codec::Blosc2Codec=Blosc2Codec()`
- `clevel::Integer=5`: The compression level, between 0 (no compression) and 9 (maximum compression)
- `doshuffle::Integer=1`: Whether to use the shuffle filter.

0 means not applying it, 1 means applying it at a byte level,
and 2 means at a bit level (slower but may achieve better entropy alignment).
- `typesize::Integer=1`: The element size to use when shuffling.

For implementation reasons, only `typesize` in `1:$(BLOSC_MAX_TYPESIZE)` will allow the
shuffle filter to work. When `typesize` is not in this range, shuffle
will be silently disabled.
- `chunksize::Integer=1024^3`: The maximum size in bytes of each chunk the input is split into.

  Values outside the range from `1024` to `1024^3` are clamped to that range.
- `compressor::AbstractString="lz4"`: The string representing the type of compressor to use.

For example, "blosclz", "lz4", "lz4hc", "zlib", or "zstd".
Use `is_compressor_valid` to check if a compressor is supported.
"""
struct Blosc2EncodeOptions <: EncodeOptions
codec::Blosc2Codec
clevel::Int32
doshuffle::Int32
typesize::Int64
chunksize::Int64
compressor::String
end
function Blosc2EncodeOptions(;
codec::Blosc2Codec=Blosc2Codec(),
clevel::Integer=5,
doshuffle::Integer=1,
typesize::Integer=1,
chunksize::Integer=Int64(1024)^3, # 1 GByte
compressor::AbstractString="lz4",
kwargs...)
_clevel = Int32(clamp(clevel, 0, 9))
check_in_range(0:2; doshuffle)
_typesize = if typesize ∈ 2:BLOSC_MAX_TYPESIZE
Int64(typesize)
else
Int64(1)
end
_chunksize = Int64(clamp(chunksize, 1024, Int64(1024)^3)) # 1 GByte
is_compressor_valid(compressor) ||
throw(ArgumentError("is_compressor_valid(compressor) must hold. Got\ncompressor => $(repr(compressor))"))
return Blosc2EncodeOptions(codec, _clevel, doshuffle, _typesize, _chunksize, compressor)
end

# The maximum overhead for the schunk
const MAX_SCHUNK_OVERHEAD = 172 # apparently undocumented -- just a guess
Review comment (Member):

If the overhead is undocumented, one option is to create the output in Julia with a known overhead. This is what I do for brotli:

# Helper function ported from https://github.com/google/brotli/blob/v1.1.0/c/enc/encode.c#L1215
# /* Wraps data to uncompressed brotli stream with minimal window size.
# |output| should point at region with at least encode_bound
# addressable bytes.
# Returns the length of stream. */
function unsafe_MakeUncompressedStream(input::Ptr{UInt8}, input_size::Int64, output::Ptr{UInt8})::Int64
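
To make the role of the guessed overhead concrete, the bound formula used by `encode_bound` in this PR can be evaluated on its own. The following is only a sketch: `sketch_encode_bound` is a hypothetical standalone name, the value 32 for `BLOSC2_MAX_OVERHEAD` is an assumption that is not stated anywhere in this diff, and 172 is the PR's own guess for `MAX_SCHUNK_OVERHEAD`.

```julia
# ASSUMPTION: per-chunk overhead of c-blosc2; this value is not shown in the diff.
const ASSUMED_BLOSC2_MAX_OVERHEAD = 32
# The PR's own guess for the super-chunk (frame) overhead.
const GUESSED_MAX_SCHUNK_OVERHEAD = 172

# Hypothetical standalone version of the PR's `encode_bound` formula:
# payload + one per-chunk overhead for every chunk + one frame overhead.
function sketch_encode_bound(src_size::Int64, chunksize::Int64)::Int64
    nchunks = cld(src_size, chunksize)
    total = widen(src_size) +
            widen(nchunks) * ASSUMED_BLOSC2_MAX_OVERHEAD +
            GUESSED_MAX_SCHUNK_OVERHEAD
    # The sum is computed in Int128 and saturates at typemax(Int64).
    return clamp(total, Int64)
end

# 2500 bytes in 1024-byte chunks gives 3 chunks: 2500 + 3 * 32 + 172.
bound = sketch_encode_bound(Int64(2500), Int64(1024))
```

If the real frame overhead ever exceeds the guessed constant, a destination buffer sized with this bound could be too small, which is why a known, self-produced overhead is the safer option.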


# We just punt with the upper bound. typemax(Int64) is a huge number anyway.
decoded_size_range(e::Blosc2EncodeOptions) = Int64(0):Int64(e.typesize):(typemax(Int64) ÷ 2)

function encode_bound(e::Blosc2EncodeOptions, src_size::Int64)::Int64
return clamp(widen(src_size) + cld(src_size, e.chunksize) * BLOSC2_MAX_OVERHEAD + MAX_SCHUNK_OVERHEAD, Int64)
end

function try_encode!(e::Blosc2EncodeOptions, dst::AbstractVector{UInt8}, src::AbstractVector{UInt8};
kwargs...)::Union{Nothing,Int64}
check_contiguous(dst)
check_contiguous(src)
src_size::Int64 = length(src)
dst_size::Int64 = length(dst)
check_in_range(decoded_size_range(e); src_size)

ccode = compcode(e.compressor)
@assert ccode >= 0
numinternalthreads = 1

# Create a super-chunk container
cparams = Blosc2CParams()
@reset cparams.typesize = e.typesize
@reset cparams.compcode = ccode
@reset cparams.clevel = e.clevel
@reset cparams.nthreads = numinternalthreads
@reset cparams.filters[BLOSC2_MAX_FILTERS] = e.doshuffle
cparams_obj = [cparams]

dparams = Blosc2DParams()
@reset dparams.nthreads = numinternalthreads
dparams_obj = [dparams]

io = Blosc2IO()
io_obj = [io]

storage = Blosc2Storage()
@reset storage.cparams = pointer(cparams_obj)
@reset storage.dparams = pointer(dparams_obj)
@reset storage.io = pointer(io_obj)
storage_obj = [storage]

there_was_an_error = false

GC.@preserve cparams_obj dparams_obj io_obj storage_obj begin
schunk = @ccall libblosc2.blosc2_schunk_new(storage_obj::Ptr{Blosc2Storage})::Ptr{Blosc2SChunk}
@assert schunk != C_NULL

# Break input into chunks
for pos in 1:e.chunksize:src_size
endpos = min(src_size, pos + e.chunksize - 1)
srcview = @view src[pos:endpos]
nbytes = length(srcview)
nchunks = @ccall libblosc2.blosc2_schunk_append_buffer(schunk::Ptr{Blosc2SChunk}, srcview::Ptr{Cvoid},
nbytes::Int32)::Int64
@assert nchunks >= 0
Review comment (Member):

Does libblosc2.blosc2_schunk_append_buffer return negative nchunks if it runs out of memory? If so the @assert needs to be replaced with error handling, because @asserts are only for documentation and testing, not error handling.
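
A minimal sketch of what replacing the `@assert` with explicit error handling could look like. It assumes that a negative return value signals failure, which is exactly the open question above. `SketchBlosc2EncodingError` and `check_nchunks` are hypothetical names that do not exist in this PR.

```julia
# Hypothetical error type; the PR defines no error type for encoding failures.
struct SketchBlosc2EncodingError <: Exception
    code::Int64
end

function Base.showerror(io::IO, err::SketchBlosc2EncodingError)
    print(io, "SketchBlosc2EncodingError: appending a chunk failed with code ", err.code)
    return nothing
end

# Hypothetical helper: validate the value returned by an append call.
# ASSUMPTION: negative values indicate an error.
function check_nchunks(nchunks::Int64)::Int64
    if nchunks < 0
        # Unlike `@assert`, this check cannot be compiled away.
        throw(SketchBlosc2EncodingError(nchunks))
    end
    return nchunks
end

# Stand-in for a failing append call, so the sketch runs without libblosc2.
failed = try
    check_nchunks(Int64(-1))
    false
catch err
    err isa SketchBlosc2EncodingError
end
```

In the real `try_encode!`, such a throw would also need a `try ... finally` around the loop so that the super-chunk is still freed when an error occurs.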

@assert nchunks == (pos-1) ÷ e.chunksize + 1
end

cframe = Ref{Ptr{UInt8}}()
needs_free = Ref{UInt8}() # bool
compressed_size = @ccall libblosc2.blosc2_schunk_to_buffer(schunk::Ptr{Blosc2SChunk}, cframe::Ref{Ptr{UInt8}},
needs_free::Ref{UInt8})::Int64
@assert compressed_size >= 0
cframe = cframe[]
needs_free = Bool(needs_free[])

if compressed_size <= length(dst)
# We should try to encode directly into `dst`. (This may
# not be possible with the Blosc2 API.)
unsafe_copyto!(pointer(dst), cframe, compressed_size)
else
# Insufficient space to store the compressed data.
# We should detect this earlier, already in the loop above.
there_was_an_error = true
end

success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Blosc2SChunk})::Cint
@assert success == 0

if needs_free
Libc.free(cframe)
end
end

if there_was_an_error
return nothing
end

return compressed_size::Int64
end