LibBlosc2: New codec #54
Open
eschnett wants to merge 16 commits into JuliaIO:main from eschnett:eschnett/LibBlosc2
16 commits
7a80e71
LibBlosc2: New chunk codec
eschnett 50b0173
CI: Enable LibBlosc2
eschnett 11d64b1
LibBlosc2: Allow dynamic chunk sizes
eschnett 22b219d
README: List Blosc2
eschnett 173f629
Improve code
eschnett bc40657
No global initialization
eschnett d757e5c
Blosc2 is not thread-safe
eschnett 9a80e1c
CI: Add debug output, build only Windows
eschnett f947489
CI: Add debug output, build only Windows
eschnett 1b68df1
CI: Avoid segfault
eschnett 08e8e6d
CI: Avoid segfault
eschnett fcc30a9
Merge branch 'main' into eschnett/LibBlosc2
nhz2 10744cb
Update CI.yml
nhz2 a5e4872
LibBlosc2: Improve error handling
eschnett 6866d40
Merge branch 'eschnett/LibBlosc2' of https://github.com/eschnett/Chun…
eschnett 1a4b310
Remove Blosc2 from list of registered codecs
eschnett
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Release Notes | ||
|
||
All notable changes to this package will be documented in this file. | ||
|
||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). | ||
|
||
## Unreleased | ||
|
||
### Added | ||
|
||
- Initial release |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2025 Erik Schnetter | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
name = "ChunkCodecLibBlosc2" | ||
uuid = "59b5581c-e2bc-42b3-a6f1-80e88eec7b70" | ||
authors = ["Erik Schnetter <[email protected]>"] | ||
version = "0.1.0" | ||
|
||
[deps] | ||
Accessors = "7d9f7c33-5ae7-4f3b-8dc6-eff91059b697" | ||
Blosc2_jll = "d43303dc-dd0e-56c6-b0a8-331f4c8c9bfb" | ||
ChunkCodecCore = "0b6fb165-00bc-4d37-ab8b-79f91016dbe1" | ||
|
||
[compat] | ||
Accessors = "0.1.42" | ||
Blosc2_jll = "201.1700.100" | ||
ChunkCodecCore = "0.5.0" | ||
julia = "1.10" | ||
|
||
[workspace] | ||
projects = ["test"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# ChunkCodecLibBlosc2 | ||
|
||
## Warning: ChunkCodecLibBlosc2 is currently a WIP and its API may drastically change at any time. | ||
|
||
This package implements the ChunkCodec interface for the following encoders and decoders | ||
using the c-blosc2 library <https://github.com/Blosc/c-blosc2>: | ||
|
||
1. `Blosc2CFrame`, `Blosc2EncodeOptions`, `Blosc2DecodeOptions` | ||
|
||
Note: It appears that the [Blosc2 Contiguous Frame | ||
Format](https://www.blosc.org/c-blosc2/format/cframe_format.html) is | ||
not fully protected by checksums. The [`c-blosc2` | ||
library](https://www.blosc.org/c-blosc2) may crash (segfault) for | ||
invalid inputs. | ||
|
||
## Example | ||
|
||
```julia-repl | ||
julia> using ChunkCodecLibBlosc2 | ||
|
||
julia> data = collect(0x00:0x07); | ||
|
||
julia> compressed_data = encode(Blosc2EncodeOptions(), data); | ||
|
||
julia> decompressed_data = decode(Blosc2CFrame(), compressed_data; max_size=length(data), size_hint=length(data)); | ||
|
||
julia> data == decompressed_data | ||
true | ||
``` | ||
|
||
The low-level interface is defined in the `ChunkCodecCore` package. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
module ChunkCodecLibBlosc2 | ||
|
||
using Base.Threads | ||
|
||
using Accessors: @reset | ||
|
||
using Blosc2_jll: libblosc2 | ||
|
||
using ChunkCodecCore: | ||
Codec, | ||
EncodeOptions, | ||
DecodeOptions, | ||
check_in_range, | ||
check_contiguous, | ||
DecodingError | ||
import ChunkCodecCore: | ||
decode_options, | ||
try_decode!, | ||
try_encode!, | ||
encode_bound, | ||
try_find_decoded_size, | ||
decoded_size_range | ||
|
||
export Blosc2CFrame, | ||
Blosc2EncodeOptions, | ||
Blosc2DecodeOptions, | ||
Blosc2DecodingError | ||
|
||
if VERSION >= v"1.11.0-DEV.469" | ||
eval(Meta.parse("public is_compressor_valid, compcode, compname")) | ||
end | ||
|
||
# reexport ChunkCodecCore | ||
using ChunkCodecCore: ChunkCodecCore, encode, decode | ||
export ChunkCodecCore, encode, decode | ||
|
||
include("libblosc2.jl") | ||
|
||
""" | ||
struct Blosc2CFrame <: Codec | ||
Blosc2CFrame() | ||
|
||
Blosc2 compression using c-blosc2 library: https://github.com/Blosc/c-blosc2 | ||
|
||
Decoding does not accept any extra data appended to the compressed block. | ||
Decoding also does not accept truncated data, or multiple compressed blocks concatenated together. | ||
|
||
[`Blosc2EncodeOptions`](@ref) and [`Blosc2DecodeOptions`](@ref) | ||
can be used to set encoding and decoding options. | ||
""" | ||
struct Blosc2CFrame <: Codec end | ||
decode_options(::Blosc2CFrame) = Blosc2DecodeOptions() | ||
|
||
include("encode.jl") | ||
include("decode.jl") | ||
|
||
end # module ChunkCodecLibBlosc2 |
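The version-guarded `public` declaration in the module above can be tried in isolation. The following is a minimal sketch, not part of the PR: `PublicDemo`, `exported_fn`, and `documented_fn` are hypothetical names used only for illustration. It assumes, as the guard in the module does, that `public` is only parsed as a keyword on Julia 1.11 and later, which is why the statement is hidden inside a string.

```julia
module PublicDemo

export exported_fn

# `public` is not valid syntax for older parsers, so it is parsed at run time
# and only on versions that support it (same guard as in the module above).
if VERSION >= v"1.11.0-DEV.469"
    eval(Meta.parse("public documented_fn"))
end

exported_fn() = :exported      # reachable after `using PublicDemo`
documented_fn() = :public      # part of the API, but must be qualified

end # module PublicDemo

# Both functions are callable when qualified, on every supported Julia version.
PublicDemo.exported_fn()
PublicDemo.documented_fn()
```

On Julia 1.11 and later, `Base.ispublic(PublicDemo, :documented_fn)` can be used to confirm that the name was marked public without being exported.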
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
""" | ||
Blosc2DecodingError(code) | ||
|
||
Error for data that cannot be decoded. `code` is the c-blosc2 error code, or 0 if none is available. | ||
""" | ||
struct Blosc2DecodingError <: DecodingError | ||
code::Cint | ||
end | ||
|
||
function Base.showerror(io::IO, err::Blosc2DecodingError) | ||
print(io, "Blosc2DecodingError: blosc2 compressed buffer cannot be decoded, error code: $(err.code)") | ||
return nothing | ||
end | ||
|
||
""" | ||
struct Blosc2DecodeOptions <: DecodeOptions | ||
Blosc2DecodeOptions(; kwargs...) | ||
|
||
Blosc2 decompression using c-blosc2 library: https://github.com/Blosc/c-blosc2 | ||
|
||
# Keyword Arguments | ||
|
||
- `codec::Blosc2CFrame = Blosc2CFrame()` | ||
- `nthreads::Integer = 1`: The number of threads to use | ||
""" | ||
struct Blosc2DecodeOptions <: DecodeOptions | ||
codec::Blosc2CFrame | ||
|
||
nthreads::Int | ||
end | ||
function Blosc2DecodeOptions(; codec::Blosc2CFrame=Blosc2CFrame(), | ||
nthreads::Integer=1, | ||
kwargs...) | ||
_nthreads = nthreads | ||
check_in_range(1:typemax(Int32); nthreads=_nthreads) | ||
|
||
return Blosc2DecodeOptions(codec, _nthreads) | ||
end | ||
|
||
function try_find_decoded_size(::Blosc2DecodeOptions, src::AbstractVector{UInt8})::Int64 | ||
check_contiguous(src) | ||
|
||
blosc2_init() | ||
|
||
copy_cframe = false | ||
schunk = @ccall libblosc2.blosc2_schunk_from_buffer(src::Ptr{UInt8}, length(src)::Int64, copy_cframe::UInt8)::Ptr{Blosc2SChunk} | ||
if schunk == C_NULL | ||
# The input is not valid Blosc2-encoded data | ||
throw(Blosc2DecodingError(0)) | ||
end | ||
@ccall libblosc2.blosc2_schunk_avoid_cframe_free(schunk::Ptr{Blosc2SChunk}, true::UInt8)::Cvoid | ||
|
||
total_nbytes = unsafe_load(schunk).nbytes | ||
|
||
success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Cvoid})::Cint | ||
if success != 0 | ||
# Something went wrong | ||
throw(Blosc2DecodingError(0)) | ||
end | ||
|
||
return total_nbytes::Int64 | ||
end | ||
|
||
# Note: We should implement `try_resize_decode!` | ||
|
||
|
||
function try_decode!(d::Blosc2DecodeOptions, dst::AbstractVector{UInt8}, src::AbstractVector{UInt8}; | ||
kwargs...)::Union{Nothing,Int64} | ||
check_contiguous(dst) | ||
check_contiguous(src) | ||
|
||
blosc2_init() | ||
|
||
# I don't think there is a way to specify a decompression context. | ||
# That means that our `Blosc2DecodeOptions` will be unused. | ||
# We could try writing to the `dctx` field in the `schunk`. | ||
|
||
copy_cframe = false | ||
schunk = @ccall libblosc2.blosc2_schunk_from_buffer(src::Ptr{UInt8}, length(src)::Int64, copy_cframe::UInt8)::Ptr{Blosc2SChunk} | ||
if schunk == C_NULL | ||
# The input is not valid Blosc2-encoded data | ||
throw(Blosc2DecodingError(0)) | ||
end | ||
@ccall libblosc2.blosc2_schunk_avoid_cframe_free(schunk::Ptr{Blosc2SChunk}, true::UInt8)::Cvoid | ||
|
||
total_nbytes = unsafe_load(schunk).nbytes | ||
if total_nbytes > length(dst) | ||
# There is not enough space to decode the data | ||
success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Cvoid})::Cint | ||
if success != 0 | ||
# Something went wrong | ||
throw(Blosc2DecodingError(0)) | ||
end | ||
|
||
return nothing | ||
end | ||
|
||
dst_position = Int64(0) | ||
|
||
nchunks = unsafe_load(schunk).nchunks | ||
for nchunk in 0:(nchunks - 1) | ||
nbytes_left = clamp(total_nbytes - dst_position, Int32) | ||
nbytes = @ccall libblosc2.blosc2_schunk_decompress_chunk(schunk::Ptr{Blosc2SChunk}, nchunk::Int64, | ||
pointer(dst, dst_position+1)::Ptr{Cvoid}, nbytes_left::Int32)::Cint | ||
if nbytes <= 0 | ||
# There was an error decompressing the data | ||
throw(Blosc2DecodingError(nbytes)) | ||
end | ||
|
||
dst_position += nbytes | ||
end | ||
if dst_position != total_nbytes | ||
# The decompressed size is inconsistent | ||
throw(Blosc2DecodingError(0)) | ||
end | ||
|
||
success = @ccall libblosc2.blosc2_schunk_free(schunk::Ptr{Cvoid})::Cint | ||
if success != 0 | ||
# Something went wrong | ||
throw(Blosc2DecodingError(0)) | ||
end | ||
|
||
return total_nbytes::Int64 | ||
end |
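The chunk-by-chunk loop in `try_decode!` can be sketched without the C library. The code below is an illustration under stated assumptions, not the PR's implementation: `FakeSChunk`, `total_nbytes`, `free!`, `decompress_chunk!`, and `sketch_decode!` are hypothetical stand-ins, and no `libblosc2` function is called. It keeps the same steps (size check, per-chunk write at a running position, final consistency check) and additionally wraps them in `try`/`finally`. In the diff above the super-chunk appears to be freed only on the success path and the "buffer too small" path, so this is one possible way to cover the error paths as well.

```julia
# Hypothetical stand-in for a c-blosc2 super-chunk: each entry is one decoded chunk.
struct FakeSChunk
    chunks::Vector{Vector{UInt8}}
end

total_nbytes(s::FakeSChunk) = sum(length, s.chunks; init=0)

# Hypothetical cleanup hook; it only counts calls so the behaviour can be observed.
const freed = Ref(0)
free!(::FakeSChunk) = (freed[] += 1; nothing)

# Write chunk `i` into `dst` starting at `pos + 1`.
# Returns the number of bytes written, or -1 if the chunk does not fit.
function decompress_chunk!(dst::Vector{UInt8}, pos::Int, s::FakeSChunk, i::Int, nbytes_left::Int)
    chunk = s.chunks[i]
    length(chunk) > nbytes_left && return -1
    copyto!(dst, pos + 1, chunk, 1, length(chunk))
    return length(chunk)
end

function sketch_decode!(dst::Vector{UInt8}, s::FakeSChunk)
    try
        total = total_nbytes(s)
        # Not enough space: report it with `nothing` so the caller can retry.
        total > length(dst) && return nothing
        pos = 0
        for i in eachindex(s.chunks)
            n = decompress_chunk!(dst, pos, s, i, total - pos)
            n < 0 && error("chunk $i could not be decoded")
            pos += n
        end
        pos == total || error("decoded size is inconsistent")
        return total
    finally
        # Runs on success, on the early `return nothing`, and on thrown errors.
        free!(s)
    end
end
```

A caller would pass a destination buffer of at least `total_nbytes(s)` bytes. A smaller buffer yields `nothing`, and `free!` is still invoked exactly once per call.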