feat(pkg/digest): support blake3 hashing #307
Pull request overview
This PR adds support for the BLAKE3 digest function to bb-storage, enabling servers to accept, store, and forward BLAKE3-digested blobs without translation. The implementation includes configuration options to explicitly enable BLAKE3 support while maintaining backward compatibility by excluding it from the default digest function set.
Key Changes
- Added BLAKE3 as a supported digest function, with the implementation backed by the `github.com/zeebo/blake3` library
- Introduced the configuration field `supported_digest_functions` to allow explicit opt-in for BLAKE3 support
- Implemented digest function filtering in the capabilities provider to intersect frontend-configured functions with backend-supported functions
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/proto/configuration/bb_storage/bb_storage.proto | Added supported_digest_functions configuration field with documentation |
| pkg/proto/configuration/bb_storage/bb_storage.pb.go | Generated protobuf code for new configuration field |
| pkg/digest/bare_function.go | Added BLAKE3 to supported functions list, created DefaultDigestFunctions for backward compatibility, implemented BLAKE3 bareFunction |
| pkg/digest/digest_test.go | Added comprehensive test coverage for BLAKE3 in ByteStream paths, digest keys, and compact binary format |
| pkg/digest/BUILD.bazel | Added dependency on blake3 library |
| pkg/capabilities/digest_function_filtering_provider.go | New provider to filter advertised digest functions based on configuration and backend support |
| pkg/capabilities/BUILD.bazel | Added new digest function filtering provider to build |
| cmd/bb_storage/main.go | Integrated digest function filtering with configuration handling and default fallback |
| cmd/bb_storage/BUILD.bazel | Added digest package dependency |
| go.mod, go.sum, MODULE.bazel | Added blake3 library and transitive dependencies |
Branch updated from 1570092 to c25cf26.
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.
```proto
// defaults to MD5, SHA1, SHA256, SHA256TREE, SHA384, and SHA512.
//
```
Copilot (AI), Dec 30, 2025
The new supported_digest_functions field is documented to default to MD5 and SHA1 (alongside stronger hashes) when left unset, meaning the CAS will continue to advertise and accept these weak digest algorithms for content addressing. Because MD5 and SHA1 are vulnerable to practical collision attacks, an attacker using a shared CAS can craft colliding inputs and substitute malicious blobs under the same digest, breaking build artifact integrity across tenants. To reduce this risk, consider changing the documented and implemented defaults to only collision-resistant digests (e.g., SHA256 and optionally BLAKE3), and require any use of MD5/SHA1 to be explicitly opted in for legacy compatibility.
Suggested change:

```diff
-// defaults to MD5, SHA1, SHA256, SHA256TREE, SHA384, and SHA512.
-//
+// defaults to SHA256, SHA256TREE, SHA384, and SHA512.
+//
+// MD5 and SHA1 are not enabled by default due to their susceptibility to
+// practical collision attacks. Only configure MD5 or SHA1 explicitly if
+// legacy interoperability requires it, and be aware of the associated
+// security risks.
+//
```
Hi Tyler, I'd be open to adding support for BLAKE3, but I have no idea what you mean by this:
What do you mean by "translate all the resource names for the read/write so the CAS can access them"? As far as I know, it's just like this:
pkg/digest/bare_function.go (outdated)
```go
blake3BareFunction = bareFunction{
	enumValue: remoteexecution.DigestFunction_BLAKE3,
	hasherFactory: func(expectedSizeBytes int64) hash.Hash {
		return blake3.New()
```
In addition to using SIMD, BLAKE3 supports multi-threaded parallelism. How much parallelism does the hasher returned by blake3.New() use? Does each blake3.New() call create its own pool of goroutines? If so, is there any way to limit it?
Each blake3.New() returns a single-threaded hasher that uses a single goroutine, so input is processed sequentially.
Branch updated from c25cf26 to 5353350.
Gotcha! The change as-is looks fine. It looks like the current code lists all digest functions in alphabetical order, not by enum value. Can you please make sure your change does the same? That just means putting BLAKE3 at the top everywhere. Thanks!
@EdSchouten Sorry, that was confusing; I'll update it. I think the main issue is that clients that want to use SHA256 can't use blobs stored by clients that use BLAKE3, which leads to duplicated blobs. To deduplicate, the server would need to do some sort of translation and store the result somewhere.
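The duplication follows from the digest function being part of the CAS key: the same bytes hashed with two functions yield two distinct keys. The sketch below illustrates this with a hypothetical in-memory store. `casKey` and `keyFor` are illustrative names only, and SHA512 stands in for BLAKE3 solely because the Go standard library has no BLAKE3 implementation.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// casKey is a hypothetical CAS key. Because the digest function is part of
// the key, identical content hashed with two functions yields two keys.
type casKey struct {
	function string
	hash     string
	size     int
}

// keyFor computes the key of data under the named digest function.
func keyFor(function string, data []byte) casKey {
	var sum []byte
	switch function {
	case "SHA256":
		s := sha256.Sum256(data)
		sum = s[:]
	case "SHA512": // Stand-in for BLAKE3 in this sketch.
		s := sha512.Sum512(data)
		sum = s[:]
	default:
		panic("unsupported digest function: " + function)
	}
	return casKey{function: function, hash: hex.EncodeToString(sum), size: len(data)}
}

func main() {
	store := map[casKey][]byte{}
	blob := []byte("hello")
	// Two clients upload the same bytes using different digest functions.
	store[keyFor("SHA256", blob)] = blob
	store[keyFor("SHA512", blob)] = blob
	fmt.Println(len(store)) // 2: the same blob is stored twice.
}
```

Avoiding the second copy would presumably require the server to record a mapping between the two keys, which is the kind of translation the comment refers to.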
Branch updated from 5353350 to 6955b27.
Branch updated from 6955b27 to ef88a3b.

If a server supports BLAKE3 and stores most of its CAS blobs under BLAKE3 digests, bb-clientd would cause users to upload using SHA256, resulting in duplicate uploads and cache misses.
Adding this supported digest function lets the client dictate which digest function it uses, and bb-clientd supports that as long as the server does.
This is important for remote execution deployments that run on hardware without SHA-NI, such as many GCP hosts, where BLAKE3 gives significant performance benefits.
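The client-side choice described above could look roughly like this: use the preferred digest function if the server advertises it, otherwise fall back. `chooseDigestFunction` is a hypothetical helper, not bb-clientd's code, and falling back to SHA256 is an assumption of this sketch.

```go
package main

import "fmt"

// chooseDigestFunction is a hypothetical sketch of client-side selection:
// return the preferred digest function when the server advertises it.
// Otherwise fall back to SHA256 (an assumption made for this sketch).
func chooseDigestFunction(preferred string, advertised []string) string {
	for _, name := range advertised {
		if name == preferred {
			return preferred
		}
	}
	return "SHA256"
}

func main() {
	server := []string{"BLAKE3", "SHA256"}
	fmt.Println(chooseDigestFunction("BLAKE3", server))
	fmt.Println(chooseDigestFunction("BLAKE3", []string{"SHA256"}))
}
```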