
Conversation


@tyler-french tyler-french commented Dec 30, 2025

If a server supports BLAKE3 and stores most of its CAS blobs under BLAKE3 digests, bb-clientd would still cause users to upload as SHA-256, leading to duplicate uploads and cache misses.

Adding BLAKE3 to the supported digest functions lets the client dictate which algorithm to use for the digest function, and bb-clientd will support that as long as the backend server does.

This is important for remote execution deployments that run on hardware without SHA-NI extensions, such as many GCP hosts, where BLAKE3 therefore gives significant performance benefits.

Copilot AI review requested due to automatic review settings December 30, 2025 17:13
@tyler-french tyler-french changed the title support blake3 for servers that support blake3 support blake3 for if the backend server supports blake3 Dec 30, 2025

aspect-workflows bot commented Dec 30, 2025

Test

1 test target passed

Targets
//pkg/digest:digest_test [k8-fastbuild]                 261ms

Total test execution time was 261ms. 29 tests (96.7%) were fully cached saving 7s.


Copilot AI left a comment


Pull request overview

This PR adds support for the BLAKE3 digest function to bb-storage, enabling servers to accept, store, and forward BLAKE3-digested blobs without translation. The implementation includes configuration options to explicitly enable BLAKE3 support while maintaining backward compatibility by excluding it from the default digest function set.

Key Changes

  • Added BLAKE3 as a supported digest function with implementation backed by the github.com/zeebo/blake3 library
  • Introduced configuration field supported_digest_functions to allow explicit opt-in for BLAKE3 support
  • Implemented digest function filtering in capabilities provider to intersect frontend-configured functions with backend-supported functions
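The filtering described in the last bullet amounts to a set intersection: keep only the frontend-configured digest functions that the backend also supports. A minimal sketch of the idea in Go (the type and function names here are hypothetical stand-ins, not the actual bb-storage API):

```go
package main

import "fmt"

// DigestFunction stands in for remoteexecution.DigestFunction_Value.
type DigestFunction string

// filterDigestFunctions returns the configured digest functions that the
// backend also supports, preserving the configured order.
func filterDigestFunctions(configured, backendSupported []DigestFunction) []DigestFunction {
	supported := make(map[DigestFunction]bool, len(backendSupported))
	for _, f := range backendSupported {
		supported[f] = true
	}
	var out []DigestFunction
	for _, f := range configured {
		if supported[f] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	configured := []DigestFunction{"BLAKE3", "SHA256"}
	backend := []DigestFunction{"SHA256", "SHA512"}
	// BLAKE3 is configured but not supported by the backend, so it is dropped
	// from the advertised capabilities.
	fmt.Println(filterDigestFunctions(configured, backend)) // [SHA256]
}
```

This mirrors the behavior EdSchouten describes below: a frontend that does not share a digest function with its backend simply stops advertising it via GetCapabilities(), and clients fall back to a common algorithm.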

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Summary per file:

  • pkg/proto/configuration/bb_storage/bb_storage.proto: Added supported_digest_functions configuration field with documentation
  • pkg/proto/configuration/bb_storage/bb_storage.pb.go: Generated protobuf code for the new configuration field
  • pkg/digest/bare_function.go: Added BLAKE3 to the supported functions list, created DefaultDigestFunctions for backward compatibility, implemented the BLAKE3 bareFunction
  • pkg/digest/digest_test.go: Added comprehensive test coverage for BLAKE3 in ByteStream paths, digest keys, and the compact binary format
  • pkg/digest/BUILD.bazel: Added dependency on the blake3 library
  • pkg/capabilities/digest_function_filtering_provider.go: New provider to filter advertised digest functions based on configuration and backend support
  • pkg/capabilities/BUILD.bazel: Added the new digest function filtering provider to the build
  • cmd/bb_storage/main.go: Integrated digest function filtering with configuration handling and a default fallback
  • cmd/bb_storage/BUILD.bazel: Added digest package dependency
  • go.mod, go.sum, MODULE.bazel: Added the blake3 library and transitive dependencies



Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.



Comment on lines 108 to 109
// defaults to MD5, SHA1, SHA256, SHA256TREE, SHA384, and SHA512.
//

Copilot AI Dec 30, 2025


The new supported_digest_functions field is documented to default to MD5 and SHA1 (alongside stronger hashes) when left unset, meaning the CAS will continue to advertise and accept these weak digest algorithms for content addressing. Because MD5 and SHA1 are vulnerable to practical collision attacks, an attacker using a shared CAS can craft colliding inputs and substitute malicious blobs under the same digest, breaking build artifact integrity across tenants. To reduce this risk, consider changing the documented and implemented defaults to only collision-resistant digests (e.g., SHA256 and optionally BLAKE3), and require any use of MD5/SHA1 to be explicitly opted in for legacy compatibility.

Suggested change
// defaults to MD5, SHA1, SHA256, SHA256TREE, SHA384, and SHA512.
//
// defaults to SHA256, SHA256TREE, SHA384, and SHA512.
//
// MD5 and SHA1 are not enabled by default due to their susceptibility to
// practical collision attacks. Only configure MD5 or SHA1 explicitly if
// legacy interoperability requires it, and be aware of the associated
// security risks.
//

@EdSchouten
Member

Hi Tyler,

I'd be open to adding support for BLAKE3, but I don't have any idea what you mean by this:

> Consider the scenario when a client uses blake3, and the remote cache uses blake3, but bb-clientd doesn't support it. Now bb-clientd needs to translate all the resource names for the read/write so the CAS can access them. This is both inefficient and unnecessary.

What do you mean by "translate all the resource names for the read/write so the CAS can access them"? As far as I know, it's just like this:

> Consider the scenario when a client uses blake3, and the remote cache uses blake3, but bb-clientd doesn't support it. In that case calls against GetCapabilities() made through bb-clientd will suppress the availability of blake3, causing clients to fall back to conventional algorithms like SHA-256.

blake3BareFunction = bareFunction{
	enumValue: remoteexecution.DigestFunction_BLAKE3,
	hasherFactory: func(expectedSizeBytes int64) hash.Hash {
		return blake3.New()
	},
}
Member


In addition to using SIMD, BLAKE3 supports multi-threaded parallelism. What amount of parallelism does the hasher returned by blake3.New() use? Does each blake3.New() create its own pool of goroutines? If so, is there any way to limit it?

Contributor Author


Each blake3.New() returns a single-threaded hasher; it processes input sequentially on the calling goroutine and does not spawn a worker pool.
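The pattern in the diff pairs each digest function enum value with a hasherFactory returning a streaming hash.Hash. The sketch below illustrates that pattern using only stdlib hashes (crypto/sha256, crypto/sha512) as stand-ins, so it stays dependency-free; in the actual change, BLAKE3 would be registered the same way via blake3.New() from github.com/zeebo/blake3, which also satisfies hash.Hash. The struct and helper names here are illustrative, not the real bb-storage symbols.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
)

// bareFunction mirrors the pattern from the diff: a digest function is a
// name paired with a factory that returns a fresh streaming hasher. The
// expectedSizeBytes parameter matches the factory signature in the diff,
// even though these stdlib hashes ignore it.
type bareFunction struct {
	name          string
	hasherFactory func(expectedSizeBytes int64) hash.Hash
}

var functions = []bareFunction{
	{name: "SHA256", hasherFactory: func(int64) hash.Hash { return sha256.New() }},
	{name: "SHA512", hasherFactory: func(int64) hash.Hash { return sha512.New() }},
}

// digestOf streams data through a freshly created hasher and returns the
// hex-encoded digest.
func digestOf(f bareFunction, data []byte) string {
	h := f.hasherFactory(int64(len(data)))
	h.Write(data) // hash.Hash.Write never returns an error
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	for _, f := range functions {
		fmt.Printf("%s(\"abc\") = %s\n", f.name, digestOf(f, []byte("abc")))
	}
}
```

Because each factory call returns an independent hasher, concurrent uploads each get their own state; with zeebo/blake3 that hasher hashes sequentially, so no per-hasher goroutine pool needs limiting.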

@EdSchouten
Member

Gotcha! The change as-is looks fine. It looks like the current code lists all digest functions in alphabetical order, not by enum value. Can you please make sure your change does the same? So just put BLAKE3 at the top everywhere. Thanks!

@tyler-french tyler-french changed the title support blake3 for if the backend server supports blake3 Support blake3 hashing Dec 30, 2025
@tyler-french tyler-french changed the title Support blake3 hashing feat(pkg/digest): support blake3 hashing Dec 30, 2025
@tyler-french
Contributor Author

> Hi Tyler,
>
> I'd be open to adding support for BLAKE3, but I don't have any idea what you mean by this:
>
> > Consider the scenario when a client uses blake3, and the remote cache uses blake3, but bb-clientd doesn't support it. Now bb-clientd needs to translate all the resource names for the read/write so the CAS can access them. This is both inefficient and unnecessary.
>
> What do you mean by "translate all the resource names for the read/write so the CAS can access them"? As far as I know, it's just like this:
>
> > Consider the scenario when a client uses blake3, and the remote cache uses blake3, but bb-clientd doesn't support it. In that case calls against GetCapabilities() made through bb-clientd will suppress the availability of blake3, causing clients to fall back to conventional algorithms like SHA-256.

@EdSchouten sorry, that was confusing; I'll update it. The main issue is that clients that want to use SHA-256 can't reuse blobs stored by clients that use BLAKE3, which leads to blob duplication. If the server wanted to de-duplicate, it would need to perform some sort of digest translation and store the mapping somewhere.

@EdSchouten EdSchouten merged commit 14e58f4 into buildbarn:main Dec 31, 2025
3 checks passed