Skip to content

Conversation

@rorychatterton
Copy link

@rorychatterton rorychatterton commented Dec 23, 2025

Note: I know there isn't an issue raised for this. It ended up being really quick for me to mock out some tests and validate the flow with Claude so if this isn't something you want to integrate directly into the codebase, I won't be upset. That said, I've reviewed it and believe it addresses a gap in the HTTP fetcher path.

Similarly, I'm not 100% sure how you want to approach breaking down work as far as size is concerned. I've tried to keep this as one feature, with tests. Let me know how you go and if you need me to break this down further. I've split it into two commits - one that hardcodes the initial impl, the second adds the configuration to the jsonnet config)

Summary

This PR adds request deduplication for the HTTP fetcher path in bb-remote-asset. When multiple concurrent requests arrive for the same uncached URI, they now share a single upstream fetch instead of triggering duplicate HTTP requests.

Problem

When using the HTTP fetcher, concurrent FetchBlob/FetchDirectory requests for the same uncached URI would all independently miss the cache and trigger separate upstream fetches. This is wasteful and problematic when:

  • The upstream is slow (e.g., third-party caches that build on miss)
  • High concurrency causes unnecessary load on upstream servers
  • Network bandwidth is consumed by duplicate downloads

The remote_execution fetcher path doesn't have this issue because it goes through bb-remote-execution's scheduler, which already deduplicates by action digest. The flow comparison:

Remote Execution Fetcher (already deduplicated):

Client A ─┐
Client B ─┼─→ bb-remote-asset ─→ bb-remote-execution scheduler ─→ single worker fetch
Client C ─┘                      (deduplicates by action digest)

HTTP Fetcher (before this PR):

Client A ─┐
Client B ─┼─→ bb-remote-asset ─┬─→ HTTP GET https://example.com/file.tar.gz
Client C ─┘                    ├─→ HTTP GET https://example.com/file.tar.gz  (duplicate!)
                               └─→ HTTP GET https://example.com/file.tar.gz  (duplicate!)

HTTP Fetcher (after this PR):

Client A ─┐
Client B ─┼─→ bb-remote-asset ─→ CoalescingFetcher ─→ single HTTP GET
Client C ─┘                      (deduplicates by URI + qualifiers)

Solution

Added a CoalescingFetcher decorator that deduplicates in-flight requests using a pure stdlib approach (sync.Mutex + map + channels). This follows the same pattern used in bb-remote-execution's InMemoryBuildQueue scheduler, which uses an inFlightDeduplicationMap to coalesce concurrent requests for the same action.

Fetcher chain order:

CachingFetcher (cache check first - hits return immediately)
  └── CoalescingFetcher (coalesce cache misses)
       └── HTTPFetcher (actual fetch)

Key Changes

  • pkg/fetch/coalescing_fetcher.go - New coalescing fetcher with FetchBlob/FetchDirectory deduplication
  • pkg/configuration/new_fetcher.go - Wire CoalescingFetcher for HTTP fetcher path only
  • tests covering concurrent deduplication, context cancellation, error propagation, and qualifier handling

Test Coverage

  • TestCoalescingFetcherBlobDeduplication - Verifies 10 concurrent requests result in 1 fetch
  • TestCoalescingFetcherDirectoryDeduplication - Same for directory fetches
  • TestCoalescingFetcherDifferentURIsNotCoalesced - Different URIs don't coalesce
  • TestCoalescingFetcherVolatileQualifiersIgnored - Auth headers don't affect coalescing
  • TestCoalescingFetcherDifferentChecksumsNotCoalesced - Different checksum.sri values don't coalesce (consistent with caching)
  • TestCoalescingFetcherDifferentResourceTypeNotCoalesced - Different resource_type values don't coalesce
  • TestCoalescingFetcherContextCancellation - Waiters can bail out if their context is cancelled
  • TestCoalescingFetcherErrorPropagation - Errors propagate to all waiters

Coalescing Key Design

The coalescing key uses the same qualifier filtering as caching via RemoveVolatileQualifiers(). This ensures consistent behavior between what gets cached and what gets coalesced.

Excluded qualifiers (volatile):

  • http_header_url:* - Per-URL HTTP headers (typically auth tokens)
  • bazel.auth_headers - Legacy Bazel authentication headers

Included qualifiers (stable):

  • checksum.sri - Treated as stable, consistent with caching behavior
  • resource_type - Affects how content is processed
  • All other qualifiers

Rationale:
Using the same rules as caching keeps the behavior predictable and avoids subtle inconsistencies between cache keys and coalescing keys.

Notes

  • Coalescing only wraps the HTTP fetcher - RemoteExecution already has deduplication via bb-remote-execution's scheduler
  • If the first caller's context is cancelled or times out, all waiters receive that error and can retry. Inbuilt retries can be tackled later

@rorychatterton rorychatterton marked this pull request as draft December 23, 2025 03:51
When using the HTTP fetcher, concurrent FetchBlob/FetchDirectory requests
for the same uncached URI would trigger duplicate upstream fetches. This
is problematic when the upstream is slow (e.g., third-party caches that
may need to build on miss).

This change adds a CoalescingFetcher decorator that deduplicates in-flight
requests using a pure stdlib approach (sync.Mutex + map + channels),
following the pattern used in bb-remote-execution's scheduler.

Key changes:
- Add coalescing_fetcher.go with FetchBlob/FetchDirectory deduplication
- Export RemoveVolatileQualifiers() from caching_fetcher.go for reuse
- Wire CoalescingFetcher for HTTP fetcher path only
- Add comprehensive tests including concurrent deduplication, context
  cancellation, error propagation, and qualifier handling

The coalescing key uses the same qualifier filtering as caching
(RemoveVolatileQualifiers), ensuring consistent behavior. Volatile
qualifiers like auth headers are excluded from the key.

Note: Coalescing only applies to HTTP fetcher - RemoteExecution already
deduplicates via bb-remote-execution's scheduler. If the first caller's
context is cancelled, all waiters receive that error and can retry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant