## Problem
Every Prebid Server pod fetches the entire Global Vendor List history from `vendor-list.consensu.org` at startup. As of today, that's ~369 sequential HTTP requests per pod (TCF v2: 223 versions; TCF v3: 146 versions). Each pod builds its own in-memory cache from scratch — nothing is shared across pods, and the upstream `cache-control: max-age=604800` (7 days) is not leveraged.
This creates a classic Thundering Herd problem during Kubernetes rollouts where many pods start simultaneously.
### Impact — Back-of-Napkin Numbers
| Deployment Size | Pods | GVL Fetches/Pod | Requests per Rollout | Requests/Week (daily deploys) |
|---|---|---|---|---|
| Small (20 pods) | 20 | ~369 | ~7,400 | ~52,000 |
| Medium (50 pods) | 50 | ~369 | ~18,500 | ~130,000 |
| Large (100 pods) | 100 | ~369 | ~36,900 | ~258,000 |
With caching honored (7-day TTL), the same data would require only ~369 requests per week regardless of fleet size — a 99%+ reduction.
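The table's arithmetic can be sanity-checked in a few lines; the only inputs are the per-pod fetch count (the sum of the TCF v2 and v3 version counts above), the fleet size, and the daily-deploy assumption. The exact products (7,380; 51,660; 258,300; …) round to the table's figures:

```go
package main

import "fmt"

func main() {
	const (
		fetchesPerPod  = 223 + 146 // TCF v2 + TCF v3 versions ≈ 369
		deploysPerWeek = 7         // one rollout per day
	)
	for _, pods := range []int{20, 50, 100} {
		perRollout := pods * fetchesPerPod
		perWeek := perRollout * deploysPerWeek
		fmt.Printf("%3d pods: %6d requests/rollout, %7d requests/week\n",
			pods, perRollout, perWeek)
	}
}
```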
### Observed Failures
- Bursts of concurrent requests to `consensu.org` (fronted by CloudFront) trigger transient failures (HTTP errors, timeouts, rate-limiting).
- Even one failure during the init window can leave a pod without a complete GVL, causing GDPR consent processing errors (see the "Cookie syncs may be affected" warnings in logs).
- In Kubernetes, a pod that can't process consent correctly may fail health checks and restart — creating a cascading restart loop that amplifies the herd further.
- Deployment reliability becomes coupled to an external third-party CDN's ability to absorb bursty traffic — a fragile dependency for production rollouts.
### The data is highly cacheable
```
$ curl -I https://vendor-list.consensu.org/v2/vendor-list.json
cache-control: max-age=604800
x-cache: Hit from cloudfront
```
- Archived versions (e.g., `v2/archives/vendor-list-v100.json`) are immutable — they never change.
- Only the "latest" endpoint updates, roughly weekly.
- Yet today, every pod re-fetches all ~369 URLs from the origin on every startup.
## What's Needed
A way to cache GVL data once for the cluster so that only the first requester fetches from origin, and all subsequent pods (and restarts) are served from a local, cluster-internal cache.
## Technical Context
### Current Implementation
In `gdpr/vendorlist-fetching.go`:
- `preloadCache()` loops over TCF v2 and v3, fetching every archived version sequentially via `saveOne()`.
- `saveOne()` makes a plain HTTP GET — no retry, no backoff, no shared caching.
- `VendorListURLMaker()` is hardcoded to `https://vendor-list.consensu.org/...` with no configuration to override the base URL.
- The in-memory cache (`sync.Map`) is per-process only — lost on restart, not shared.
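The startup flow described above can be sketched as follows. This is an illustrative reconstruction, not the actual PBS source: the function names mirror those mentioned, but signatures and details are simplified.

```go
// Simplified sketch of the per-pod startup behavior: every pod independently
// walks the full version history with no retry, backoff, or shared cache.
package main

import (
	"fmt"
	"net/http"
	"sync"
)

var cache sync.Map // per-process only: lost on restart, not shared across pods

// vendorListURL mirrors the hardcoded URL construction (illustrative).
func vendorListURL(specVersion, listVersion int) string {
	return fmt.Sprintf(
		"https://vendor-list.consensu.org/v%d/archives/vendor-list-v%d.json",
		specVersion, listVersion)
}

func preloadCache(client *http.Client, specVersion, latest int) {
	for v := 1; v <= latest; v++ {
		saveOne(client, specVersion, v) // sequential, one origin hit per version
	}
}

func saveOne(client *http.Client, specVersion, listVersion int) {
	resp, err := client.Get(vendorListURL(specVersion, listVersion)) // plain GET
	if err != nil {
		return // a single failure leaves this pod's cache incomplete
	}
	defer resp.Body.Close()
	cache.Store([2]int{specVersion, listVersion}, resp) // real code parses the JSON
}

func main() {
	// On startup each pod would effectively run:
	//   preloadCache(http.DefaultClient, 2, 223) // TCF v2
	//   preloadCache(http.DefaultClient, 3, 146) // TCF v3
	// ...for ~369 origin requests per pod. Here we only show the URL shape:
	fmt.Println(vendorListURL(2, 100))
}
```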
## Proposed Solution: Cooperative Caching via Prebid Cache
Prebid Cache is already deployed as a cluster-wide microservice alongside Prebid Server in most production setups. It has mature storage backends (Redis, Aerospike, Memcache, etc.) and is owned by the Prebid project.
Proposal: Add a GVL caching endpoint to Prebid Cache that:
- Exposes a GVL-compatible URL path (e.g., `/gvl/v2/vendor-list.json`, `/gvl/v3/archives/vendor-list-v100.json`) that Prebid Server can be configured to use instead of `vendor-list.consensu.org`.
- Fetches from origin on cache miss, stores in its backend (Redis, etc.), and serves subsequent requests from cache.
- Respects `cache-control` / TTL — archived versions are cached indefinitely (immutable); the latest version is cached for up to 7 days per upstream headers (a library like `pquerna/cachecontrol` could help here).
- Deduplicates concurrent origin fetches (singleflight pattern) to prevent the cache itself from herding against the origin.
On the Prebid Server side:
- Make the GVL base URL configurable — e.g., a new config parameter `gdpr.vendorlist_base_url` (default: `https://vendor-list.consensu.org`) that can be pointed at the local Prebid Cache instance.
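A hypothetical configuration fragment showing how this could look; note that neither the `gdpr.vendorlist_base_url` parameter nor the `/gvl` path exists today — both are part of this proposal, and the service hostname and Prebid Cache's default port (2424) are deployment-specific assumptions:

```yaml
# Hypothetical PBS config — gdpr.vendorlist_base_url is the *proposed*
# parameter, pointed at a cluster-internal Prebid Cache instance.
gdpr:
  vendorlist_base_url: "http://prebid-cache.default.svc.cluster.local:2424/gvl"
```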
### Why Prebid Cache?
- Already deployed: Most PBS operators already run Prebid Cache as a cluster-internal service — no new infrastructure needed.
- Both are Prebid-owned: This is a natural cooperation between two projects in the same ecosystem.
- Proven storage backends: Redis/Aerospike/Memcache already handle TTL-based caching at scale.
- Simple integration: PBS only needs a configurable base URL; all the caching intelligence lives in Prebid Cache.
## Result
| Scenario | External Requests to consensu.org |
|---|---|
| Today (100 pods, daily deploys) | ~258,000/week |
| With cluster cache | ~369/week (one cache fill) |
| Reduction | ~99.9% |
This eliminates the Thundering Herd, decouples deployment reliability from a third-party CDN, and is architecturally clean — leveraging infrastructure that already exists in the Prebid ecosystem.
## Related Issues
- GDPR Endpoint Logic #504 — original GVL fetching implementation
- Change GVL URL #1632 — request to change the GVL URL
- Failed to fetch Vendor-List (context deadline exceeded) #1687