feat(flags): enable etag on hypercache writes#57049
Draft
matheus-vb wants to merge 3 commits intomasterfrom
Draft
feat(flags): enable etag on hypercache writes#57049matheus-vb wants to merge 3 commits intomasterfrom
matheus-vb wants to merge 3 commits intomasterfrom
Conversation
Contributor
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
0460065 to
a210447
Compare
Contributor
Query snapshots: Backend query snapshots updatedChanges: 2 snapshots (2 modified, 0 added, 0 deleted) What this means:
Next steps:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
PR #54793 shipped an in memory
FlagDefinitionsCachekeyed on(team_id, etag), gated by a Redis GET oncache/teams/{team_id}/feature_flags/flags.json:etag. Post deployflags_definitions_inmem_cache_no_version_total{reason="etag_missing"}sustained at ~10 000/s, so ~80% of/flagsrequests bypassed the cache and ran the full payload load (Redis fetch, ZSTD, pickle, JSON, regex compile) on every request. No correctness regression, but the perf upside was almost entirely missing.The Rust hot path calls
get_etagonflags_hypercache_reader(namespacefeature_flags, valueflags.json), which the Django writerflags_hypercacheatposthog/models/feature_flag/flags_cache.py:448populated withoutenable_etag=True. Under that default_set_cache_value_rediswrites the payload and actively deletes any etag key, so the etag was never present for the reader to find. The neighbouringflag_definitions_hypercache(valueflags_with_cohorts.json) already hadenable_etag=Truebut feeds a different Rust reader used by the SDK/flags/definitionsendpoint, not the production/flagshot path.Changes
flags_hypercachenow passesenable_etag=True._set_cache_value_redistakes the etag aware branch and writes payload and etag together viaset_manywith the same TTL on every cache write (signal driven, scheduled refresh, daily mass sync). The__missing__sentinel still deletes the etag so empty teams land on thesentinelreason rather thanetag_missingper the merged Rust loader logic.verify_team_flagsgained a newMISSING_ETAGmismatch branch after the existingMISSING_EVALUATION_METADATAcheck. The dailyverify_flags_cachemanagement command now reportsMISSING_ETAGaggregates so future regressions of this class (someone disablesenable_etag, eviction skew empties the etag, a new hypercache lands without it) cannot recur silently.No Rust changes. The merged PR's
Ok(Some(etag)),Ok(None), andErrbranches are forward compatible. No S3 changes; the Rustget_etagreads only from Redis. No new triggers; etags appear at the existing write cadence and the daily mass sync covers all 363 674 teams within ~24h.How did you test this code?
hogli test posthog/models/feature_flag/test/test_flags_cache.pyruns 177 of 177 green. Four new regression tests cover the etag round trip:test_update_flags_cache_writes_etagconfirmsupdate_flags_cacheproduces a 16 character hex etag.test_clear_flags_cache_clears_etagconfirmsclear_cacheremoves both the payload and the etag.test_missing_sentinel_clears_etagconfirms the__missing__sentinel deletes any prior etag.test_verify_detects_missing_etagprimes a payload without etag state and assertsverify_team_flagsreturnsstatus="mismatch",issue="MISSING_ETAG".hogli test posthog/storage/test/test_hypercache.pyruns 55 of 55 green (storage layer untouched).ruff checkandruff formatpass on both changed files.Production impact verified against current prod metrics. Dedicated flags Redis sees writes double in pipelined SET ops (1 to 2 per write) at peaks of ~22/sec during the daily mass sync window, plus ~31 MB additional storage across 363 674 teams. The Rust read side drops ~95% in payload bandwidth (~72 MB/s to ~4 MB/s) once etags are populated, so net Redis load decreases. No backfill is triggered.
Watch on rollout:
flags_definitions_inmem_cache_no_version_total{reason="etag_missing"}falls from ~10 000/s toward single digits over 24h while{reason="sentinel"}stays flat. Cache hit rate rises from ~165/s toward the/flagsrequest rate. Pyroscope CPU onserde_json::Value::deserialize,serde_pickle::de::parse_value,ZSTD_decompressContinue, andprepare_regexes_in_placecollapses on the feature flags service pods.Publish to changelog?
no