Conversation
PR SummaryLow Risk Overview Written by Cursor Bugbot for commit fc07293. This will update automatically on new commits. Configure here. |
|
|
||
| func (s *DiffStore) Add(d Diff) { | ||
| s.resetDelete(d.CacheKey()) | ||
| s.insertionTimes.Store(d.CacheKey(), time.Now()) |
There was a problem hiding this comment.
Add unconditionally overwrites the insertion time for keys already present in the cache. If Add is called for an existing key (e.g. a re-upload of the same build), the timer resets and the eviction metric will undercount the actual total residence duration.
Consider using LoadOrStore to preserve the original insertion time:
ds.insertionTimes.LoadOrStore(d.CacheKey(), time.Now())|
|
||
| func (s *DiffStore) Add(d Diff) { | ||
| s.resetDelete(d.CacheKey()) |
There was a problem hiding this comment.
🔴 In Add(), insertionTimes.Store is called before cache.Set, so when replacing an existing key the eviction callback for the old item deletes the just-stored insertion time for the new item. This causes the old item to record a near-zero residence duration, and the new item has no insertion time tracked so its eventual eviction silently records no metric at all. Fix: move insertionTimes.Store(d.CacheKey(), time.Now()) to after s.cache.Set(...) returns.
Extended reasoning...
Root Cause
In Add() (cache.go:142-143), the insertion time is stored before the item is placed in the cache:
func (s *DiffStore) Add(d Diff) {
s.resetDelete(d.CacheKey())
s.insertionTimes.Store(d.CacheKey(), time.Now()) // stored first
s.cache.Set(d.CacheKey(), d, ttlcache.DefaultTTL) // eviction fires here
}When cache.Set is called on an existing key, ttlcache fires the OnEviction callback synchronously for the old item before returning. That callback (cache.go:79-82) calls ds.insertionTimes.LoadAndDelete(item.Key()), which finds and removes the just-stored T2 (the NEW item's insertion time) rather than T1 (the old item's time), because they share the same key.
Step-by-Step Proof
- Key
Kis already in the cache with insertion timeT1stored ininsertionTimes. Add()is called again with the same keyK.insertionTimes.Store(K, T2)runs — overwritesT1withT2(now).cache.Set(K, newValue)is called; ttlcache evicts the old item synchronously.- The eviction callback calls
insertionTimes.LoadAndDelete(K), which deletesT2(notT1). - Result: the old item records
time.Since(T2) ~= 0 nsas its residence duration — wildly incorrect. - When the new item eventually evicts,
LoadAndDelete(K)returnsok=false— no metric recorded at all.
Why Existing Code Doesn't Prevent This
The resetDelete(d.CacheKey()) call at the top of Add() explicitly handles the case where the key is already in the cache (scheduled for deletion), confirming the author anticipated replacements. However, the ordering of insertionTimes.Store relative to cache.Set was not considered.
Impact
This is a correctness bug in the exact observability feature this PR introduces. Every time Add() is called with a key that already exists in the cache:
- The old item reports ~0 seconds residence (severely underreported).
- The new item is never tracked, so the
orchestrator.build.cache.residence_durationhistogram silently drops a data point.
Fix
Move insertionTimes.Store to after cache.Set returns:
func (s *DiffStore) Add(d Diff) {
s.resetDelete(d.CacheKey())
s.cache.Set(d.CacheKey(), d, ttlcache.DefaultTTL) // eviction callback for old item runs, deletes T1 correctly
s.insertionTimes.Store(d.CacheKey(), time.Now()) // store AFTER Set, so new insertion time is not clobbered
}With this ordering, when cache.Set evicts the old item, LoadAndDelete(K) finds and removes T1 (the old item's correct insertion time) and records an accurate duration. After Set returns, Store(K, T2) records the new item's insertion time uncontested.
Adds a histogram for duration how long the builds stays in cache