
Commit d55075f

Added LRU cache eviction

1 parent 14ef9f6 commit d55075f

3 files changed: +321 additions, -65 deletions

docs/disk_hnsw_multithreaded_architecture.md

Lines changed: 47 additions & 52 deletions
@@ -50,73 +50,68 @@ number of vectors in the index.
 
 #### Cache Memory Management
 
-**Current Behavior: No Eviction**
+**LRU Eviction Policy (Implemented)**
 
-The segment cache (`cacheSegment.cache`) currently **grows unboundedly** and is **never evicted**. Once a node's neighbor list is loaded into cache (either from disk or created during insert), it remains in memory indefinitely.
+The segment cache uses an **LRU (Least Recently Used) eviction policy** to bound memory usage. Each segment tracks access order using a doubly-linked list for O(1) operations:
 
-**Why Cache is Source of Truth**
-
-The cache cannot simply be cleared because it serves as the **source of truth** for pending updates that haven't been flushed to disk yet. The Swap-and-Flush pattern relies on:
-1. Cache always having the latest neighbor lists
-2. `dirty` set tracking which nodes need to be written
-3. Flusher reading current cache state (not stale data)
+```cpp
+struct alignas(64) CacheSegment {
+    std::shared_mutex guard;
+    std::unordered_map<uint64_t, std::vector<idType>> cache;
+    std::unordered_set<uint64_t> dirty;
+    std::unordered_set<uint64_t> newNodes;
+
+    // LRU eviction support
+    std::list<uint64_t> lruOrder; // Access order (front = MRU)
+    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> lruMap; // O(1) lookup
+
+    void touchLRU(uint64_t key);      // Move key to front (most recently used)
+    void addToLRU(uint64_t key);      // Add new key to front
+    void removeFromLRU(uint64_t key); // Remove key from LRU tracking
+    uint64_t getLRU() const;          // Get least recently used key (back of list)
+    bool hasLRU() const;              // Check if LRU list has entries
+};
 
-**Need to decide which strategy to implement (if any).**
-**Another option is to not use the neighbors cache at all and always read from disk**
+```
 
-**1. LRU Eviction for Clean Entries**
+**Configuration API:**
 
-Evict least-recently-used entries that are **not dirty** (already persisted to disk):
 ```cpp
-// Pseudocode
-if (cacheSize > maxCacheSize) {
-    for (auto& entry : lruOrder) {
-        if (!dirty.contains(entry.key)) {
-            cache.erase(entry.key);
-            if (--evicted >= targetEviction) break;
-        }
-    }
-}
+// Set max entries per segment (0 = unlimited). Total max = maxEntries * NUM_CACHE_SEGMENTS
+void setCacheMaxEntriesPerSegment(size_t maxEntries);
+size_t getCacheMaxEntriesPerSegment() const;
+
+// Get total cache entry count across all segments
+size_t getCacheTotalEntryCount() const;
 ```
-*Pros:* Simple, safe (dirty entries always kept)
-*Cons:* Requires LRU tracking overhead (linked list + map)
 
-**2. Time-Based Eviction**
+**Eviction Rules:**
+1. Only **non-dirty entries** can be evicted (dirty entries must be flushed first to avoid data loss)
+2. Eviction is triggered when adding new entries while the cache is at capacity
+3. LRU entries are evicted first (least recently accessed)
+4. Total max cache size = `maxCacheEntriesPerSegment * NUM_CACHE_SEGMENTS` (64 segments)
 
-Evict clean entries older than a threshold:
+**Example:**
 ```cpp
-// Pseudocode
-for (auto& entry : cache) {
-    if (!dirty.contains(entry.key) &&
-        now - entry.lastAccessTime > evictionTimeout) {
-        cache.erase(entry.key);
-    }
-}
+// Set limit of 10,000 entries per segment (640,000 total across 64 segments)
+index.setCacheMaxEntriesPerSegment(10000);
+
+// Disable limit (unlimited cache)
+index.setCacheMaxEntriesPerSegment(0);
 ```
-*Pros:* Predictable memory behavior
-*Cons:* Requires timestamp tracking per entry
 
-**3. Write-Through with Immediate Eviction**
+**Why Cache is Source of Truth**
 
-After flushing to disk, immediately evict the written entries:
-```cpp
-// In flushDirtyNodesToDisk(), after successful write:
-for (uint64_t key : flushedNodes) {
-    cacheSegment.cache.erase(key); // Evict after persist
-}
-```
-*Pros:* Minimal memory usage, no tracking overhead
-*Cons:* Increases disk reads on subsequent access
+The cache serves as the **source of truth** for pending updates that haven't been flushed to disk yet. The Swap-and-Flush pattern relies on:
+1. Cache always having the latest neighbor lists
+2. `dirty` set tracking which nodes need to be written
+3. Flusher reading current cache state (not stale data)
 
-**4. Size-Limited Cache with Eviction Policy**
+This is why **dirty entries are never evicted**: they contain updates not yet persisted to disk.
+
+**Implementation Note / Future Optimization**
+Memory overhead: `std::list` nodes are individually allocated on the heap. For a cache of 1M entries, that is 1M small allocations, which causes heap fragmentation and poor cache locality.
 
-Configure maximum cache size and evict when exceeded:
-```cpp
-size_t maxCacheEntries = 100000; // Configurable
-// On insert, check size and evict clean entries if needed
-```
-*Pros:* Bounded memory usage
-*Cons:* Need to choose appropriate eviction policy
 
 ### 3. Lock Hierarchy
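The **Implementation Note / Future Optimization** entry above flags `std::list`'s per-node heap allocations. A minimal sketch of the usual remedy, an index-linked LRU stored in one preallocated slab, written here as a standalone illustration; `FlatLRU` and every identifier in it are hypothetical and not part of this commit:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Flat LRU: nodes live in one preallocated vector and link to each other by
// index instead of pointer, so a 1M-entry cache costs one large allocation
// instead of 1M small ones. NPOS marks "no neighbor".
class FlatLRU {
    static constexpr uint32_t NPOS = UINT32_MAX;
    struct Node { uint64_t key; uint32_t prev, next; };

    std::vector<Node> nodes;                       // slab of list nodes
    std::vector<uint32_t> freeSlots;               // recycled slot indices
    std::unordered_map<uint64_t, uint32_t> slotOf; // key -> slot index
    uint32_t head = NPOS, tail = NPOS;             // head = MRU, tail = LRU

    void unlink(uint32_t i) {
        Node &n = nodes[i];
        (n.prev == NPOS ? head : nodes[n.prev].next) = n.next;
        (n.next == NPOS ? tail : nodes[n.next].prev) = n.prev;
    }
    void linkFront(uint32_t i) {
        nodes[i].prev = NPOS;
        nodes[i].next = head;
        if (head != NPOS) nodes[head].prev = i;
        head = i;
        if (tail == NPOS) tail = i;
    }

public:
    explicit FlatLRU(size_t capacity) { nodes.reserve(capacity); }

    // Insert key as MRU, or move an existing key to the front. O(1).
    void touch(uint64_t key) {
        auto it = slotOf.find(key);
        if (it != slotOf.end()) { unlink(it->second); linkFront(it->second); return; }
        uint32_t slot;
        if (!freeSlots.empty()) { slot = freeSlots.back(); freeSlots.pop_back(); }
        else { slot = (uint32_t)nodes.size(); nodes.push_back({}); }
        nodes[slot].key = key;
        slotOf[key] = slot;
        linkFront(slot);
    }

    bool empty() const { return head == NPOS; }
    uint64_t lru() const { assert(tail != NPOS); return nodes[tail].key; }

    void erase(uint64_t key) {
        auto it = slotOf.find(key);
        if (it == slotOf.end()) return;
        unlink(it->second);
        freeSlots.push_back(it->second);
        slotOf.erase(it);
    }
};
```

An eviction pass would read `lru()`, check the dirty set, then `erase()` the key, the same sequence as the `getLRU()` / `removeFromLRU()` pair in this commit.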

src/VecSim/algorithms/hnsw/hnsw_disk.h

Lines changed: 146 additions & 3 deletions
@@ -57,6 +57,7 @@
 #include <shared_mutex>
 #include <atomic>
 #include <array>
+#include <list>
 
 // Forward declaration for AsyncJob
 #include "VecSim/vec_sim_tiered_index.h"
@@ -312,6 +313,7 @@ class HNSWDiskIndex : public VecSimIndexAbstract<DataType, DistType>
     static constexpr size_t NUM_CACHE_SEGMENTS = 64; // Power of 2 for efficient modulo
 
     // Cache segment structure - each segment is cache-line aligned to prevent false sharing
+    // Uses LRU eviction policy to limit memory usage
     struct alignas(64) CacheSegment {
         std::shared_mutex guard;
         std::unordered_map<uint64_t, std::vector<idType>> cache;
@@ -320,16 +322,60 @@ class HNSWDiskIndex : public VecSimIndexAbstract<DataType, DistType>
         // This helps avoid disk lookups for new nodes
         std::unordered_set<uint64_t> newNodes;
 
+        // LRU eviction support: doubly-linked list for O(1) access order updates
+        // Front = most recently used, Back = least recently used
+        std::list<uint64_t> lruOrder;
+        // Map from key to iterator in lruOrder for O(1) lookup
+        std::unordered_map<uint64_t, std::list<uint64_t>::iterator> lruMap;
+
         CacheSegment() = default;
         CacheSegment(const CacheSegment&) = delete;
         CacheSegment& operator=(const CacheSegment&) = delete;
         CacheSegment(CacheSegment&&) = delete;
         CacheSegment& operator=(CacheSegment&&) = delete;
+
+        // Move key to front of LRU list (most recently used)
+        void touchLRU(uint64_t key) {
+            auto it = lruMap.find(key);
+            if (it != lruMap.end()) {
+                lruOrder.erase(it->second);
+            }
+            lruOrder.push_front(key);
+            lruMap[key] = lruOrder.begin();
+        }
+
+        // Add new key to LRU list
+        void addToLRU(uint64_t key) {
+            lruOrder.push_front(key);
+            lruMap[key] = lruOrder.begin();
+        }
+
+        // Remove key from LRU list
+        void removeFromLRU(uint64_t key) {
+            auto it = lruMap.find(key);
+            if (it != lruMap.end()) {
+                lruOrder.erase(it->second);
+                lruMap.erase(it);
+            }
+        }
+
+        // Get the least recently used key (for eviction)
+        uint64_t getLRU() const {
+            return lruOrder.back();
+        }
+
+        bool hasLRU() const {
+            return !lruOrder.empty();
+        }
     };
 
     // Array of cache segments - using unique_ptr for lazy initialization
     mutable std::unique_ptr<CacheSegment[]> cacheSegments_;
 
+    // Maximum entries per cache segment (0 = unlimited)
+    // Total max cache entries = maxCacheEntriesPerSegment * NUM_CACHE_SEGMENTS
+    size_t maxCacheEntriesPerSegment_ = 10000;
+
     // Atomic counter for total dirty nodes (for fast threshold check without locking)
     mutable std::atomic<size_t> totalDirtyCount_{0};
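As a quick illustration of the invariant these methods maintain (front = MRU, back = LRU), here is a self-contained sketch of the same `std::list` + map pattern outside the class; all names are local to the example:

```cpp
#include <cstdint>
#include <cstdio>
#include <list>
#include <unordered_map>

int main() {
    std::list<uint64_t> lruOrder;
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> lruMap;

    auto touch = [&](uint64_t key) { // mirrors touchLRU()
        auto it = lruMap.find(key);
        if (it != lruMap.end()) lruOrder.erase(it->second);
        lruOrder.push_front(key);
        lruMap[key] = lruOrder.begin();
    };

    touch(1); touch(2); touch(3); // order (MRU -> LRU): 3 2 1
    touch(1);                     // re-access moves 1 to front: 1 3 2
    // Back of the list is the eviction candidate, as in getLRU(); prints 2.
    std::printf("evict candidate: %llu\n", (unsigned long long)lruOrder.back());
    return 0;
}
```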

@@ -618,6 +664,20 @@ class HNSWDiskIndex : public VecSimIndexAbstract<DataType, DistType>
     size_t getPendingDiskWriteCount() const { return totalDirtyCount_.load(std::memory_order_relaxed); }
     void flushDirtyNodesToDisk(); // Flush all pending dirty nodes to disk
 
+    // Cache eviction control
+    // Set max entries per segment (0 = unlimited). Total max = maxEntries * NUM_CACHE_SEGMENTS
+    void setCacheMaxEntriesPerSegment(size_t maxEntries) { maxCacheEntriesPerSegment_ = maxEntries; }
+    size_t getCacheMaxEntriesPerSegment() const { return maxCacheEntriesPerSegment_; }
+    // Get total cache entry count across all segments
+    size_t getCacheTotalEntryCount() const {
+        size_t total = 0;
+        for (size_t s = 0; s < NUM_CACHE_SEGMENTS; ++s) {
+            std::shared_lock<std::shared_mutex> lock(cacheSegments_[s].guard);
+            total += cacheSegments_[s].cache.size();
+        }
+        return total;
+    }
+
     // Job queue configuration (for multi-threaded processing)
     void setJobQueue(void *jobQueue_, void *jobQueueCtx_, SubmitCB submitCb_) {
         jobQueue = jobQueue_;
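One possible way to size these knobs from a memory budget (a sketch only: the bytes-per-entry figure is a rough assumption, not a measured number, and `index` stands for an existing `HNSWDiskIndex` instance):

```cpp
// Rough sizing sketch. Assumed: ~200 bytes per cached entry
// (8-byte key + neighbor vector + unordered_map/list node overhead).
constexpr size_t kSegments = 64;       // NUM_CACHE_SEGMENTS from the header
constexpr size_t kBytesPerEntry = 200; // assumption; measure before relying on it

size_t budgetBytes  = 512ull << 20;                 // e.g. 512 MiB for the neighbor cache
size_t totalEntries = budgetBytes / kBytesPerEntry; // ~2.7M entries
index.setCacheMaxEntriesPerSegment(totalEntries / kSegments); // ~42k per segment

// Observe actual occupancy later (takes a shared lock on each segment):
size_t occupied = index.getCacheTotalEntryCount();
```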
@@ -2616,6 +2676,8 @@ void HNSWDiskIndex<DataType, DistType>::getNeighborsFromCache(
             result.push_back(id);
         }
         filterDeletedNodes(result);
+        // Note: We don't update LRU here since it's a read-only operation.
+        // LRU is updated on write/load to maintain const correctness.
         return;
     }
 
@@ -2641,13 +2703,33 @@
     // Double-check: another thread may have populated it
     auto it = cacheSegment.cache.find(key);
    if (it == cacheSegment.cache.end()) {
+        // Eviction: if cache is at capacity, evict LRU entries before adding
+        if (maxCacheEntriesPerSegment_ > 0) {
+            while (cacheSegment.cache.size() >= maxCacheEntriesPerSegment_ && cacheSegment.hasLRU()) {
+                uint64_t evictKey = cacheSegment.getLRU();
+
+                // Don't evict dirty nodes - they need to be written to disk first
+                if (cacheSegment.dirty.find(evictKey) != cacheSegment.dirty.end()) {
+                    break;
+                }
+
+                cacheSegment.cache.erase(evictKey);
+                cacheSegment.removeFromLRU(evictKey);
+                cacheSegment.newNodes.erase(evictKey);
+            }
+        }
+
         // Copy from vecsim_stl::vector to std::vector
         std::vector<idType> cacheEntry;
         cacheEntry.reserve(result.size());
         for (idType id : result) {
             cacheEntry.push_back(id);
         }
         cacheSegment.cache[key] = std::move(cacheEntry);
+        cacheSegment.addToLRU(key);
+    } else {
+        // Already populated by another thread - update LRU
+        cacheSegment.touchLRU(key);
     }
 }
 
@@ -2656,18 +2738,37 @@ void HNSWDiskIndex<DataType, DistType>::getNeighborsFromCache(
 
 // Helper to load neighbors from disk into cache if not already present
 // Caller must hold unique_lock on cacheSegment.guard (will be temporarily released during disk I/O)
+// Also handles LRU eviction when cache is at capacity
 template <typename DataType, typename DistType>
 void HNSWDiskIndex<DataType, DistType>::loadNeighborsFromDiskIfNeeded(
     uint64_t key, CacheSegment& cacheSegment, std::unique_lock<std::shared_mutex>& lock) {
 
     auto it = cacheSegment.cache.find(key);
     if (it != cacheSegment.cache.end()) {
-        return; // Already in cache
+        // Already in cache - update LRU and return
+        cacheSegment.touchLRU(key);
+        return;
     }
 
     // Check if this is a new node (never written to disk) - skip disk lookup
     bool isNewNode = (cacheSegment.newNodes.find(key) != cacheSegment.newNodes.end());
 
+    // Eviction: if cache is at capacity, evict LRU entries before adding
+    if (maxCacheEntriesPerSegment_ > 0) {
+        while (cacheSegment.cache.size() >= maxCacheEntriesPerSegment_ && cacheSegment.hasLRU()) {
+            uint64_t evictKey = cacheSegment.getLRU();
+
+            // Don't evict dirty nodes - they need to be written to disk first
+            if (cacheSegment.dirty.find(evictKey) != cacheSegment.dirty.end()) {
+                break;
+            }
+
+            cacheSegment.cache.erase(evictKey);
+            cacheSegment.removeFromLRU(evictKey);
+            cacheSegment.newNodes.erase(evictKey);
+        }
+    }
+
     if (!isNewNode) {
         // First modification of existing node - need to load current state from disk
         idType nodeId = static_cast<idType>(key >> 32);
@@ -2692,10 +2793,15 @@ void HNSWDiskIndex<DataType, DistType>::loadNeighborsFromDiskIfNeeded(
                 cacheEntry.push_back(id);
             }
             cacheSegment.cache[key] = std::move(cacheEntry);
+            cacheSegment.addToLRU(key);
+        } else {
+            // Already populated by another thread - update LRU
+            cacheSegment.touchLRU(key);
         }
     } else {
         // New node - initialize with empty neighbor list
         cacheSegment.cache[key] = std::vector<idType>();
+        cacheSegment.addToLRU(key);
     }
 }
 
@@ -2709,7 +2815,7 @@ void HNSWDiskIndex<DataType, DistType>::addNeighborToCache(
 
     std::unique_lock<std::shared_mutex> lock(cacheSegment.guard);
 
-    // Load from disk if not in cache (handles newNodes check internally)
+    // Load from disk if not in cache (handles newNodes check and LRU internally)
     loadNeighborsFromDiskIfNeeded(key, cacheSegment, lock);
 
     // Add new neighbor (avoid duplicates)
@@ -2718,6 +2824,9 @@ void HNSWDiskIndex<DataType, DistType>::addNeighborToCache(
         neighbors.push_back(newNeighborId);
     }
 
+    // Update LRU since we modified this entry
+    cacheSegment.touchLRU(key);
+
     // Mark as dirty (needs disk write) and increment atomic counter
     auto insertResult = cacheSegment.dirty.insert(key);
     if (insertResult.second) { // Only increment if newly inserted
@@ -2735,7 +2844,7 @@ bool HNSWDiskIndex<DataType, DistType>::tryAddNeighborToCacheIfCapacity(
 
     std::unique_lock<std::shared_mutex> lock(cacheSegment.guard);
 
-    // Load from disk if not in cache (handles newNodes check internally)
+    // Load from disk if not in cache (handles newNodes check and LRU internally)
    loadNeighborsFromDiskIfNeeded(key, cacheSegment, lock);
 
     // Atomic check-and-add under the lock
@@ -2753,6 +2862,10 @@ bool HNSWDiskIndex<DataType, DistType>::tryAddNeighborToCacheIfCapacity(
 
     // Has capacity - add the neighbor
     neighbors.push_back(newNeighborId);
+
+    // Update LRU since we modified this entry
+    cacheSegment.touchLRU(key);
+
     auto insertResult = cacheSegment.dirty.insert(key);
     if (insertResult.second) { // Only increment if newly inserted
         totalDirtyCount_.fetch_add(1, std::memory_order_relaxed);
@@ -2776,8 +2889,38 @@ void HNSWDiskIndex<DataType, DistType>::setNeighborsInCache(
     }
 
     std::unique_lock<std::shared_mutex> lock(cacheSegment.guard);
+
+    // Check if key already exists in cache
+    bool keyExists = cacheSegment.cache.find(key) != cacheSegment.cache.end();
+
+    // Eviction: if cache is at capacity and this is a new key, evict LRU entries
+    if (maxCacheEntriesPerSegment_ > 0 && !keyExists) {
+        while (cacheSegment.cache.size() >= maxCacheEntriesPerSegment_ && cacheSegment.hasLRU()) {
+            uint64_t evictKey = cacheSegment.getLRU();
+
+            // Don't evict dirty nodes - they need to be written to disk first
+            if (cacheSegment.dirty.find(evictKey) != cacheSegment.dirty.end()) {
+                // The LRU entry is dirty - stop evicting here rather than risk losing
+                // unflushed data (we never evict past an unflushed entry)
+                break;
+            }
+
+            // Remove from cache and LRU tracking
+            cacheSegment.cache.erase(evictKey);
+            cacheSegment.removeFromLRU(evictKey);
+            // Note: newNodes should have been cleared after flush, but clean up just in case
+            cacheSegment.newNodes.erase(evictKey);
+        }
+    }
+
     cacheSegment.cache[key] = std::move(cacheEntry);
 
+    // Update LRU tracking
+    if (keyExists) {
+        cacheSegment.touchLRU(key); // Move to front (most recently used)
+    } else {
+        cacheSegment.addToLRU(key); // Add as most recently used
+    }
+
     // If this is a new node, track it to avoid disk lookups
     if (isNewNode) {
         cacheSegment.newNodes.insert(key);