docs/disk_hnsw_multithreaded_architecture.md
Lines changed: 47 additions & 52 deletions
@@ -50,73 +50,68 @@ number of vectors in the index.
50
50
51
51
#### Cache Memory Management
52
52
53
-
**Current Behavior: No Eviction**
53
+
**LRU Eviction Policy (Implemented)**
54
54
55
-
The segment cache (`cacheSegment.cache`) currently **grows unboundedly** and is **never evicted**. Once a node's neighbor list is loaded into cache (either from disk or created during insert), it remains in memory indefinitely.
55
+
The segment cache uses an **LRU (Least Recently Used) eviction policy** to bound memory usage. Each segment tracks access order using a doubly-linked list for O(1) operations:
56
56
57
-
**Why Cache is Source of Truth**
58
-
59
-
The cache cannot simply be cleared because it serves as the **source of truth** for pending updates that haven't been flushed to disk yet. The Swap-and-Flush pattern relies on:
60
-
1. Cache always having the latest neighbor lists
61
-
2. `dirty` set tracking which nodes need to be written
62
-
3. Flusher reading current cache state (not stale data)
// Get total cache entry count across all segments
85
+
size_t getCacheTotalEntryCount() const;
80
86
```
81
-
*Pros:* Simple, safe (dirty entries always kept)
82
-
*Cons:* Requires LRU tracking overhead (linked list + map)
83
87
84
-
**2. Time-Based Eviction**
88
+
**Eviction Rules:**
89
+
1. Only **non-dirty entries** can be evicted (dirty entries must be flushed first to avoid data loss)
90
+
2. Eviction is triggered when a new entry is added while the cache is at capacity
91
+
3. LRU entries are evicted first (least recently accessed)
92
+
4. Total max cache size = `maxCacheEntriesPerSegment * NUM_CACHE_SEGMENTS` (64 segments)
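The eviction rules above can be sketched for a single segment as follows. This is a minimal illustration, not the index's actual implementation: `LruSegmentSketch` and all of its members are hypothetical names, locking is omitted, and the choice to let the segment grow when every entry is dirty is an assumption made here so that no unflushed update is ever dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical single cache segment; names are illustrative only.
class LruSegmentSketch {
public:
    using NeighborList = std::vector<uint32_t>;

    // maxEntries == 0 means "no limit", mirroring setCacheMaxEntriesPerSegment(0).
    explicit LruSegmentSketch(size_t maxEntries) : maxEntries_(maxEntries) {}

    // Insert or overwrite an entry and make it the most recently used.
    // isDirty marks the entry as not yet persisted to disk.
    void put(uint64_t key, NeighborList neighbors, bool isDirty) {
        auto it = entries_.find(key);
        if (it != entries_.end()) {
            it->second.neighbors = std::move(neighbors);
            touch(it->second);
        } else {
            evictIfFull();
            order_.push_front(key);
            entries_.emplace(key, Entry{std::move(neighbors), order_.begin()});
        }
        if (isDirty) dirty_.insert(key);
        assert(entries_.size() == order_.size());
    }

    // Return the cached list (nullptr on a miss) and refresh its recency.
    const NeighborList* get(uint64_t key) {
        auto it = entries_.find(key);
        if (it == entries_.end()) return nullptr;
        touch(it->second);
        return &it->second.neighbors;
    }

    // Called after a key has been flushed: it becomes eligible for eviction.
    void markClean(uint64_t key) { dirty_.erase(key); }

    bool contains(uint64_t key) const { return entries_.count(key) != 0; }
    size_t size() const { return entries_.size(); }

private:
    struct Entry {
        NeighborList neighbors;
        std::list<uint64_t>::iterator pos;  // position in the recency list
    };

    // Move an entry to the front of the recency list in O(1).
    void touch(Entry& e) { order_.splice(order_.begin(), order_, e.pos); }

    // Evict the least recently used clean entry, scanning from the tail.
    // If every entry is dirty, nothing is evicted and the segment grows.
    void evictIfFull() {
        if (maxEntries_ == 0 || entries_.size() < maxEntries_) return;
        for (auto rit = order_.rbegin(); rit != order_.rend(); ++rit) {
            if (dirty_.count(*rit) == 0) {
                const uint64_t victim = *rit;
                order_.erase(std::next(rit).base());
                entries_.erase(victim);
                return;
            }
        }
    }

    size_t maxEntries_;
    std::list<uint64_t> order_;  // front = most recent, back = least recent
    std::unordered_map<uint64_t, Entry> entries_;
    std::unordered_set<uint64_t> dirty_;
};
```

Lookups and recency updates are O(1); the tail scan in `evictIfFull` is only longer than one step when dirty entries sit at the cold end of the list.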
85
93
86
-
Evict clean entries older than a threshold:
94
+
**Example:**
87
95
```cpp
88
-
// Pseudocode
89
-
for (auto& entry : cache) {
90
-
if (!dirty.contains(entry.key) &&
91
-
now - entry.lastAccessTime > evictionTimeout) {
92
-
cache.erase(entry.key);
93
-
}
94
-
}
96
+
// Set limit of 10,000 entries per segment (640,000 total across 64 segments)
97
+
index.setCacheMaxEntriesPerSegment(10000);
98
+
99
+
// Disable limit (unlimited cache)
100
+
index.setCacheMaxEntriesPerSegment(0);
95
101
```
96
-
*Pros:* Predictable memory behavior
97
-
*Cons:* Requires timestamp tracking per entry
98
102
99
-
**3. Write-Through with Immediate Eviction**
103
+
**Why Cache is Source of Truth**
100
104
101
-
After flushing to disk, immediately evict the written entries:
102
-
```cpp
103
-
// In flushDirtyNodesToDisk(), after successful write:
104
-
for (uint64_t key : flushedNodes) {
105
-
cacheSegment.cache.erase(key); // Evict after persist
106
-
}
107
-
```
108
-
*Pros:* Minimal memory usage, no tracking overhead
109
-
*Cons:* Increases disk reads on subsequent access
105
+
The cache serves as the **source of truth** for pending updates that haven't been flushed to disk yet. The Swap-and-Flush pattern relies on:
106
+
1. Cache always having the latest neighbor lists
107
+
2. `dirty` set tracking which nodes need to be written
108
+
3. Flusher reading current cache state (not stale data)
110
109
111
-
**4. Size-Limited Cache with Eviction Policy**
110
+
This is why **dirty entries are never evicted**: they contain updates that have not yet been persisted to disk.
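One plausible reading of the three Swap-and-Flush points is sketched below. It is an illustration under stated assumptions rather than the project's code: `CacheSegmentSketch`, `FakeDisk`, `update`, and `flush` are hypothetical names, and the in-memory `FakeDisk` stands in for real storage so the example is runnable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using NeighborList = std::vector<uint32_t>;

// Hypothetical stand-in for on-disk storage.
struct FakeDisk {
    std::unordered_map<uint64_t, NeighborList> stored;
    void write(uint64_t key, const NeighborList& n) { stored[key] = n; }
};

// Hypothetical cache segment: the cache holds the latest neighbor lists,
// and the dirty set records which keys still need to be written.
struct CacheSegmentSketch {
    std::mutex guard;
    std::unordered_map<uint64_t, NeighborList> cache;
    std::unordered_set<uint64_t> dirty;

    // Writers update the cache first and then mark the key dirty.
    void update(uint64_t key, NeighborList n) {
        std::lock_guard<std::mutex> lock(guard);
        cache[key] = std::move(n);
        dirty.insert(key);
    }

    // Swap the dirty set out under the lock, copy the *current* cache
    // state for those keys, then write to disk without holding the lock.
    // Returns the number of entries written.
    size_t flush(FakeDisk& disk) {
        std::unordered_set<uint64_t> toFlush;
        std::vector<std::pair<uint64_t, NeighborList>> snapshot;
        {
            std::lock_guard<std::mutex> lock(guard);
            toFlush.swap(dirty);  // later updates land in the fresh set
            snapshot.reserve(toFlush.size());
            for (uint64_t key : toFlush) {
                auto it = cache.find(key);
                assert(it != cache.end());  // dirty entries are never evicted
                snapshot.emplace_back(key, it->second);
            }
        }
        for (const auto& kv : snapshot) disk.write(kv.first, kv.second);
        return snapshot.size();
    }
};
```

Because the flusher reads the cache rather than a queued copy of each update, a key modified several times between flushes is written once with its latest value. A key updated while the disk writes are in progress is simply picked up by the next flush.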
111
+
112
+
**Implementation Note / Future Optimization**
113
+
Memory overhead: `std::list` nodes are individually allocated on the heap. For a cache of 1M entries, that is 1M small allocations, which can cause heap fragmentation and poor cache locality.
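The document does not prescribe a fix for this overhead. One common alternative, sketched here purely as an assumption, is to keep the recency list as index-linked nodes inside a single `std::vector`, so nodes share one contiguous allocation and freed slots are reused. `PooledLruListSketch` and `kNil` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kNil = UINT32_MAX;  // "null" slot index

// Hypothetical doubly-linked recency list stored in one vector.
// Links are 32-bit slot indices instead of heap pointers.
class PooledLruListSketch {
public:
    // Add a key as most recently used; returns its slot index.
    uint32_t pushFront(uint64_t key) {
        const uint32_t slot = allocate();
        nodes_[slot].key = key;
        linkFront(slot);
        return slot;
    }

    // Mark an existing slot as most recently used in O(1).
    void moveToFront(uint32_t slot) {
        if (slot == head_) return;
        unlink(slot);
        linkFront(slot);
    }

    // Remove a slot from the list and put it on the free list for reuse.
    void erase(uint32_t slot) {
        unlink(slot);
        nodes_[slot].next = freeHead_;
        freeHead_ = slot;
    }

    uint32_t tail() const { return tail_; }  // least recently used slot
    uint64_t keyAt(uint32_t slot) const { return nodes_[slot].key; }
    size_t slotCount() const { return nodes_.size(); }

private:
    struct Node {
        uint64_t key;
        uint32_t prev;
        uint32_t next;
    };

    // Reuse a freed slot if one exists, otherwise grow the vector.
    uint32_t allocate() {
        if (freeHead_ != kNil) {
            const uint32_t slot = freeHead_;
            freeHead_ = nodes_[slot].next;
            return slot;
        }
        nodes_.push_back(Node{0, kNil, kNil});
        return static_cast<uint32_t>(nodes_.size() - 1);
    }

    void linkFront(uint32_t slot) {
        nodes_[slot].prev = kNil;
        nodes_[slot].next = head_;
        if (head_ != kNil) nodes_[head_].prev = slot; else tail_ = slot;
        head_ = slot;
    }

    void unlink(uint32_t slot) {
        assert(slot < nodes_.size());
        const Node& n = nodes_[slot];
        if (n.prev != kNil) nodes_[n.prev].next = n.next; else head_ = n.next;
        if (n.next != kNil) nodes_[n.next].prev = n.prev; else tail_ = n.prev;
    }

    std::vector<Node> nodes_;
    uint32_t head_ = kNil;
    uint32_t tail_ = kNil;
    uint32_t freeHead_ = kNil;
};
```

In this layout the cache map would store a slot index per key instead of a `std::list` iterator. The trade-off is that the vector never shrinks on its own and reallocation copies all nodes, so reserving capacity up front from the configured per-segment limit would likely be worthwhile.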
112
114
113
-
Configure maximum cache size and evict when exceeded:
114
-
```cpp
115
-
size_t maxCacheEntries = 100000; // Configurable
116
-
// On insert, check size and evict clean entries if needed
117
-
```
118
-
*Pros:* Bounded memory usage
119
-
*Cons:* Need to choose appropriate eviction policy