Commit d9f6c6d

[DOC] add documentation for the OOCEvictionManager design (#2370)
1 parent 79122eb commit d9f6c6d

1 file changed: +54 -32 lines changed

src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java

Lines changed: 54 additions & 32 deletions
@@ -46,41 +46,63 @@
 import java.util.concurrent.locks.ReentrantLock;
 
 /**
- * Eviction Manager for the Out-Of-Core stream cache
- * This is the base implementation for LRU, FIFO
- *
- * Design choice 1: Pure JVM-memory cache
- * What: Store MatrixBlock objects in a synchronized in-memory cache
- * (Map + Deque for LRU/FIFO). Spill to disk by serializing MatrixBlock
- * only when evicting.
- * Pros: Simple to implement; no off-heap management; easy to debug;
- * no serialization race since you serialize only when evicting;
- * fast cache hits (direct object access).
- * Cons: Heap usage counted roughly via serialized-size estimate — actual
- * JVM object overhead not accounted; risk of GC pressure and OOM if
- * estimates are off or if many small objects cause fragmentation;
- * eviction may be more expensive (serialize on eviction).
- * <p>
- * Design choice 2:
- * <p>
- * This manager runtime memory management by caching serialized
- * ByteBuffers and spilling them to disk when needed.
+ * Eviction Manager for the Out-Of-Core (OOC) stream cache.
  * <p>
- * * core function: Caches ByteBuffers (off-heap/direct) and
- * spills them to disk
- * * Eviction: Evicts a ByteBuffer by writing its contents to a file
- * * Granularity: Evicts one IndexedMatrixValue block at a time
- * * Data replay: get() will always return the data either from memory or
- * by falling back to the disk
- * * Memory: Since the datablocks are off-heap (in ByteBuffer) or disk,
- * there won't be OOM.
+ * This manager implements a high-performance, thread-safe buffer pool designed
+ * to handle intermediate results that exceed available heap memory. It employs
+ * a <b>partitioned eviction</b> strategy to maximize disk throughput and a
+ * <b>lock-striped</b> concurrency model to minimize thread contention.
+ *
+ * <h2>1. Purpose</h2>
+ * Provides a bounded cache for {@code MatrixBlock}s produced and consumed by OOC
+ * streaming operators (e.g., {@code tsmm}, {@code ba+*}). When memory pressure
+ * exceeds a configured limit, blocks are transparently evicted to disk and restored
+ * on demand, allowing execution of operations larger than RAM.
+ *
+ * <h2>2. Lifecycle Management</h2>
+ * Blocks transition atomically through three states to ensure data consistency:
+ * <ul>
+ * <li><b>HOT:</b> The block is pinned in the JVM heap ({@code value != null}).</li>
+ * <li><b>EVICTING:</b> A transition state. The block is currently being written to disk.
+ * Concurrent readers must wait on the entry's condition variable.</li>
+ * <li><b>COLD:</b> The block is persisted on disk. The heap reference is nulled out
+ * to free memory, but the container (metadata) remains in the cache map.</li>
+ * </ul>
  *
- * Pros: Avoids heap OOM by keeping large data off-heap; predictable
- * memory usage; good for very large blocks.
- * Cons: More complex synchronization; need robust off-heap allocator/free;
- * must ensure serialization finishes before adding to queue or make evict
- * wait on serialization; careful with native memory leaks.
+ * <h2>3. Eviction Strategy (Partitioned I/O)</h2>
+ * To mitigate I/O thrashing caused by writing thousands of small blocks:
+ * <ul>
+ * <li>Eviction is <b>partition-based</b>: Groups of "HOT" blocks are gathered into
+ * batches (e.g., 64MB) and written sequentially to a single partition file.</li>
+ * <li>This converts random I/O into high-throughput sequential I/O.</li>
+ * <li>A separate metadata map tracks the {@code (partitionId, offset)} for every
+ * evicted block, allowing random-access reloading.</li>
+ * </ul>
+ *
+ * <h2>4. Data Integrity (Re-hydration)</h2>
+ * To prevent index corruption during serialization/deserialization cycles, this manager
+ * uses a "re-hydration" model. The {@code IndexedMatrixValue} container is <b>never</b>
+ * removed from the cache structure. Eviction only nulls the data payload. Loading
+ * restores the data into the existing container, preserving the original {@code MatrixIndexes}.
+ *
+ * <h2>5. Concurrency Model (Fine-Grained Locking)</h2>
+ * <ul>
+ * <li><b>Global Structure Lock:</b> A coarse-grained lock ({@code _cacheLock}) guards
+ * the {@code LinkedHashMap} structure against concurrent insertions, deletions,
+ * and iteration during eviction selection.</li>
+ *
+ * <li><b>Per-Block Locks:</b> Each {@code BlockEntry} owns an independent
+ * {@code ReentrantLock}. This decouples I/O operations, allowing a reader to load
+ * "Block A" from disk while the evictor writes "Block B" to disk simultaneously,
+ * maximizing throughput.</li>
+ *
+ * <li><b>Condition Queues:</b> To handle read-write races, the system uses atomic
+ * state transitions. If a reader attempts to access a block in the {@code EVICTING}
+ * state, it waits on the entry's {@code Condition} variable until the writer
+ * signals that the block is safely {@code COLD} (persisted).</li>
+ * </ul>
  */
+
 public class OOCEvictionManager {
 
 	// Configuration: OOC buffer limit as percentage of heap
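To make the documented design concrete, the following is a minimal, self-contained Java sketch of the mechanisms the new Javadoc describes: the HOT/EVICTING/COLD lifecycle, a per-block ReentrantLock with a Condition that readers wait on while an eviction is in flight, re-hydration into the existing container, and (partitionId, offset) bookkeeping for partition files. It is an illustration, not the SystemDS implementation: the names SketchEvictionManager, Entry, appendBlock, and readBlock are invented, payloads are plain byte arrays rather than serialized MatrixBlocks, and the batching of many HOT blocks into ~64MB partition writes is left to the second sketch further below.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Minimal sketch of the documented HOT/EVICTING/COLD design; not the SystemDS source. */
public class SketchEvictionManager {

	enum State { HOT, EVICTING, COLD }

	/** Per-block container; it stays in the map after eviction (re-hydration model). */
	static final class Entry {
		final ReentrantLock lock = new ReentrantLock();   // per-block lock
		final Condition persisted = lock.newCondition();  // readers wait here while EVICTING
		State state = State.HOT;
		byte[] payload;                                   // serialized block; null while COLD
		int partitionId = -1; long offset = -1; int length = -1;
	}

	// Coarse-grained lock guarding the map structure (insertions, eviction scans).
	private final ReentrantLock _cacheLock = new ReentrantLock();
	private final Map<Long, Entry> _cache = new LinkedHashMap<>(16, 0.75f, true); // access order ~ LRU
	private final Path _spillDir;

	public SketchEvictionManager(Path spillDir) { _spillDir = spillDir; }

	public void put(long id, byte[] data) {
		Entry e = new Entry();
		e.payload = data;
		_cacheLock.lock();
		try { _cache.put(id, e); } finally { _cacheLock.unlock(); }
	}

	/** Always returns the data: from the heap if HOT, otherwise re-hydrated from disk. */
	public byte[] get(long id) throws IOException {
		Entry e;
		_cacheLock.lock();
		try { e = _cache.get(id); } finally { _cacheLock.unlock(); }
		if( e == null )
			return null;
		e.lock.lock();
		try {
			while( e.state == State.EVICTING )      // wait until the evictor signals COLD
				e.persisted.awaitUninterruptibly();
			if( e.state == State.COLD ) {           // re-hydrate into the existing container
				e.payload = readBlock(e.partitionId, e.offset, e.length);
				e.state = State.HOT;
			}
			return e.payload;
		} finally { e.lock.unlock(); }
	}

	/** Evicts one block: HOT -> EVICTING -> COLD; only the payload is nulled. */
	void evict(Entry e, int partitionId) throws IOException {
		byte[] data;
		e.lock.lock();
		try {
			if( e.state != State.HOT )
				return;
			e.state = State.EVICTING;               // readers now block on 'persisted'
			data = e.payload;
		} finally { e.lock.unlock(); }
		long off = appendBlock(partitionId, data);  // disk I/O outside the per-block lock
		e.lock.lock();
		try {
			e.partitionId = partitionId; e.offset = off; e.length = data.length;
			e.payload = null;                       // free heap memory, keep the container
			e.state = State.COLD;
			e.persisted.signalAll();                // wake readers blocked in get()
		} finally { e.lock.unlock(); }
	}

	private long appendBlock(int partitionId, byte[] data) throws IOException {
		Path f = _spillDir.resolve("part-" + partitionId + ".bin");
		try( FileChannel ch = FileChannel.open(f, StandardOpenOption.CREATE,
				StandardOpenOption.WRITE, StandardOpenOption.APPEND) ) {
			long off = ch.size();                   // offset where this block starts
			ch.write(ByteBuffer.wrap(data));
			return off;
		}
	}

	private byte[] readBlock(int partitionId, long offset, int length) throws IOException {
		Path f = _spillDir.resolve("part-" + partitionId + ".bin");
		try( FileChannel ch = FileChannel.open(f, StandardOpenOption.READ) ) {
			ByteBuffer buf = ByteBuffer.allocate(length);
			while( buf.hasRemaining() && ch.read(buf, offset + buf.position()) > 0 ) { }
			return buf.array();
		}
	}
}

The point the sketch mirrors is that evict() nulls only the payload and leaves the entry in the map, so a later get() re-hydrates it in place, and a concurrent reader blocks only on that one entry's lock and condition rather than on the global cache lock.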

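The partitioned I/O of section 3 can be sketched separately. This snippet is likewise hypothetical (PartitionSpillSketch, DiskLocation, and the byte[] payloads are not part of OOCEvictionManager); it only shows the batching idea: gather roughly 64MB of victim blocks in the map's iteration order (LRU or FIFO depending on how the map is ordered), write them with a single sequential write, and record a (partitionId, offset, length) entry per block for later random-access reloads.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the batched, sequential spill of section 3; not SystemDS code. */
public class PartitionSpillSketch {

	/** Location of one evicted block inside a partition file. */
	record DiskLocation(int partitionId, long offset, int length) { }

	static final long BATCH_BYTES = 64L * 1024 * 1024; // target of ~64MB per partition file

	/**
	 * Gathers blocks in iteration order until ~64MB is collected, writes them as one
	 * sequential partition file, and returns the per-block (partitionId, offset, length) metadata.
	 */
	static Map<Long, DiskLocation> spillBatch(LinkedHashMap<Long, byte[]> hotBlocks,
			Path spillDir, int partitionId) throws IOException {
		Map<Long, DiskLocation> locations = new LinkedHashMap<>();
		ByteArrayOutputStream batch = new ByteArrayOutputStream();
		List<Long> spilled = new ArrayList<>();

		// 1) Select victims in iteration order (LRU or FIFO) until the batch is full.
		for( Map.Entry<Long, byte[]> e : hotBlocks.entrySet() ) {
			if( batch.size() >= BATCH_BYTES )
				break;
			long offset = batch.size();
			batch.write(e.getValue());
			locations.put(e.getKey(), new DiskLocation(partitionId, offset, e.getValue().length));
			spilled.add(e.getKey());
		}

		// 2) One sequential write for the whole batch (turns random I/O into sequential I/O).
		Files.write(spillDir.resolve("part-" + partitionId + ".bin"), batch.toByteArray());

		// 3) Drop the spilled payloads from the heap; the metadata map survives for reloads.
		spilled.forEach(hotBlocks::remove);
		return locations;
	}
}

Writing one large partition file per batch is what turns thousands of small random writes into sequential I/O, while the returned metadata map is what later lookups use to seek directly to an evicted block.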
