
Commit 9593a7c

[DOC] add documentation for the OOCEvictionManager design
1 parent: d167190

File tree

1 file changed: 50 additions, 49 deletions


src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java

Lines changed: 50 additions & 49 deletions
@@ -46,57 +46,58 @@
 import java.util.concurrent.locks.ReentrantLock;
 
 /**
- * Eviction Manager for the Out-Of-Core stream cache
- * This is the base implementation for LRU, FIFO
+ * Eviction Manager for the Out-Of-Core (OOC) stream cache.
+ * <p>
+ * This manager implements a high-performance, thread-safe buffer pool designed
+ * to handle intermediate results that exceed available heap memory. It builds on
+ * a <b>partitioned eviction</b> strategy to maximize disk throughput and a
+ * <b>lock-striped</b> concurrency model to minimize thread contention.
  *
- * Purpose
- * -------
- * Provides a bounded cache for matrix blocks produced and consumed by OOC
- * streaming operators. When memory pressure exceeds a configured limit,
- * blocks are evicted from memory and spilled to disk, and transparently
- * restored on demand.
- * </b>
- *
- * Design scope
- * ------------
- * - Manages block lifecycle across the states:
- * HOT : block in memory
- * EVICTING: block spilled to disk
- * COLD: block persisted on disk and to be reload when needed
- * </b>
- * - Guarantees correctness under concurrent get/put operations with:
- * * per-block locks
- * * explicit eviction state transitions
- * </b>
- * - Integration with Resettable to support:
- * * multiple consumers
- * * deterministic replay
- * * eviction-safe reuse of shared inputs for tee operator
- * </b>
- *
- * Eviction Strategy
- * -----------------
- * - Uses FIFO or LRU ordering at block granularity.
- * - Eviction is partition-based:
- * * blocks are spilled in batches to a single partition file
- * * enables high-throughput sequential disk I/O
- * - Each evicted block records a (partitionId, offset) for direct reload.
- * </b>
+ * <h3>1. Purpose</h3>
+ * Provides a bounded cache for {@code MatrixBlock}s produced and consumed by OOC
+ * streaming operators (e.g., {@code tsmm}, {@code ba+*}). When memory pressure
+ * exceeds a configured limit, blocks are transparently evicted to disk and restored
+ * on demand.
  *
- * Disk Layout
- * -----------
- * - Spill files are append-only partition files
- * - Each partition may contain multiple serialized blocks
- * - Metadata remains in-memory while block data can be on disk
- * </b>
- *
- * Concurrency Model
- * -----------------
- * - Global cache structure guarded by a cache-level lock.
- * - Each block has an independent lock and condition variable.
- * - Readers wait when a block is in EVICTING state.
- * - Disk I/O is performed outside global locks to avoid blocking producers.
- * /
+ * <h3>2. Lifecycle Management</h3>
+ * Blocks transition atomically through three states to ensure data consistency:
+ * <ul>
+ * <li><b>HOT:</b> The block is pinned in the JVM heap ({@code value != null}).</li>
+ * <li><b>EVICTING:</b> A transition state. The block is currently being written to disk.
+ * Concurrent readers must wait on the entry's condition variable.</li>
+ * <li><b>COLD:</b> The block is persisted on disk. The heap reference is nulled out
+ * to free memory, but the container (metadata) remains in the cache map.</li>
+ * </ul>
+ *
+ * <h3>3. Eviction Strategy (Partitioned I/O)</h3>
+ * To mitigate I/O thrashing caused by writing thousands of small blocks:
+ * <ul>
+ * <li>Eviction is <b>partition-based</b>: Groups of "HOT" blocks are gathered into
+ * batches (e.g., 64MB) and written sequentially to a single partition file.</li>
+ * <li>This converts random I/O into high-throughput sequential I/O.</li>
+ * <li>A separate metadata map tracks the {@code (partitionId, offset)} for every
+ * evicted block, allowing random-access reloading.</li>
+ * </ul>
+ *
+ * <h3>4. Data Integrity (Re-hydration)</h3>
+ * To prevent index corruption during serialization/deserialization cycles, this manager
+ * uses a "re-hydration" model. The {@code IndexedMatrixValue} container is <b>never</b>
+ * removed from the cache structure. Eviction only nulls the data payload. Loading
+ * restores the data into the existing container, preserving the original {@code MatrixIndexes}.
+ *
+ * <h3>5. Concurrency Model (Lock-Striping)</h3>
+ * <ul>
+ * <li><b>Global Lock:</b> A coarse-grained lock guards the cache structure
+ * (LinkedHashMap) for insertions and deletions.</li>
+ * <li><b>Per-Block Locks:</b> Each cache entry has an independent {@code ReentrantLock}.
+ * This allows a reader to load Block A from disk while the evictor writes
+ * Block B to disk simultaneously.</li>
+ * <li><b>Wait/Notify:</b> Readers attempting to access a block in the {@code EVICTING}
+ * state will automatically block until the state transitions to {@code COLD},
+ * preventing race conditions.</li>
+ * </ul>
+ */
+
 public class OOCEvictionManager {
 
 // Configuration: OOC buffer limit as percentage of heap
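Note: as a reader aid, the lifecycle, re-hydration, and wait/notify behavior described in sections 2, 4, and 5 of the new Javadoc can be sketched roughly as follows. This is a minimal illustration, not code from the commit; the names BlockState, CacheEntrySketch, payload, and reloadFromDisk are hypothetical.

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the HOT -> EVICTING -> COLD lifecycle with a
// per-entry lock and condition variable; names do not match the actual class.
enum BlockState { HOT, EVICTING, COLD }

class CacheEntrySketch {
	private final ReentrantLock lock = new ReentrantLock();
	private final Condition stateChanged = lock.newCondition();
	private BlockState state = BlockState.HOT;
	private Object payload;        // block data; nulled out once COLD
	private int partitionId = -1;  // disk location of the serialized block
	private long offset = -1;

	// Reader side: wait while an eviction is in flight, re-hydrate if COLD.
	Object get() throws InterruptedException {
		lock.lock();
		try {
			while (state == BlockState.EVICTING)
				stateChanged.await();                          // wait for the evictor to finish
			if (state == BlockState.COLD)
				payload = reloadFromDisk(partitionId, offset); // refill the existing container
			state = BlockState.HOT;
			return payload;
		} finally {
			lock.unlock();
		}
	}

	// Evictor side: after the block has been written to disk, drop the heap
	// reference (the container stays in the cache map) and wake waiting readers.
	void markCold(int pid, long off) {
		lock.lock();
		try {
			partitionId = pid;
			offset = off;
			payload = null;            // free heap memory, keep metadata
			state = BlockState.COLD;
			stateChanged.signalAll();  // release readers waiting in get()
		} finally {
			lock.unlock();
		}
	}

	private Object reloadFromDisk(int pid, long off) {
		return new Object(); // placeholder for deserializing from the partition file
	}
}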
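Similarly, the partitioned eviction of section 3 amounts to batching several blocks into one sequential append and remembering where each block landed. A rough sketch under the same caveat (class name, file naming, and on-disk record format are assumptions, not the actual SysDS layout):

import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: spill a batch of blocks sequentially into one
// partition file and remember (partitionId, offset) for each block.
class PartitionSpillSketch {
	private final Map<String, long[]> location = new HashMap<>(); // key -> {partitionId, offset}
	private int nextPartitionId = 0;

	void spillBatch(Map<String, byte[]> hotBlocks) throws IOException {
		int pid = nextPartitionId++;
		Path file = Paths.get("ooc-partition-" + pid + ".bin"); // illustrative file name
		try (DataOutputStream out = new DataOutputStream(
				Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
			long offset = 0;
			for (Map.Entry<String, byte[]> e : hotBlocks.entrySet()) {
				location.put(e.getKey(), new long[]{pid, offset}); // direct-reload metadata
				out.writeInt(e.getValue().length);                 // simple length-prefixed record
				out.write(e.getValue());
				offset += 4 + e.getValue().length;                 // sequential append per block
			}
		}
	}

	long[] locate(String key) {
		return location.get(key); // {partitionId, offset} for random-access reload
	}
}

Writing a whole batch into a single partition file is what turns many small random writes into one large sequential write.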
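Finally, the trailing context line mentions that the OOC buffer limit is configured as a percentage of the heap. A heap-relative bound is typically computed as below; the 15% figure is an arbitrary example, not the SysDS default.

// Hypothetical illustration of a heap-relative buffer limit; the 0.15
// fraction is an assumed example value, not the actual configuration.
public class BufferLimitSketch {
	public static void main(String[] args) {
		double fraction = 0.15;                            // configured percentage of heap
		long maxHeap = Runtime.getRuntime().maxMemory();   // JVM heap limit (-Xmx) in bytes
		long oocBufferLimit = (long) (maxHeap * fraction); // evict once cached bytes exceed this
		System.out.println("OOC buffer limit: " + oocBufferLimit + " bytes");
	}
}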

0 commit comments
