|
46 | 46 | import java.util.concurrent.locks.ReentrantLock; |
47 | 47 |
|
48 | 48 | /** |
49 | | - * Eviction Manager for the Out-Of-Core stream cache |
50 | | - * This is the base implementation for the LRU and FIFO policies |
51 | | - * |
52 | | - * Design choice 1: Pure JVM-memory cache |
53 | | - * What: Store MatrixBlock objects in a synchronized in-memory cache |
54 | | - * (Map + Deque for LRU/FIFO). Spill to disk by serializing MatrixBlock |
55 | | - * only when evicting. |
56 | | - * Pros: Simple to implement; no off-heap management; easy to debug; |
57 | | - * no serialization race since you serialize only when evicting; |
58 | | - * fast cache hits (direct object access). |
59 | | - * Cons: Heap usage is counted only roughly via a serialized-size estimate; |
60 | | - * actual JVM object overhead is not accounted for; risk of GC pressure and |
61 | | - * OOM if estimates are off or if many small objects cause fragmentation; |
62 | | - * eviction may be more expensive (serialize on eviction). |
63 | | - * <p> |
64 | | - * Design choice 2: Off-heap ByteBuffer cache |
65 | | - * <p> |
66 | | - * This manager handles runtime memory management by caching serialized |
67 | | - * ByteBuffers and spilling them to disk when needed. |
| 49 | + * Eviction Manager for the Out-Of-Core (OOC) stream cache. |
68 | 50 | * <p> |
69 | | - * * Core function: Caches ByteBuffers (off-heap/direct) and |
70 | | - * spills them to disk |
71 | | - * * Eviction: Evicts a ByteBuffer by writing its contents to a file |
72 | | - * * Granularity: Evicts one IndexedMatrixValue block at a time |
73 | | - * * Data replay: get() always returns the data, either from memory or |
74 | | - * by falling back to disk |
75 | | - * * Memory: Since the data blocks live off-heap (in ByteBuffers) or on disk, |
76 | | - * there is no risk of heap OOM. |
| 51 | + * This manager implements a high-performance, thread-safe buffer pool designed |
| 52 | + * to handle intermediate results that exceed available heap memory. It employs |
| 53 | + * a <b>partitioned eviction</b> strategy to maximize disk throughput and a |
| 54 | + * <b>lock-striped</b> concurrency model to minimize thread contention. |
| 55 | + * |
| 56 | + * <h2>1. Purpose</h2> |
| 57 | + * Provides a bounded cache for {@code MatrixBlock}s produced and consumed by OOC |
| 58 | + * streaming operators (e.g., {@code tsmm}, {@code ba+*}). When memory pressure |
| 59 | + * exceeds a configured limit, blocks are transparently evicted to disk and restored |
| 60 | + * on demand, allowing execution of operations larger than RAM. |
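A hedged usage sketch of the interface implied here and by the old notes on `get()` in the removed Javadoc. The `put` signature, the `blockId` key, and the wrapper class are illustrative assumptions, not the class's confirmed API:

```java
import org.apache.sysds.runtime.matrix.data.MatrixBlock;
import org.apache.sysds.runtime.matrix.data.MatrixIndexes;

// Hypothetical caller: an OOC operator that stores and later re-reads a block.
final class OocOperatorSketch {
    static MatrixBlock roundTrip(OOCEvictionManager mgr, long blockId,
            MatrixIndexes ix, MatrixBlock mb) {
        mgr.put(blockId, ix, mb);   // assumed signature; may trigger eviction
        return mgr.get(blockId);    // HOT hit, or transparent reload from disk
    }
}
```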
| 61 | + * |
| 62 | + * <h2>2. Lifecycle Management</h2> |
| 63 | + * Blocks transition atomically through three states to ensure data consistency: |
| 64 | + * <ul> |
| 65 | + * <li><b>HOT:</b> The block is pinned in the JVM heap ({@code value != null}).</li> |
| 66 | + * <li><b>EVICTING:</b> A transition state. The block is currently being written to disk. |
| 67 | + * Concurrent readers must wait on the entry's condition variable.</li> |
| 68 | + * <li><b>COLD:</b> The block is persisted on disk. The heap reference is nulled out |
| 69 | + * to free memory, but the container (metadata) remains in the cache map.</li> |
| 70 | + * </ul> |
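A minimal sketch of the per-block state machine described above, assuming SystemDS's `MatrixBlock`; the `BlockState` and `BlockEntry` names and fields are illustrative, not necessarily the identifiers used in the actual class:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.sysds.runtime.matrix.data.MatrixBlock;

// Hypothetical container for the HOT -> EVICTING -> COLD lifecycle.
enum BlockState { HOT, EVICTING, COLD }

final class BlockEntry {
    final ReentrantLock lock = new ReentrantLock();
    final Condition stateChanged = lock.newCondition();
    BlockState state = BlockState.HOT;
    MatrixBlock value;      // non-null while HOT; nulled out once COLD
    long partitionId = -1;  // disk location, valid once evicted
    long offset = -1;       // byte offset within the partition file

    // Reader-side protocol: wait out a concurrent eviction, then report
    // whether the payload is still on-heap or must be reloaded from disk.
    MatrixBlock awaitReadable() throws InterruptedException {
        lock.lock();
        try {
            while (state == BlockState.EVICTING)
                stateChanged.await();
            return value; // null means COLD: caller reloads from disk
        } finally {
            lock.unlock();
        }
    }
}
```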
77 | 71 | * |
78 | | - * Pros: Avoids heap OOM by keeping large data off-heap; predictable |
79 | | - * memory usage; good for very large blocks. |
80 | | - * Cons: More complex synchronization; needs a robust off-heap allocate/free path; |
81 | | - * must ensure serialization finishes before adding to the queue, or make eviction |
82 | | - * wait on serialization; care is needed to avoid native memory leaks. |
| 72 | + * <h2>3. Eviction Strategy (Partitioned I/O)</h2> |
| 73 | + * To mitigate I/O thrashing caused by writing thousands of small blocks: |
| 74 | + * <ul> |
| 75 | + * <li>Eviction is <b>partition-based</b>: Groups of "HOT" blocks are gathered into |
| 76 | + * batches (e.g., 64MB) and written sequentially to a single partition file.</li> |
| 77 | + * <li>This converts random I/O into high-throughput sequential I/O.</li> |
| 78 | + * <li>A separate metadata map tracks the {@code (partitionId, offset)} for every |
| 79 | + * evicted block, allowing random-access reloading.</li> |
| 80 | + * </ul> |
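A sketch of the partition-writing path just described. The 64MB batch size is stated above; the `PartitionWriter` class, file naming, and the `long[]{partitionId, offset}` encoding are assumptions for illustration. SystemDS's `MatrixBlock` implements Hadoop's `Writable`, so it can be serialized directly to a stream:

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.sysds.runtime.matrix.data.MatrixBlock;

// Illustrative partition writer: turns many small random writes into one
// sequential write per batch, and records where each block landed.
final class PartitionWriter {
    static final long BATCH_BYTES = 64L * 1024 * 1024; // batch target (64MB)

    // blockId -> {partitionId, offset}, enabling random-access reload later
    final Map<Long, long[]> locations = new ConcurrentHashMap<>();

    void writePartition(long partitionId, List<Long> ids,
            List<MatrixBlock> batch, String dir) throws IOException {
        String file = dir + "/ooc-part-" + partitionId + ".bin"; // assumed naming
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int i = 0; i < batch.size(); i++) {
                long offset = out.size();  // bytes written to this partition so far
                batch.get(i).write(out);   // Writable serialization
                locations.put(ids.get(i), new long[]{partitionId, offset});
            }
        }
    }
}
```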
| 81 | + * |
| 82 | + * <h2>4. Data Integrity (Re-hydration)</h2> |
| 83 | + * To prevent index corruption during serialization/deserialization cycles, this manager |
| 84 | + * uses a "re-hydration" model. The {@code IndexedMatrixValue} container is <b>never</b> |
| 85 | + * removed from the cache structure. Eviction only nulls the data payload. Loading |
| 86 | + * restores the data into the existing container, preserving the original {@code MatrixIndexes}. |
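A sketch of the re-hydration step under the same assumptions as the `BlockEntry` sketch above: the entry (and thus its `MatrixIndexes`) stays in the map, and only the payload is re-read. The caller is assumed to have positioned the stream at the block's recorded offset:

```java
import java.io.DataInputStream;
import java.io.IOException;

import org.apache.sysds.runtime.matrix.data.MatrixBlock;

// Illustrative reload: restore the payload into the existing container,
// never replacing the container itself, so the original indexes survive.
final class Rehydrator {
    static void reload(BlockEntry entry, DataInputStream in) throws IOException {
        entry.lock.lock();
        try {
            if (entry.state == BlockState.COLD) {
                MatrixBlock mb = new MatrixBlock();
                mb.readFields(in);     // Writable deserialization at the offset
                entry.value = mb;      // re-hydrate the existing entry
                entry.state = BlockState.HOT;
                entry.stateChanged.signalAll(); // wake readers, if any waited
            }
        } finally {
            entry.lock.unlock();
        }
    }
}
```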
| 87 | + * |
| 88 | + * <h2>5. Concurrency Model (Fine-Grained Locking)</h2> |
| 89 | + * <ul> |
| 90 | + * <li><b>Global Structure Lock:</b> A coarse-grained lock ({@code _cacheLock}) guards |
| 91 | + * the {@code LinkedHashMap} structure against concurrent insertions, deletions, |
| 92 | + * and iteration during eviction selection.</li> |
| 93 | + * |
| 94 | + * <li><b>Per-Block Locks:</b> Each {@code BlockEntry} owns an independent |
| 95 | + * {@code ReentrantLock}. This decouples I/O operations, allowing a reader to load |
| 96 | + * "Block A" from disk while the evictor writes "Block B" to disk simultaneously, |
| 97 | + * maximizing throughput.</li> |
| 98 | + * |
| 99 | + * <li><b>Condition Queues:</b> To handle read-write races, the system uses atomic |
| 100 | + * state transitions. If a reader attempts to access a block in the {@code EVICTING} |
| 101 | + * state, it waits on the entry's {@code Condition} variable until the writer |
| 102 | + * signals that the block is safely {@code COLD} (persisted).</li> |
| 103 | + * </ul> |
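Putting the three bullets together, a sketch of the evictor-side handshake, again reusing the hypothetical `BlockEntry`: the structure lock is needed only to pick a victim, the disk write runs with no locks held, and waiting readers are released once the block is safely COLD:

```java
// Illustrative evictor protocol: HOT -> EVICTING -> COLD with signaling.
final class Evictor {
    static void evict(BlockEntry entry, Runnable diskWrite) {
        // Phase 1: claim the block (fast, under the per-block lock only).
        entry.lock.lock();
        try {
            if (entry.state != BlockState.HOT)
                return; // lost the race: already being evicted, or cold
            entry.state = BlockState.EVICTING;
        } finally {
            entry.lock.unlock();
        }

        // Phase 2: perform the write with no locks held, so reads and
        // evictions of OTHER blocks proceed concurrently.
        diskWrite.run(); // e.g., append the payload via the partition writer

        // Phase 3: publish the COLD state and wake any waiting readers.
        entry.lock.lock();
        try {
            entry.value = null;             // free the heap reference
            entry.state = BlockState.COLD;  // metadata stays in the cache map
            entry.stateChanged.signalAll();
        } finally {
            entry.lock.unlock();
        }
    }
}
```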
83 | 104 | */ |
| 105 | + |
84 | 106 | public class OOCEvictionManager { |
85 | 107 |
|
86 | 108 | // Configuration: OOC buffer limit as percentage of heap |
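The field itself is cut off in this excerpt; a typical derivation of such a limit looks like the following hedged sketch, where the method name and the 15% fraction are assumed placeholders:

```java
// Hypothetical computation of the OOC buffer limit from the max heap size.
final class BufferLimit {
    static long bytes(double fraction) { // fraction of heap, e.g. 0.15 (assumed)
        return (long) (Runtime.getRuntime().maxMemory() * fraction);
    }
}
```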
|