The read operation follows this flow:
1. **Check chunked mode**: If `chunk_size_bytes` is `None`, fall back to whole-object caching (the current implementation).
2. **Prefetch with aligned range**:
```rust
async fn maybe_prefetch_range(
    &self,
    // ... (remaining parameters and body elided)
```
**Why alignment matters**: When an object is not yet cached, aligning the range lets us fetch complete chunks in a single request. For example, if the user requests bytes 100MB-150MB with 64MB chunks, we fetch 64MB-192MB in one request and save chunks 1 and 2. Future reads to any part of chunks 1 or 2 will hit the cache.
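The alignment arithmetic at the core of this step is plain integer rounding. A minimal sketch as a standalone function (illustrative, not the actual implementation):

```rust
/// Aligns a byte range outward to chunk boundaries.
/// With 64 MiB chunks, 100 MiB-150 MiB becomes 64 MiB-192 MiB.
fn align_range_to_chunks(start: u64, end: u64, chunk_size: u64) -> (u64, u64) {
    // Round the start down to the beginning of its chunk...
    let aligned_start = (start / chunk_size) * chunk_size;
    // ...and the end up to the next chunk boundary. The real read path
    // would additionally clamp this to the object size.
    let aligned_end = end.div_ceil(chunk_size) * chunk_size;
    (aligned_start, aligned_end)
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    let (start, end) = align_range_to_chunks(100 * MIB, 150 * MIB, 64 * MIB);
    assert_eq!((start, end), (64 * MIB, 192 * MIB));
}
```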
**Version handling**: The version (etag) is obtained from the read response and included in all cache keys. This ensures that when an object is updated (etag changes), old cached chunks won't be used.
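A sketch of what version-aware keys can look like; both key formats below are illustrative assumptions rather than the final scheme:

```rust
/// Illustrative key scheme: the object's etag is embedded in every chunk
/// key, so entries from a stale version can never be served.
fn meta_key(path: &str) -> String {
    format!("{path}#meta")
}

fn chunk_key(path: &str, etag: &str, chunk_idx: u64) -> String {
    // After an update the etag changes, every chunk key changes with it,
    // and the old entries simply age out of the cache via LRU.
    format!("{path}#{etag}#chunk-{chunk_idx}")
}
```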
3. **Split range into chunks**:
```rust
fn split_range_into_chunks(
    // ... (parameters and body elided; see the sketch below)
```
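A sketch of what the splitting can look like; the `ChunkSlice` type and the exact signature are assumptions for illustration:

```rust
/// One chunk's contribution to a requested byte range.
struct ChunkSlice {
    chunk_idx: u64,
    /// Byte range *within* the chunk that the caller actually needs.
    offset_in_chunk: std::ops::Range<u64>,
}

fn split_range_into_chunks(start: u64, end: u64, chunk_size: u64) -> Vec<ChunkSlice> {
    let mut slices = Vec::new();
    let mut pos = start;
    while pos < end {
        let chunk_idx = pos / chunk_size;
        let chunk_start = chunk_idx * chunk_size;
        // The slice ends at the chunk boundary or at the requested end,
        // whichever comes first.
        let slice_end = end.min(chunk_start + chunk_size);
        slices.push(ChunkSlice {
            chunk_idx,
            offset_in_chunk: (pos - chunk_start)..(slice_end - chunk_start),
        });
        pos = slice_end;
    }
    slices
}
```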
### Key Design Decisions

1. **Range alignment strategy**
When metadata is not yet cached, the implementation aligns the requested range to chunk boundaries before fetching from the underlying storage. For example, if a user requests bytes 100MB-150MB with 64MB chunks configured, the system fetches the aligned range of 64MB-192MB.
While this fetches more data initially, it significantly reduces the number of requests to the underlying storage by consolidating multiple chunk fetches into a single aligned request. This trade-off proves beneficial as it populates the cache more efficiently and reduces overall latency.
The alignment is only applied on the first fetch (cache miss). Subsequent reads can directly use the cached chunks without additional alignment overhead.
2. **Streaming instead of buffering**

The implementation returns data as a stream in which each chunk is fetched lazily as it is consumed, matching OpenDAL's streaming API design.
This approach is critical for memory efficiency when reading large ranges that span many chunks. Without streaming, reading a multi-gigabyte range would require loading all chunks into memory simultaneously, potentially exhausting available memory and causing performance degradation.
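A sketch of the lazy-stream shape using the `futures` crate; the `fetch_chunk` helper is a hypothetical stand-in for the cache-or-storage lookup:

```rust
use futures::stream::{self, Stream, StreamExt};

// Hypothetical: resolve one chunk, from cache or the underlying storage.
async fn fetch_chunk(_chunk_idx: u64) -> Vec<u8> {
    // ... cache lookup, falling back to an aligned fetch on a miss ...
    vec![0u8; 64 * 1024 * 1024]
}

/// Streams the requested chunks. Each chunk is fetched only when the
/// consumer polls for it, so one chunk at a time is resident in memory
/// instead of the whole range.
fn read_chunks(first: u64, last: u64) -> impl Stream<Item = Vec<u8>> {
    stream::iter(first..=last).then(fetch_chunk)
}
```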
3. **Best-effort cache operations**
All cache operations (insert, remove, get) are designed to never fail the user's read or write operation.
If a cache operation encounters an error, the implementation logs a warning and continues by falling back to the underlying storage. This ensures that the cache layer remains truly transparent to users.
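A sketch of the pattern; the `ChunkCache` trait is a hypothetical stand-in for the real cache handle, and the `log` crate supplies the warning macro:

```rust
/// Hypothetical minimal view of the cache operations the layer performs.
trait ChunkCache {
    async fn remove(&self, key: &str) -> Result<(), Box<dyn std::error::Error>>;
}

/// Best-effort metadata invalidation on write or delete: a cache failure
/// is logged and swallowed, so the user's operation still succeeds.
async fn invalidate_meta(cache: &impl ChunkCache, path: &str) {
    let meta_key = format!("{path}#meta");
    if let Err(err) = cache.remove(&meta_key).await {
        log::warn!("cache invalidation failed for {meta_key}: {err}");
    }
}
```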
### Edge Cases and Considerations
**Last chunk handling**
The last chunk of an object may be smaller than the configured chunk size and requires special attention. The implementation calculates the actual chunk size using the formula `min((chunk_idx + 1) * chunk_size, object_size) - chunk_idx * chunk_size`.
For example, a 200 MB file with 64 MB chunks is split into chunks 0, 1, and 2 of 64 MB each, followed by chunk 3 containing only 8 MB.
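A direct transcription of the formula, with the example above as a check (the function name is illustrative):

```rust
/// Size of chunk `chunk_idx` for an object of `object_size` bytes.
fn chunk_len(chunk_idx: u64, chunk_size: u64, object_size: u64) -> u64 {
    object_size.min((chunk_idx + 1) * chunk_size) - chunk_idx * chunk_size
}

fn main() {
    const MB: u64 = 1024 * 1024;
    // A 200 MB object with 64 MB chunks: chunks 0-2 are full, chunk 3 holds 8 MB.
    assert_eq!(chunk_len(0, 64 * MB, 200 * MB), 64 * MB);
    assert_eq!(chunk_len(3, 64 * MB, 200 * MB), 8 * MB);
}
```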
**Empty or invalid range requests**
Range requests are handled according to OpenDAL's existing semantics, as sketched after this list:
- Empty range: Returns empty result without performing any cache operations
- Range start beyond object size: Returns error to match OpenDAL's behavior
- Range end exceeds object size: Clamped to the actual object size, allowing partial reads near the end of objects
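A sketch of that normalization; the standalone function and the error type are illustrative, not OpenDAL's actual API:

```rust
/// Hypothetical normalization of a requested range against the object size.
/// Returns `None` for an empty range, an error for an out-of-bounds start,
/// and a clamped range otherwise.
fn normalize_range(start: u64, end: u64, object_size: u64) -> Result<Option<(u64, u64)>, String> {
    // Empty range: nothing to read and no cache operations performed.
    if start >= end {
        return Ok(None);
    }
    // Start past the end of the object: error, matching OpenDAL's behavior.
    if start >= object_size {
        return Err(format!("range start {start} beyond object size {object_size}"));
    }
    // Clamp the end so partial reads near the tail of the object succeed.
    Ok(Some((start, end.min(object_size))))
}
```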
**Concurrent access**
Concurrent access patterns benefit from Foyer's built-in request deduplication mechanism. When multiple concurrent reads request the same chunk, Foyer ensures that only one fetch actually occurs from the underlying storage, while other readers wait and reuse the result.
This deduplication happens transparently within the Foyer cache layer, requiring no additional locking or coordination logic in FoyerLayer itself.
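Foyer implements this internally; conceptually the mechanism resembles the following sketch (illustrative only, not Foyer's actual API), in which concurrent readers of the same key share one in-flight future:

```rust
use std::collections::HashMap;
use std::future::Future;
use std::sync::{Arc, Mutex};

use futures::future::{BoxFuture, FutureExt, Shared};

/// Conceptual request deduplication: the first caller for a key installs
/// the fetch future; later callers clone and await the same future.
#[derive(Clone, Default)]
struct InflightFetches {
    inner: Arc<Mutex<HashMap<String, Shared<BoxFuture<'static, Arc<Vec<u8>>>>>>>,
}

impl InflightFetches {
    fn fetch<F>(&self, key: String, load: F) -> Shared<BoxFuture<'static, Arc<Vec<u8>>>>
    where
        F: Future<Output = Arc<Vec<u8>>> + Send + 'static,
    {
        let mut map = self.inner.lock().unwrap();
        // A real implementation would also evict the entry once the
        // fetch completes and the result has been cached.
        map.entry(key).or_insert_with(|| load.boxed().shared()).clone()
    }
}
```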
**Cache consistency**
The cache follows an eventual consistency model aligned with OpenDAL's consistency guarantees. There is no distributed coordination for concurrent writes from different processes, and cache invalidation on write or delete operations is performed on a best-effort basis.
This relaxed consistency model is acceptable for typical object storage workloads, which are predominantly read-heavy and often involve immutable objects.
### Testing Strategy
**Unit tests**:
- `split_range_into_chunks` with various ranges and object sizes
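A sketch of one such test, written against the illustrative `split_range_into_chunks` shown earlier:

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn splits_unaligned_range_across_chunks() {
        const MB: u64 = 1024 * 1024;
        // 100 MB-150 MB with 64 MB chunks touches chunks 1 and 2.
        let slices = split_range_into_chunks(100 * MB, 150 * MB, 64 * MB);
        assert_eq!(slices.len(), 2);
        assert_eq!(slices[0].chunk_idx, 1);
        assert_eq!(slices[0].offset_in_chunk, (36 * MB)..(64 * MB));
        assert_eq!(slices[1].chunk_idx, 2);
        assert_eq!(slices[1].offset_in_chunk, 0..(22 * MB));
    }
}
```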
### Compatibility and Migration

**Backward compatibility**
The chunked cache feature is fully backward compatible with existing FoyerLayer usage. The implementation defaults to `chunk_size_bytes = None`, which activates whole-object mode matching the current behavior. This means existing users are completely unaffected by the introduction of chunked caching.
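A sketch of the configuration surface this implies; the struct and field names are assumptions, not the actual FoyerLayer API:

```rust
/// Hypothetical configuration for the layer.
#[derive(Default)]
struct FoyerLayerConfig {
    /// `None` (the default) keeps today's whole-object caching;
    /// `Some(n)` opts in to chunked caching with n-byte chunks.
    chunk_size_bytes: Option<usize>,
}

fn main() {
    // Existing users: nothing changes.
    let default_cfg = FoyerLayerConfig::default();
    assert!(default_cfg.chunk_size_bytes.is_none());

    // Opting in to 64 MiB chunks.
    let chunked_cfg = FoyerLayerConfig {
        chunk_size_bytes: Some(64 * 1024 * 1024),
    };
    assert!(chunked_cfg.chunk_size_bytes.is_some());
}
```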
**Opt-in design**
Chunked cache is an opt-in feature that users must explicitly enable through configuration by setting the chunk size. This conservative approach ensures that users who haven't evaluated whether chunked caching benefits their workload will continue to use the proven whole-object caching strategy.
**Cache key migration**
The cache key format changes between whole-object and chunked modes, but this requires no special migration handling. Since whole-object cache uses different keys than chunked cache, and different chunk sizes use different keys from each other, old cache entries simply coexist harmlessly with new ones.
As the LRU eviction policy runs, old entries naturally expire and are replaced with new entries in the current format. This natural invalidation is acceptable because the cache is ephemeral by design, storing temporary performance-optimization data rather than durable state.