OAK-11932: Segment prefetching for CachingSegmentReader#2513
Conversation
```java
    return out;
}

private void schedulePrefetch(long msb, long lsb, Buffer buffer) {
```
@nfsantos made a good remark that this method should, from the start, be executed in a separate thread context, so that it does not add to the execution time of the thread invoking CachingSegmentArchiveReader#readSegment.
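Moving the dispatch off the caller thread could look roughly like the sketch below. All names here (`PrefetchDispatch`, the pool size, the recording list standing in for the actual download) are illustrative assumptions, not the PR's actual code; the point is only that `schedulePrefetch` submits to a small daemon pool and returns immediately.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PrefetchDispatch {
    // Small daemon pool so prefetch work never blocks JVM shutdown
    // and never runs on the reader's thread.
    private final ExecutorService prefetchExecutor =
            Executors.newFixedThreadPool(2, r -> {
                Thread t = new Thread(r, "segment-prefetch");
                t.setDaemon(true);
                return t;
            });

    // Stand-in for "segment was downloaded"; a real impl would fill a cache.
    final CopyOnWriteArrayList<String> fetched = new CopyOnWriteArrayList<>();

    void schedulePrefetch(long msb, long lsb) {
        // Submit and return immediately; readSegment never waits on this.
        prefetchExecutor.submit(() -> fetched.add(msb + ":" + lsb));
    }

    void shutdownAndWait() throws InterruptedException {
        prefetchExecutor.shutdown();
        prefetchExecutor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```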
Indeed, my tests with #2519, which implements a similar mechanism at a different level of abstraction, showed that dispatching from the caller thread hurts performance.
I even opted to trigger preloading only from within the load-callback, so preloading is completely off the critical path for cache hits.
Of course that's a trade-off: it requires the caller thread to load one segment before any preloading happens.
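The load-callback approach described above might be sketched as follows. This is a hypothetical illustration, not code from #2519: a plain `ConcurrentHashMap` stands in for the real segment cache, and `loadSegment`/`preloadReferences` are made-up names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LoadCallbackPrefetch {
    private final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();
    int preloadCalls = 0;

    byte[] read(String segmentId) {
        // Preloading is triggered only inside the load callback, so cache
        // hits pay zero preload overhead; the trade-off is that the caller
        // must load one segment before any preloading happens.
        return cache.computeIfAbsent(segmentId, id -> {
            byte[] data = loadSegment(id);
            preloadReferences(data);
            return data;
        });
    }

    // Stand-in for the actual segment download.
    byte[] loadSegment(String id) { return new byte[] {1, 2, 3}; }

    // Stand-in for scheduling prefetch of referenced segments.
    void preloadReferences(byte[] data) { preloadCalls++; }
}
```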
jsedding left a comment:
Looks good! I think some benchmarking is needed to ensure the added overhead is worthwhile.
```java
}

// Drop prefetch if already in progress for this segment
boolean registered = inFlightPrefetch.add(ref);
```
Nice idea! I missed this trick so far in #2519 🙂
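For context, the trick relies on `Set.add` returning `false` when the element is already present. A minimal self-contained sketch (class and method names are illustrative, not the PR's):

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class InFlightDedup {
    // Concurrent set of segments currently being prefetched.
    private final Set<UUID> inFlightPrefetch = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller; concurrent duplicates get false.
    boolean tryRegister(UUID ref) {
        return inFlightPrefetch.add(ref);
    }

    // Called when the download finishes (or fails), re-enabling prefetch
    // in case the segment is later evicted from the cache.
    void complete(UUID ref) {
        inFlightPrefetch.remove(ref);
    }

    void prefetch(UUID ref) {
        if (!tryRegister(ref)) {
            return; // drop: a prefetch for this segment is already in flight
        }
        try {
            // download segment ...
        } finally {
            complete(ref);
        }
    }
}
```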
```java
List<UUID> refs = extractReferences(buffer);
int limit = Math.min(refs.size(), prefetchMaxRefs);
for (int i = 0; i < limit; i++) {
```
You are getting a list with all the references but then potentially iterating over only a subset of them. You could save some work by extracting only the references that will be prefetched; this would also avoid allocating a list with all the refs. Using streams with `limit()` may be an easy way of implementing this optimization, as streams are evaluated lazily.
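The suggested optimization might look like this. `referenceStream` is a hypothetical stand-in for a lazy source of references extracted from the buffer; the key point is that `limit()` short-circuits, so elements beyond `prefetchMaxRefs` are never pulled from the stream.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LimitedRefs {
    // limit() is a short-circuiting intermediate operation: only the first
    // prefetchMaxRefs elements are pulled from the lazily evaluated stream,
    // so no full list of all references is built up front.
    static List<String> firstRefs(Stream<String> referenceStream, int prefetchMaxRefs) {
        return referenceStream
                .limit(prefetchMaxRefs)
                .collect(Collectors.toList());
    }
}
```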
```java
if (persistentCache.containsSegment(rMsb, rLsb)) {
    continue;
}
```
Maybe it's better to do this check in the worker thread, just before the segment is downloaded? The task scheduled to download the segment might not execute for a while if the worker pool is busy, so between this point and the actual download the segment might have been added to the cache. Or we can leave the check here and do another one before trying to download.
It would also be better to have a mechanism similar to the Guava loading cache, where a thread requesting a segment that is not in the cache but is being downloaded blocks waiting for the first download to complete, instead of starting a new download. This would avoid duplicate downloads.
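One common way to get that "join the in-flight download" behavior without pulling in Guava is a future-per-segment map. This is a generic sketch of the technique, not the PR's code; all names are hypothetical, and a real implementation would also need to evict completed entries and handle failed downloads.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class SingleFlightDownloader {
    // One future per segment id: the first requester starts the download,
    // later requesters join the same future instead of downloading again.
    private final ConcurrentMap<String, CompletableFuture<byte[]>> inFlight =
            new ConcurrentHashMap<>();
    volatile int downloads = 0;

    byte[] get(String id, Function<String, byte[]> download) {
        return inFlight.computeIfAbsent(id, key ->
                CompletableFuture.supplyAsync(() -> {
                    downloads++; // counts actual downloads, for illustration
                    return download.apply(key);
                })
        ).join(); // everyone blocks on the same first download
    }
}
```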
As indicated in #2569, this approach is limited in that a
https://issues.apache.org/jira/browse/OAK-11932