This repository was archived by the owner on Jul 13, 2018. It is now read-only.

Implement DfsBlockCache with Caffeine Cache #2

Open

jiahuijiang wants to merge 5 commits into jj/new-cache from jj/caffeine-implementation

Conversation

jiahuijiang (Owner) commented Feb 17, 2017

DO NOT MERGE

Note: This should live in the project that wants to pass this implementation in; it's here for now to make it easier to review.

  • Used the Caffeine version as the default and passed unit tests

jhoch-palantir left a comment
Can you walk me through the code tomorrow? It's not super easy to follow. I also want to brainstorm a bit about size tracking.


packFileCache = Caffeine.newBuilder()
        .removalListener((DfsPackDescription description, DfsPackFile packFile, RemovalCause cause) ->
                packFile.close())


this line seems off to me


Technically the key and value are @Nullable. We're not using soft keys/values, but it would be better to be null-safe. We may also want to consider logging removals with the cause and pack file.
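For illustration, a null-safe listener with removal logging might look like this (a sketch based on the snippet above; the `log` field and the surrounding assignment are assumptions):

```java
packFileCache = Caffeine.newBuilder()
        .removalListener((DfsPackDescription description, DfsPackFile packFile, RemovalCause cause) -> {
            // Both key and value are @Nullable, so guard before dereferencing.
            if (packFile != null) {
                log.debug("Pack file for {} removed, cause: {}", description, cause);
                packFile.close();
            }
        })
        .build();
```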

jiahuijiang (Owner, Author)

Actually, packFile.close is not needed; I'll remove it.

dfsBlockCache.invalidateAll();
}

private static final class DfsPackKeyWithPosition {


Your key doesn't have an equals or hashCode, which could cause problems if you try to look up a value with different (but equivalent) key instances.

jiahuijiang (Owner, Author)

Hmmm, I think two objects of this class will never be equal to each other, since the pack key has an AtomicLong field.
@ben-manes What do you mean by a similar key instance?

ben-manes commented Feb 17, 2017

A hash map uses the key's equals and hashCode to locate and store an entry. If two equivalent but not equal keys are used, the map will treat them as pointing to distinct entries. A cache is built on a hash map.

DfsPackKeyWithPosition key1 = new DfsPackKeyWithPosition(packKey, 100);
DfsPackKeyWithPosition key2 = new DfsPackKeyWithPosition(packKey, 100);
assert key1.equals(key2);
assert key1.hashCode() == key2.hashCode();
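A minimal sketch of the missing equals/hashCode is below. Since DfsPackKey itself has no equals/hashCode, identity equality on the key is assumed here, and a plain Object stands in for DfsPackKey:

```java
import java.util.Objects;

// Sketch: value-based equality over (dfsPackKey, position).
// DfsPackKey is stood in for by Object, relying on identity equality.
final class DfsPackKeyWithPosition {
    private final Object dfsPackKey;
    private final long position;

    DfsPackKeyWithPosition(Object dfsPackKey, long position) {
        this.dfsPackKey = dfsPackKey;
        this.position = position;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof DfsPackKeyWithPosition)) {
            return false;
        }
        DfsPackKeyWithPosition other = (DfsPackKeyWithPosition) o;
        return position == other.position && dfsPackKey.equals(other.dfsPackKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(dfsPackKey, position);
    }
}
```

With this in place, two instances built from the same pack key and position satisfy the assertions in the comment above.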

jiahuijiang (Owner, Author)

Ahh, thanks for the clarification!


FYI, DfsPackKey doesn't implement equals/hashCode either.

jiahuijiang (Owner, Author) commented Feb 17, 2017

Yeah, I think that's because of the AtomicLong it contains. So here it has to be the same DfsPackKey object.


void cleanUp() {
packFileCache.invalidateAll();
dfsBlockCache.invalidateAll();


We might want to invoke cleanUp on each of these caches as well, to free up resources immediately rather than on later cache accesses. See https://github.com/ben-manes/caffeine/wiki/Cleanup for context.
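Under that suggestion, the method might become something like this (a sketch against the fields shown in this diff; as noted below, invalidateAll already performs a cleanup, so the extra calls may be redundant):

```java
void cleanUp() {
    packFileCache.invalidateAll();
    dfsBlockCache.invalidateAll();
    // Run any pending maintenance work now, rather than deferring it
    // to piggyback on later cache accesses.
    packFileCache.cleanUp();
    dfsBlockCache.cleanUp();
}
```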

jiahuijiang (Owner, Author)

Ohh, good to know! Updated.


invalidateAll does a clean-up, since there's nothing remaining in the cache.

packFile.close())
.maximumSize(cacheEntrySize)
.expireAfterAccess(cacheConfig.getPackFileExpireSeconds(), TimeUnit.SECONDS)
.recordStats()


If we're recording stats, we probably want to expose the recorded stats via an accessor method.
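One way to do that (a sketch; the method name is hypothetical):

```java
/** Snapshot of hit/miss/eviction statistics for the pack file cache. */
CacheStats getPackFileCacheStats() {
    // Caffeine's Cache.stats() returns an immutable snapshot,
    // available because recordStats() was set on the builder.
    return packFileCache.stats();
}
```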

jiahuijiang (Owner, Author)

I think we want to use Tritium to track it, but it's not open sourced yet. I'll add a TODO to add it when we move this internally.

dfsBlockCache = Caffeine.newBuilder()
.maximumSize(cacheEntrySize)
.expireAfterAccess(cacheConfig.getPackFileExpireSeconds(), TimeUnit.SECONDS)
.recordStats()


if we're recording stats, probably want to expose the recorded stats via an accessor method



* <p>
* The value for blockSize must be a power of 2.
*/
private final int blockSize;


We probably want to either check that blockSize is a power of 2 or bump it up to the next largest power of 2.
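The bump-up option is a one-liner; it could be sketched like this (a hypothetical helper, not from the PR):

```java
// Hypothetical helper: round a positive block size up to the next power of two.
class BlockSizes {
    static int ceilPowerOfTwo(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        int highest = Integer.highestOneBit(blockSize);
        // Already a power of two? Keep it; otherwise double the highest set bit.
        return highest == blockSize ? blockSize : highest << 1;
    }
}
```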

jiahuijiang (Owner, Author)

Added a check in the config. I'll switch to Immutables + checkArgument when we move the code >_<

private final long maxStreamThroughCache;

/**
* Suggested block size to read from pack files in.


block size in bytes?

}

DfsPackFile newPackFile = new DfsPackFile(this, description, key != null ? key : new DfsPackKey());
packFileCache.put(description, newPackFile);


does it matter if multiple threads concurrently load the same description -> pack file?

jiahuijiang (Owner, Author)

Yeah... they may get different results if the packKey is different :/ Let me fix it.


I don't think this is fixed yet. Consider this sequence of line executions:

Thread A: 76
Thread B: 76
Thread A: 77, 81, 82, 83, 76, 77, 78
Thread B: 77, 81, 82, 83, 76, 77, 78

Can we use get(key, mappingFunction) instead? https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/Cache.java#L82
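With Cache.get(key, mappingFunction), the check-then-put race goes away, because Caffeine computes the mapping function at most once per absent key. A sketch against the method shown in this diff:

```java
DfsPackFile getOrCreate(DfsPackDescription description, DfsPackKey key) {
    // The mapping function runs atomically for an absent key, so
    // concurrent callers observe the same DfsPackFile instance.
    return packFileCache.get(description, desc ->
            new DfsPackFile(this, desc, key != null ? key : new DfsPackKey()));
}
```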

@jiahuijiang jiahuijiang force-pushed the jj/caffeine-implementation branch from 8b1132e to 2f2ed77 Compare February 21, 2017 02:40
// weight is static after creation and update, so here we are relying on dfsBlockCache's removal
// listener to make sure the retained size of packFile won't exceed the given memory
long estimatedSize = 2048 + blockSize;
return estimatedSize > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) estimatedSize;


If we need to return an int here, let's require that blockSize be <= Integer.MAX_VALUE / 2?

.build();

dfsBlockCache = Caffeine.newBuilder()
.removalListener((DfsPackKeyWithPosition keyWithPosition, Ref ref, RemovalCause cause) -> ref = null)


What references are we trying to free here? I don't think ref = null does anything if the entry is already being removed from the cache, right?

.removalListener((DfsPackDescription description, DfsPackFile packFile, RemovalCause cause) -> {
if (packFile != null) {
log.debug("PackFile {} is removed because it {}", packFile.getPackName(), cause);
packFile.key.cachedSize.set(0);


Is cachedSize used after the packFile is removed? It feels weird that we're setting it here but not using it in the weigher.

jiahuijiang (Owner, Author)

Updated with the one-cache-plus-two-maps approach.

dfsBlockCache = Caffeine.newBuilder()
.removalListener((DfsPackKeyWithPosition keyWithPosition, Ref ref, RemovalCause cause) -> ref = null)
.maximumWeight(cacheConfig.getCacheMaximumSize() / 2)
.weigher((DfsPackKeyWithPosition keyWithPosition, Ref ref) -> ref == null? 48 : 48 + ref.getSize())


With this and the line above, 48 and 2048 look like magic constants. Can we make this clearer?

jiahuijiang (Owner, Author)

Updated with comments.
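For reference, naming the estimated overheads might look like this (a sketch; the constant name and byte estimate are assumptions):

```java
/** Rough JVM overhead of a cached Ref entry (object headers, key, references). */
private static final int REF_ENTRY_OVERHEAD_BYTES = 48;

dfsBlockCache = Caffeine.newBuilder()
        .maximumWeight(cacheConfig.getCacheMaximumSize() / 2)
        .weigher((DfsPackKeyWithPosition keyWithPosition, Ref ref) ->
                ref == null ? REF_ENTRY_OVERHEAD_BYTES : REF_ENTRY_OVERHEAD_BYTES + ref.getSize())
        .build();
```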


dfsBlockCache = Caffeine.newBuilder()
.removalListener((DfsPackKeyWithPosition keyWithPosition, Ref ref, RemovalCause cause) -> ref = null)
.maximumWeight(cacheConfig.getCacheMaximumSize() / 2)


Are the two caches going to be the same size? A 50/50 split seems weird.


private static final class DfsPackKeyWithPosition {
private DfsPackKey dfsPackKey;
private long position;


these should be final

(especially if used in hashCode/equals)

Ref<DfsBlock> loadedBlockRef = dfsBlockCache.get(new DfsPackKeyWithPosition(key, position), keyWithPosition -> {
try {
DfsBlock loadedBlock = pack.readOneBlock(keyWithPosition.getPosition(), dfsReader);
key.cachedSize.getAndAdd(loadedBlock.size());


Does the weigher get called here? If not is there any way to trigger it?

@@ -0,0 +1,206 @@
package org.eclipse.jgit.internal.storage.dfs;

import com.github.benmanes.caffeine.cache.*;


Can we import explicitly?
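Replacing the wildcard might look like this (a sketch; the exact set of Caffeine types the class uses is an assumption):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;
```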


jiahuijiang (Owner, Author) commented:
Discussed with @jhoch-palantir offline.
For better size estimation, we are only using a Caffeine cache for the key+position -> Ref mapping. The weight is updated every time a ref is inserted, removed, or updated.
When a ref is removed, we use a reverse index to find the packFile that the ref belongs to. If the ref points to an index, the whole packFile is removed. If it points to a file block, cachedSize is decreased. When cachedSize reaches zero, the packFile is removed from our index maps.

@jiahuijiang jiahuijiang force-pushed the jj/caffeine-implementation branch 4 times, most recently from 8d0ef5a to e642e73 Compare February 22, 2017 19:53
blockSize = cacheConfig.getBlockSize();

packFileCache = new ConcurrentHashMap<>(16, 0.75f, 1);
reversePackDescriptionIndex = new ConcurrentHashMap<>(16, 0.75f, 1);


Think we can use the default constructor

// key, value reference 8 * 2 bytes
.weigher((DfsPackKeyWithPosition keyWithPosition, Ref ref) -> ref == null? 60 : 60 + ref.getSize())
.recordStats()
.build();


Let's move the 60 into a constant and move the documentation there.

I really like the formatting on this comment...


I think we should make some effort to account for, or at least document, the memory taken up by the other two maps, e.g. "this cache will take roughly <= cacheConfig.getCacheMaximumSize() + X MB".


(even if X is in terms of the number of pack files, that's valuable. Then later we can reason about things in terms of pack files, not bytes)

DfsPackKey key = keyWithPosition.getDfsPackKey();
long position = keyWithPosition.getPosition();

if (position < 0) {


how can this be less than 0?

jiahuijiang (Owner, Author)

That's for indices.

private final Map<DfsPackDescription, DfsPackFile> packFileCache;

/** Reverse index from DfsPackKey to the DfsPackDescription. */
private final Map<DfsPackKey, DfsPackDescription> reversePackDescriptionIndex;


this is 1-1? do we need any invariant checks for this?

jiahuijiang (Owner, Author)

Yep this is 1-1

private final int blockSize;

/** Cache of pack files, indexed by description. */
private final Map<DfsPackDescription, DfsPackFile> packFileCache;


29,30s/Cache/Map

return blockSize;
}

// do something when the block is invalid


is this outstanding?

jiahuijiang (Owner, Author)

Out of date; deleting.

key.cachedSize.set(0);
}
// TODO: release all the blocks cached for this pack file too
// right now those refs are not accessible anymore and will be evicted by caffeine cache eventually


Did you mean to implement this?

jiahuijiang (Owner, Author)

I don't see this causing a big problem... but it would be nice to have as an improvement soon.

return length <= maxStreamThroughCache;
}

DfsPackFile getOrCreate(DfsPackDescription description, DfsPackKey key) {


I feel like there's an edge case where I could just call getOrCreate(...) a bunch of times and never actually load anything into the cache, and the two maps would never get cleared out

jiahuijiang (Owner, Author)

These entries should be tiny (<1 KB per entry), and if we clear the whole cache object periodically it shouldn't be a problem. But we should still add that as a TODO at least...

Ref<DfsBlock> loadedBlockRef = dfsBlockAndIndicesCache.get(new DfsPackKeyWithPosition(key, position), keyWithPosition -> {
try {
DfsBlock loadedBlock = pack.readOneBlock(keyWithPosition.getPosition(), dfsReader);
key.cachedSize.getAndAdd(loadedBlock.size());


do we need to "update" the cache here because the weight has changed?

jiahuijiang (Owner, Author)

The cache entry won't get "reloaded", I believe. Here cachedSize is used to keep track of whether all the loaded blocks have been evicted.


Let's make sure this use of cachedSize is documented. Laying out the caching/memory strategy in a top-level class comment would be sensible. You and I have chatted offline about this a bunch, and it would be good to make sure that's not lost :D

if (keyWithPosition.position >= 0) {
keyWithPosition.getDfsPackKey().cachedSize.getAndAdd(size);
}
return new Ref(keyWithPosition.getDfsPackKey(), keyWithPosition.getPosition(), size, value);


Do we have guarantees that this method is only called if this wasn't present in the map? I'm worried about something getting put in the map twice and cachedSize getting incremented twice (same question with getOrLoad above)

jiahuijiang (Owner, Author)

Yes, this is guaranteed.
(And even if it weren't, when the old value gets removed it's treated as being evicted, and cachedSize is decreased in the removalListener.)

if (pack != null) {
DfsPackKey key = pack.key;
cleanUpIndicesIfExists(key);
key.cachedSize.set(0);


Switch 169 and 170?
