-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Chore: Extracted interface Cache from 'LRUCache' #130964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…nder the old name to allow other implementations to be tested. - Found and fixed minor bug where we tested max weight against count() instead of weight - Found and fixed minor issue with possible long overflow. - Adjusted tests but made explicit the ones who rely on the `LRUCache` implementation to test if settings are applied correctly. While CacheBuilder#build() returns a `Cache` it still always returns the `LRUCache` implementation
Updated Javadoc to clarify method behaviors, guarantees, and expectations for implementers of the Cache interface. Introduced default `refresh` method and improved descriptions for traversal methods, weights, and removal notifications for consistency.
…f preventing, very unlikely, overflow
- Refactored CacheTests to LRUCacheTests - Added new CacheTests that test for Cache contract - Added toString() to RemovalNotification
|
Not sure about the context of this change. The code is owned by @elastic/es-core-infra, so I took myself off the reviewers. |
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
…vant classes and tests as it's pulled out into elastic#131784
…vant classes and tests as it's pulled out into elastic#131784
# Conflicts: # server/src/main/java/org/elasticsearch/common/cache/Cache.java
rjernst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The javadoc for Cache notes that the interface is still tightly coupled with LRUCache. I wonder if it would be cleaner to do a simple rename of Cache to LRUCache, then (in a separate change) work on defining a (decoupled) interface, and migrating uses of LRUCache to it as necessary.
Good point, I think I changed my stance on that but in that case I should update the docs, let's park that for a moment; could still be a good idea to slim down the PR. However if we split the refactor with the interface extraction it will mean that all consumers (all places where the class is used) need to be updated. That would mean the PR would touch many more files. Keeping the action together actually reduces the size, in my opinion. So I think I was able to remove the problematic methods as they weren't used in the code base, or only from the class testing the LRUCache. I was also able to finish my alternative implementation and plug it in, so I guess by that simple fact that statement isn't true as far as the Interface goes. The PR looks bigger as it doesn't see the rename separately, but LRUCache is unchanged apart from the class rename. The CacheBuilder always returns a LRUCache as before, I adjusted the tests to reflect this. A followup PR introducing a new implementation should address this or even two separate PR's I would suggest removing that remark if it's ok. Let me check and update the javadoc for the whole interface again. |
removing statement about coupling that is no longer accurate.
|
@rjernst All green again, Let me know if the above is alright |
| * An LRU sequencing of the values in the cache. This sequence is not protected from mutations | ||
| * to the cache (except for {@link Iterator#remove()}. The result of iteration under any other mutation is | ||
| * undefined. | ||
| * An Iterable that allows to transverse all keys in the cache. Modifications might be visible and no guarantee is made on ordering of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keys -> values
| public long weight() { | ||
| return weight; | ||
| } | ||
| long weight(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems implementation specific? The meaning of weight seems opaque, and isn't necessarily representable by a long value. Why does it even need to be retrieved given there is no setter on the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like with the forEach method, the method exists. Getting rid of it would require changing the call side and expanding the PR.
In it's current use it's implied that this is bytes. sometimes heap, sometimes of heap, sometimes it's only the payload, sometimes it covers the whole entry. Because the weigher is external, the unit is invariant: We ask the weigher for a number for each entry and check if it's more than a max. Could do a lot here but not without changing the call sites and increasing the PR size significantly.
| * An Iterable that allows to transverse all keys in the cache. Modifications might be visible and no guarantee is made on ordering of | ||
| * these modifications as new entries might end up in the already transversed part of the cache. So while transversing; if A and B are | ||
| * added to the cache, and in that order, we might only observe B. Implementations should allow the cache to be modified while | ||
| * transversing but only {@link Iterator#remove()} is guaranteed to not affect further transversal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transversing -> traversing
| * undefined. | ||
| * An Iterable that allows to transverse all keys in the cache. Modifications might be visible and no guarantee is made on ordering of | ||
| * these modifications as new entries might end up in the already transversed part of the cache. So while transversing; if A and B are | ||
| * added to the cache, and in that order, we might only observe B. Implementations should allow the cache to be modified while |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of notes about "implementations should..." in these javadocs. If there are restrictions on implementations, it should be reflected in the API. If implementations want to add restrictions, it should be in their own javadocs. But these notes detract from defining what the API actually guarantees, or doesn't.
As examples, see the javadocs for Collection.iterator() and Map.values():
https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html#iterator--
https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#values--
| private final long hits; | ||
| private final long misses; | ||
| private final long evictions; | ||
| void forEach(BiConsumer<Key, Value> consumer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this in the existing API? Why is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I didn't introduce any new methods: all are extracted from the current implementation and then filtered for use in the codebase.
I think I have the same feels, but I didn't want to increase the size of the PR further by adjusting the call side. This would be something for a followup PR, to unify and have a single style to iterate over data inside the cache,
| public class CacheTests extends ESTestCase { | ||
| private int numberOfEntries; | ||
|
|
||
| private List<CacheSupplier<String, String>> knownImplementations; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests seem to still be testing the LRU implementation. Presumably the point should be to test the interface and any default methods. I think a mock implementation in this test suite would better test that.
| public class CacheTests extends ESTestCase { | ||
| private int numberOfEntries; | ||
|
|
||
| private List<CacheSupplier<String, String>> knownImplementations; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests seem to still be testing the LRU implementation. Presumably the point should be to test the interface and any default methods. I think a mock implementation in this test suite would better test that.
Chore: Extracting a Cache interface to allow exploring alternative implementations. Alternative implementations are outside the scope of this PR.
I know indirection is bad for performance, but the
Cache/LRUCachealready lists trade-offs, I expect things to get inlined quickly enough during runtime, so everything considered, I don't think this will 'do us in'.Along the way found some minor bugs and issues
LRUCacheimplementation to test if settings are applied correctly. (While CacheBuilder#build() returns aCacheit still always returns theLRUCacheimplementation)Cache.CacheStats/Cache.Statsa record. Getters we're kept for now, should perhaps inline access in the future.RemovalListenerand how the interface doesn't mandate how, how many, and what the effects are if registered listener is swapped or if multiple listeners can be registered and what happens when they are added or removed.Context
The Cache/LRUCache is the only implementation of a cache within the ES code base, the implementation dates from 2015 when different priorities were being juggled. Being the only implementation part of the codebase, it's usage has grown over the years, for example: Enrich, Interference, Searchable snapshots. While the choices at the time were sound, as the external factors have changed, it makes sense to revisit these. As a first step PR which looks to extract an Interface and formalize the contract between implementation and consumer.
Somewhat complete list of use cases that use the cache:
NOTES (ramblings of the author, for context, can be ignored):
The wording of javadoc of the old
Cache/LRUCachemight have sounded 'overly scary/dangerous' leading to some interesting usage/interactionAs already pointed out in comments org.elasticsearch.xpack.core.security.support.CacheIteratorHelper, it's a little wonk. I'm also unsure if the extra protection is needed. Regardless the design of that class to leak the locks and not pull up interaction and fully decorate a cache a la Collections.unmodifiableXXX(XXX) is... 'not (personally) preferred'. But this would be a separate PR.
A computed weight is being tracked as long, which can overflow. Always calculating it on the fly or calculating it on the fly after detecting overflow is an option. Also wondering about thread safety as we're incrementing and something like an AtomicLong is needed. 'Best-effort' :-) Changing the LRUCache impl is out of scope of this PR, so left alone.
For the reviewer:
old
Cachewas refactored toLRUCache, please diff these, there shouldn't be any changes beyond the name and pulling the shared inner classes out. Github shows a diff between the oldCacheand the interface that is now created at the old coordinate. It seems that context was lost which makes it appear like a much bigger PR than it is.Similar with CacheTests and LRUCacheTests. (CacheTests is completely new)