CacheStore containing source store and cache store #3366
Conversation
Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #3366      +/-   ##
==========================================
+ Coverage   94.54%   94.60%   +0.05%
==========================================
  Files          78       79       +1
  Lines        9423     9578     +155
==========================================
+ Hits         8909     9061     +152
- Misses        514      517       +3
==========================================
```
Review comment on:

```python
        await self._cache.delete(key)
        self._remove_from_tracking(key)

    def cache_info(self) -> dict[str, Any]:
```
It would be great to use a `TypedDict` here, so that the return type is known.
Thanks for the PR. I left a few small comments, but a couple of bigger-picture things:

- This is perhaps the best demonstration for why #2473 (statically associating a Store with a Buffer type) might be helpful. I imagine that we don't want to use precious GPU RAM as a cache.
- Do we care about thread safety here at all? Given that the primary interface to concurrency in Zarr is async, I think we're probably OK not worrying about it. But there might be spots where we can do operations atomically (e.g. using `dict.pop` instead of an `if key in dict` check followed by the operation) with little cost. Trying to synchronize changes to multiple dictionaries would require much more effort.
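To illustrate the suggested refactor (standalone sketch, not the PR's code): a single `dict.pop` is one atomic operation under CPython's GIL, whereas a membership test followed by a delete can interleave with another thread and raise `KeyError`.

```python
cache_order: dict[str, float] = {"a": 1.0, "b": 2.0}

# Racy pattern: two steps, so another thread could remove "a" in between
# the check and the delete, making the del raise KeyError.
if "a" in cache_order:
    del cache_order["a"]

# Atomic pattern: one operation, silently a no-op if the key is already gone.
cache_order.pop("b", None)
cache_order.pop("b", None)  # repeated call is safe, no KeyError
```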
Review comment on:

```python
        except Exception as e:
            logger.warning("_evict_key: failed to evict key %s: %s", key, e)

    def _cache_value(self, key: str, value: Any) -> None:
```
Why are we accepting an arbitrary `value` here? Is it possible to scope this to just `Buffer` objects (or maybe `Buffer` and `NDBuffer`)?
At a glance, it looks like we only call this with `Buffer`, so hopefully this is an easy fix.
Review comment on:

```python
        if key in self._cache_order:
            del self._cache_order[key]
```
I don't know if this is called from multiple threads, but this could be done atomically with `self._cache_order.pop(key, None)`.
Review comment on:

```python
    def _cache_value(self, key: str, value: Any) -> None:
        """Cache a value with size tracking."""
        value_size = buffer_size(value)
```
If we only accept `Buffer` here, then `buffer_size` can hopefully be removed.
Review comment on:

```python
>>> cached_store = zarr.storage.CacheStore(
...     store=source_store,
...     cache_store=cache_store,
...     max_size=256 * 1024 * 1024,  # 256 MB cache
```
Thanks for this PR. I think it would be better to have the LRU functionality on the `cache_store` (in this example, the `MemoryStore`). Otherwise the enclosing `CacheStore` would need to keep track of all keys and their access order in the inner store. That could be problematic if the inner store were shared with other `CacheStore`s or other code.
> That could be problematic if the inner store were shared with other CacheStores or other code.

As long as one of the design goals is to use a regular Zarr store as the caching layer, there is nothing we can do to guard against external access to the cache store. For example, if someone uses a `LocalStore` as a cache, we can't protect the local file system from external modification. I think it's the user's responsibility to ensure that they don't use the same cache for separate `CacheStore`s.
But our default behavior could be to create a fresh `MemoryStore`, which would be a safe default.
My main concern here is about the abstraction. The LRU fits better in the inner store than in the `CacheStore`, imo. There could even be an `LRUStore` that wraps a store and implements the tracking and eviction.

The safety concern is, as you pointed out, something the user should take care of.
> My main concern here is about the abstraction. The LRU fits better in the inner store than in the CacheStore, imo.

That makes sense; maybe we could implement `LRUStore` as another store wrapper?
Closes #3357

Adds a store containing two stores: a source store (where the data comes from) and a cache store (where you want to cache the data). Introduces the class `CacheStore`, which wraps any `Store` implementation and uses a separate `Store` instance as the cache backend. This provides persistent caching with time-based expiration, size-based eviction, and flexible cache storage options.
TODO:

- [ ] docs/user-guide/*.rst
- [ ] changes/