CacheStore containing source store and cache store #3366
Conversation
Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #3366      +/-   ##
==========================================
+ Coverage   94.54%   94.60%   +0.05%
==========================================
  Files          78       79       +1
  Lines        9423     9578     +155
==========================================
+ Hits         8909     9061     +152
- Misses        514      517       +3
==========================================
```
Review comment on:

```python
        await self._cache.delete(key)
        self._remove_from_tracking(key)

    def cache_info(self) -> dict[str, Any]:
```
It would be great to use a `TypedDict` here, so that the return type is known.
Thanks for the PR. I left a few small comments, but a couple of bigger-picture things:

- This is perhaps the best demonstration for why #2473 (statically associating a Store with a Buffer type) might be helpful. I imagine that we don't want to use precious GPU RAM as a cache.
- Do we care about thread safety here at all? Given that the primary interface to concurrency in Zarr is async, I think we're probably OK not worrying about it. But there might be spots where we can do operations atomically (e.g. using `dict.pop` instead of an `if key in dict` check followed by the operation) with little cost. Trying to synchronize changes to multiple dictionaries would require much more effort.
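To illustrate the suggested refactor (standalone sketch, not the PR's code): a single `dict.pop` is one atomic operation under CPython's GIL, whereas a membership test followed by a delete can interleave with another thread and raise `KeyError`.

```python
cache_order: dict[str, float] = {"a": 1.0, "b": 2.0}

# Racy pattern: two steps, so another thread could remove "a" in between
# the check and the delete, making the del raise KeyError.
if "a" in cache_order:
    del cache_order["a"]

# Atomic pattern: one operation, silently a no-op if the key is already gone.
cache_order.pop("b", None)
cache_order.pop("b", None)  # repeated call is safe, no KeyError
```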
Review comment on:

```python
        except Exception as e:
            logger.warning("_evict_key: failed to evict key %s: %s", key, e)

    def _cache_value(self, key: str, value: Any) -> None:
```
Why are we accepting an arbitrary `value` here? Is it possible to scope this to just `Buffer` objects (or maybe `Buffer` and `NDBuffer`)?
At a glance, it looks like we only call this with `Buffer`, so hopefully this is an easy fix.
Review comment on:

```python
        if key in self._cache_order:
            del self._cache_order[key]
```
I don't know if this is called from multiple threads, but this could be done atomically with `self._cache_order.pop(key, None)`.
Review comment on:

```python
    def _cache_value(self, key: str, value: Any) -> None:
        """Cache a value with size tracking."""
        value_size = buffer_size(value)
```
If we only accept `Buffer` here, then `buffer_size` can hopefully be removed.
Review comment on:

```python
>>> cached_store = zarr.storage.CacheStore(
...     store=source_store,
...     cache_store=cache_store,
...     max_size=256 * 1024 * 1024,  # 256 MB cache
```
Thanks for this PR. I think it would be better to have the LRU functionality on the `cache_store` (in this example, the `MemoryStore`). Otherwise the enclosing `CacheStore` would need to keep track of all keys and their access order in the inner store. That could be problematic if the inner store were shared with other `CacheStore`s or other code.
> That could be problematic if the inner store were shared with other CacheStores or other code.

As long as one of the design goals is to use a regular Zarr store as the caching layer, there is nothing we can do to guard against external access to the cache store. For example, if someone uses a `LocalStore` as a cache, we can't protect the local file system from external modification. I think it's the user's responsibility to ensure that they don't use the same cache for separate `CacheStore`s.
But our default behavior could be to create a fresh `MemoryStore`, which would be a safe default.
My main concern here is about the abstraction. The LRU fits better in the inner store than in the `CacheStore`, imo. There could even be an `LRUStore` that wraps a store and implements the tracking and eviction.

The safety concern is, as you pointed out, something the user should take care of.
> My main concern here is about the abstraction. The LRU fits better in the inner store than in the CacheStore, imo.

That makes sense; maybe we could implement `LRUStore` as another store wrapper?
Closes #3357

Adds a store containing two stores: a source store (where the data comes from) and a cache store (where you want to cache the data). Introduces the class `CacheStore`, which wraps any `Store` implementation and uses a separate `Store` instance as the cache backend. This provides persistent caching with time-based expiration, size-based eviction, and flexible cache storage options.
TODO:

- [ ] docs/user-guide/*.rst
- [ ] changes/