Persisting metadata cache

In the context of BigStitcher, we have the problem that opening a container with many (thousands of) datasets (representing tiles) in BigDataViewer takes a long time. The reason is that we need to load the metadata for all tiles to know bounding boxes etc. This happens before any actual data is loaded. Data loading is asynchronous, but loading the metadata is not. We need the metadata (dataset dimensions and DataType) to create the lazy CachedCellImgs, so that we can even decide which data should be lazy-loaded asynchronously.

N5 has `N5JsonCache` to cache metadata (attributes, group lists, etc.). After the cache is populated, querying dataset dimensions is very fast, because it requires no reads from the backend. But initially, to populate the cache, *a lot* of backend operations need to happen (reading `attributes.json` files, listing group contents).

To speed up this initial loading in BDV, I created this hack: https://github.com/bigdataviewer/bigdataviewer-core/pull/213
The idea is to put an `attributes_cache.json` file in the root of the container that collects all `attributes.json` files (currently; this should be extended to `.zgroup`, `.zattrs`, etc.) into a single file. This is basically a dump of a fully populated `N5JsonCache`. If that file exists when opening a container, I read it and populate the `N5JsonCache` from it. This is very fast. The file can be several MB, but reading, parsing, and injecting it into the `N5JsonCache` still only takes a few milliseconds.

Do we want to build this out and make this more standard?

The `attributes_cache.json` is organized as a tree of JSON objects corresponding to the directory structure of the container.
Every object has a `children` property that lists all subgroups, an `isDataset` boolean property, an `isGroup` property, and each of the attributes files as a JSON object (e.g., one child named `attributes.json` containing the JsonObject of the parsed `attributes.json` file). That is, this is exactly the information needed to populate an `N5JsonCache.N5CacheInfo` entry.
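To make the node layout concrete, here is a minimal sketch using plain `java.util` maps instead of Gson objects. The class and method names (`CacheTreeSketch`, `node`, `children`, `flatten`) are made up for illustration and are not part of N5 or of the linked PR. `flatten` turns the nested tree into one entry per group path, which is roughly the shape needed to fill per-path cache entries; how those entries would actually be handed to `N5JsonCache` is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the nested cache tree, not N5 API.
@SuppressWarnings("unchecked")
final class CacheTreeSketch {

    // One tree node: flags, the parsed attributes.json, and an (initially empty) children object.
    static Map<String, Object> node(boolean isGroup, boolean isDataset, Map<String, Object> attributes) {
        Map<String, Object> n = new LinkedHashMap<>();
        n.put("isGroup", isGroup);
        n.put("isDataset", isDataset);
        n.put("attributes.json", attributes);
        n.put("children", new LinkedHashMap<String, Object>());
        return n;
    }

    // The children object maps subgroup name -> child node.
    static Map<String, Object> children(Map<String, Object> n) {
        return (Map<String, Object>) n.get("children");
    }

    // Walks the tree depth-first and records one flat entry per group path.
    // In the flat entry, "children" is reduced to the list of subgroup names.
    static void flatten(String path, Map<String, Object> n, Map<String, Map<String, Object>> out) {
        Map<String, Object> entry = new LinkedHashMap<>(n);
        entry.put("children", new ArrayList<>(children(n).keySet()));
        out.put(path, entry);
        for (Map.Entry<String, Object> c : children(n).entrySet()) {
            String childPath = path.isEmpty() ? c.getKey() : path + "/" + c.getKey();
            flatten(childPath, (Map<String, Object>) c.getValue(), out);
        }
    }
}
```

Calling `flatten("", root, out)` on a tree with `setup0/s0` below the root would produce entries keyed by `""`, `"setup0"`, and `"setup0/s0"`.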

Where it differs from the `N5JsonCache` structure is that it is actually a nested tree structure. The path to the `setup0/timepoint0/s0/attributes.json` content would be under the JSON path `"/children/setup0/children/timepoint0/children/s0/attributes.json"`.
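As a small sketch of that mapping, the helper below converts a group path plus a key into the nested JSON path. `CacheJsonPath` and `toJsonPath` are hypothetical names, and the sketch assumes group names contain no characters that would need escaping.

```java
// Hypothetical helper, not N5 API: maps a group path to its location in the nested tree.
final class CacheJsonPath {

    // "setup0/timepoint0/s0" + "attributes.json"
    //   -> "/children/setup0/children/timepoint0/children/s0/attributes.json"
    static String toJsonPath(String groupPath, String key) {
        StringBuilder sb = new StringBuilder();
        for (String part : groupPath.split("/")) {
            if (!part.isEmpty()) {            // skips the root ("") and stray slashes
                sb.append("/children/").append(part);
            }
        }
        return sb.append('/').append(key).toString();
    }
}
```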
The nice thing about this is that we could put `attributes_cache.json` not only in the container root but also into subgroups, where it would collect everything under that subgroup (with paths relative to the subgroup). This could be used to quickly rebuild the root `attributes_cache.json` when something changes. To collect all information, we would only have to descend into subgroups until an `attributes_cache.json` is found; that file could then be included in its entirety, and we wouldn't need to recurse any deeper.
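The rebuild could look roughly like the sketch below. The `Backend` interface is a stand-in invented for this example (it is not the N5 reader API), the `isGroup`/`isDataset` flags are omitted for brevity, and the sketch assumes that a subgroup's `attributes_cache.json` is up to date whenever it exists.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of rebuilding a cache tree while reusing subgroup caches.
final class CacheRebuildSketch {

    // Invented minimal view of a storage backend; every call stands for a backend operation.
    interface Backend {
        List<String> listChildren(String groupPath);
        Map<String, Object> readAttributes(String groupPath);
        // Parsed attributes_cache.json of the group, or null if the group has none.
        Map<String, Object> readCacheFile(String groupPath);
    }

    // Rebuilds the node for groupPath from the backend, ignoring any cache file at groupPath itself.
    static Map<String, Object> rebuild(Backend b, String groupPath) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("attributes.json", b.readAttributes(groupPath));
        Map<String, Object> children = new LinkedHashMap<>();
        for (String name : b.listChildren(groupPath)) {
            String childPath = groupPath.isEmpty() ? name : groupPath + "/" + name;
            children.put(name, reuseOrRebuild(b, childPath));
        }
        node.put("children", children);
        return node;
    }

    // If the subgroup has its own cache file, include it whole and stop recursing there.
    static Map<String, Object> reuseOrRebuild(Backend b, String groupPath) {
        Map<String, Object> cached = b.readCacheFile(groupPath);
        return cached != null ? cached : rebuild(b, groupPath);
    }
}
```

With this split, `rebuild(backend, "")` regenerates the root tree, and any subgroup that carries its own cache file costs a single read instead of a full traversal.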

Anyway. Do we want to add something like that directly in the N5 core?

If not, could we open the `N5JsonCache` API a bit more? (Currently I'm using reflection to populate `N5JsonCache.N5CacheInfo` entries without triggering unwanted backend reads.)

Persisting metadata cache #179
