
A general system to store particle metadata #64

@koadman


Metadata about particles is useful. It currently includes things such as a cached log likelihood, but could also include a cached ML distance estimate or other information. At present, metadata is stored outside the particle (e.g. in OnlineCalculator), so the class holding the metadata must be notified of particle deletions to clear stale cache data. This is done by storing, inside the particle, a reference to the class that needs to be notified. That approach does not scale to an arbitrary number of metadata values and container classes, because each additional container class would need its own reference stored in every particle.

Approach 1:

Store metadata inside the particle itself. An interface to fetch particular bits of metadata needs to be devised. This could be as simple as an unordered_map from some key type (a string? a compiler-mangled class name?) to a shared_ptr to a common base class. The class creating the metadata would maintain a set of weak_ptrs to the metadata objects and, before accessing a particular piece of metadata, would check its validity using weak_ptr::expired(). This allows the metadata to be deleted along with its particle without actively notifying the class that created it.
One disadvantage of this approach is that per-particle caches incur considerable memory overhead, since every particle carries its own map.
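A minimal sketch of Approach 1, assuming a hypothetical `MetadataBase` class, a `Particle` with templated `set_metadata`/`get_metadata` accessors, and a hypothetical `LogLikelihoodProducer` standing in for a class like OnlineCalculator. It uses `std::type_index` as the key (the "compiler-generated class name" option), since it is hashable out of the box. The particle owns the metadata through `shared_ptr`, while the producer keeps only `weak_ptr`s, so deleting a particle frees its metadata with no notification step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical common base for every kind of per-particle metadata.
struct MetadataBase {
    virtual ~MetadataBase() = default;
};

// Hypothetical particle that owns its metadata, keyed by the metadata's C++ type.
class Particle {
public:
    template <typename T>
    void set_metadata(std::shared_ptr<T> md) {
        metadata_[std::type_index(typeid(T))] = std::move(md);
    }

    // Returns nullptr if no metadata of type T is attached.
    template <typename T>
    std::shared_ptr<T> get_metadata() const {
        auto it = metadata_.find(std::type_index(typeid(T)));
        if (it == metadata_.end()) return nullptr;
        // Safe: the entry under typeid(T) was stored from a shared_ptr<T>.
        return std::static_pointer_cast<T>(it->second);
    }

private:
    // One map per particle: this is the memory overhead mentioned above.
    std::unordered_map<std::type_index, std::shared_ptr<MetadataBase>> metadata_;
};

// Example metadata: a cached log likelihood.
struct CachedLogLikelihood : MetadataBase {
    explicit CachedLogLikelihood(double v) : value(v) {}
    double value;
};

// Hypothetical producer. It holds only weak_ptrs, so particles alone
// control the metadata's lifetime.
class LogLikelihoodProducer {
public:
    void store(Particle& p, double ll) {
        auto md = std::make_shared<CachedLogLikelihood>(ll);
        p.set_metadata(md);
        tracked_.push_back(md);  // stored as weak_ptr: does not extend lifetime
    }

    // Drops entries whose particle was deleted; returns how many are still alive.
    // To read an entry, prefer lock() and test the result, which is race-free.
    std::size_t prune_expired() {
        tracked_.erase(std::remove_if(tracked_.begin(), tracked_.end(),
                                      [](const std::weak_ptr<CachedLogLikelihood>& w) {
                                          return w.expired();
                                      }),
                       tracked_.end());
        return tracked_.size();
    }

private:
    std::vector<std::weak_ptr<CachedLogLikelihood>> tracked_;
};
```

Deleting a particle destroys its map and therefore its metadata, which the producer observes as expired `weak_ptr`s on its next prune.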

Approach 2:

Create a global metadata store via a static instance. This would be an unordered_multimap from particle pointer to metadata, e.g. unordered_multimap<particle*, pair<string, metadata_base*>>. This solves the stale cache problem because a single global metadata store can be notified whenever a particle, node, etc. is deleted. This approach may have more problems with multithreading and concurrency than the first, since all threads would share one container.
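A minimal sketch of Approach 2, assuming a hypothetical `GlobalMetadataStore` singleton and a `Particle` whose destructor notifies it. It differs from the type above in one respect: metadata is owned through `unique_ptr` rather than a raw `metadata_base*`, so erasing an entry frees it. A mutex guards the shared container, and the comment on `find` shows the kind of concurrency hazard this design invites.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

struct MetadataBase {
    virtual ~MetadataBase() = default;
};

struct Particle;  // defined below

// Hypothetical process-wide store: particle address -> (name, metadata).
class GlobalMetadataStore {
public:
    static GlobalMetadataStore& instance() {
        static GlobalMetadataStore store;  // initialization is thread-safe since C++11
        return store;
    }

    void add(const Particle* owner, std::string name, std::unique_ptr<MetadataBase> md) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.emplace(owner, std::make_pair(std::move(name), std::move(md)));
    }

    // The returned raw pointer is only safe while the owner is alive: another thread
    // deleting the particle after the lock is released would invalidate it.
    MetadataBase* find(const Particle* owner, const std::string& name) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto range = entries_.equal_range(owner);
        for (auto it = range.first; it != range.second; ++it)
            if (it->second.first == name) return it->second.second.get();
        return nullptr;
    }

    // Called at particle deletion: drops all of that particle's metadata at once.
    void erase_owner(const Particle* owner) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.erase(owner);
    }

    std::size_t count(const Particle* owner) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.count(owner);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    GlobalMetadataStore() = default;
    mutable std::mutex mutex_;
    std::unordered_multimap<const Particle*,
                            std::pair<std::string, std::unique_ptr<MetadataBase>>>
        entries_;
};

// Hypothetical particle: its destructor is the single notification point.
struct Particle {
    ~Particle() { GlobalMetadataStore::instance().erase_owner(this); }
};
```

Every metadata container now funnels through one lock, which is simple but can become a contention point when many threads propose and evaluate particles at once.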

Approach 3:

Maintain metadata inside the generating class, as is currently done, associating each piece of metadata with a weak_ptr to the object it describes. Before accessing the metadata, check whether that weak_ptr has expired. This approach is currently slightly frustrating because std::hash is not defined for std::weak_ptr, apparently for no reason other than lack of time on the part of the C++ standards committee:
http://stackoverflow.com/questions/4750504/why-was-stdhash-not-defined-for-stdweak-ptr-in-c0x
so the hash key could instead be the object's memory address.
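A minimal sketch of Approach 3, assuming a hypothetical generic `AddressKeyedCache` owned by the generating class. Entries are keyed by raw address (`std::hash` is defined for pointers), and each entry also stores a `weak_ptr`. On lookup, the entry is treated as stale if that `weak_ptr` has expired or belongs to a different control block, which covers the case where a new object reuses a freed address. `prune()` illustrates that trimming has to be requested explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a particle, node, or other object metadata is attached to.
struct Particle {};

template <typename Object, typename Value>
class AddressKeyedCache {
public:
    void put(const std::shared_ptr<Object>& obj, Value value) {
        entries_.insert_or_assign(obj.get(), Entry{obj, std::move(value)});
    }

    // Returns the cached value, or nullptr if absent or stale.
    const Value* get(const std::shared_ptr<Object>& obj) {
        auto it = entries_.find(obj.get());
        if (it == entries_.end()) return nullptr;
        const std::weak_ptr<Object>& owner = it->second.owner;
        // owner_before is false in both directions only when both share one control block.
        const bool same_owner = !owner.owner_before(obj) && !obj.owner_before(owner);
        if (owner.expired() || !same_owner) {
            entries_.erase(it);  // stale: the original object is gone
            return nullptr;
        }
        return &it->second.value;
    }

    // The cache is never trimmed automatically; this removes entries for deleted objects.
    std::size_t prune() {
        std::size_t removed = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second.owner.expired()) {
                it = entries_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return entries_.size(); }

private:
    struct Entry {
        std::weak_ptr<Object> owner;
        Value value;
    };
    std::unordered_map<const Object*, Entry> entries_;
};
```

Here the `weak_ptr` is used only for validation rather than as the hash key, which sidesteps the missing `std::hash<std::weak_ptr>`.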

This approach has the advantage of offering a lot of flexibility in how metadata storage is designed, but the disadvantage that caches might grow quite large, because they are not actively trimmed back to the surviving particles. Size could be managed with one of the classic MFU-approximation cache strategies, and a generic implementation could be written for arbitrary data/metadata.
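One way to bound the cache size is a fixed-capacity least-recently-used (LRU) cache; choosing LRU here is an assumption, since the issue leaves the eviction strategy open. The hypothetical `LruCache` template below is generic over key and value, so the key could be a particle address or a node pair and the value any cached metadata. A `std::list` keeps entries in recency order, and an `unordered_map` points into it for O(1) lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value, typename Hash = std::hash<Key>>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    std::optional<Value> get(const Key& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);  // mark as most recently used
        return it->second->second;
    }

    void put(const Key& key, Value value) {
        auto it = index_.find(key);
        if (it != index_.end()) {  // update in place and refresh recency
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {  // full: evict the least recently used entry
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_.emplace(key, order_.begin());
    }

    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<Key, Value>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<Key, typename std::list<Entry>::iterator, Hash> index_;
};
```

LRU only approximates keeping the most frequently used entries; an LFU policy tracks frequency directly at the cost of more bookkeeping per access.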


The immediately motivating use case is that when node merges are proposed non-uniformly, e.g. with a distance-guided approach, the same pair gets proposed many times in each generation. Calculating the ML distance between the pair is a rather expensive operation (using e.g. 10 iterations) but yields a much higher particle LL. A cache would allow the ML distance to be calculated once per node pair and saved, rather than recomputed dozens or hundreds of times.
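A minimal sketch of such a pair cache, assuming a hypothetical `Node` type and a caller-supplied `DistanceFn` that stands in for the iterative ML distance estimate. The key is built from the two node addresses in sorted order, so proposing (a, b) or (b, a) hits the same entry, and stored `weak_ptr`s to both nodes invalidate the value once either node is gone. The `computations()` counter exists only to show how often the expensive function actually runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>

struct Node {};  // hypothetical tree node

class PairDistanceCache {
public:
    // Hypothetical stand-in for the expensive ML distance computation.
    using DistanceFn = std::function<double(const Node&, const Node&)>;

    explicit PairDistanceCache(DistanceFn fn) : fn_(std::move(fn)) {}

    double distance(const std::shared_ptr<Node>& a, const std::shared_ptr<Node>& b) {
        const Key key = make_key(a.get(), b.get());
        auto it = cache_.find(key);
        // Reuse the value only while both nodes it was computed for still exist.
        if (it != cache_.end() && !it->second.first.expired() && !it->second.second.expired())
            return it->second.value;
        const double d = fn_(*a, *b);
        ++computations_;
        cache_[key] = Entry{a, b, d};
        return d;
    }

    std::size_t computations() const { return computations_; }

private:
    using Key = std::pair<std::uintptr_t, std::uintptr_t>;
    struct Entry {
        std::weak_ptr<Node> first, second;
        double value;
    };

    // Order-independent key: (x, y) and (y, x) map to the same entry.
    static Key make_key(const Node* x, const Node* y) {
        const auto ux = reinterpret_cast<std::uintptr_t>(x);
        const auto uy = reinterpret_cast<std::uintptr_t>(y);
        return ux < uy ? Key{ux, uy} : Key{uy, ux};
    }

    DistanceFn fn_;
    std::map<Key, Entry> cache_;
    std::size_t computations_ = 0;
};
```

With this in place, proposing the same merge 100 times in a generation costs one distance computation plus cheap lookups. Entries for deleted nodes still occupy memory until overwritten, which is where a size bound like the one discussed for Approach 3 would come in.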
