Add thread-safe caching to similarity calculations #23

@anergictcell

Desired functionality
hpo has a struct that caches the results of term-to-term similarity calculations. This cache should be safe to share across threads so that similarities can be calculated in parallel.
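A minimal sketch of what such a cache could look like, using only the standard library. `SimilarityCache` and its methods are hypothetical names for illustration and are not the existing hpo API. The sketch assumes that term IDs fit in a `u32` and that the similarity measure is symmetric, so `(a, b)` and `(b, a)` can share one entry.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical thread-safe cache keyed by a pair of term IDs.
pub struct SimilarityCache {
    scores: RwLock<HashMap<(u32, u32), f32>>,
}

impl SimilarityCache {
    pub fn new() -> Self {
        Self {
            scores: RwLock::new(HashMap::new()),
        }
    }

    /// Orders the IDs so that (a, b) and (b, a) map to the same key.
    /// Assumption: the similarity measure is symmetric.
    fn key(a: u32, b: u32) -> (u32, u32) {
        if a <= b {
            (a, b)
        } else {
            (b, a)
        }
    }

    /// Returns the cached score, or computes and stores it.
    pub fn get_or_insert_with<F>(&self, a: u32, b: u32, compute: F) -> f32
    where
        F: FnOnce() -> f32,
    {
        let key = Self::key(a, b);

        // Fast path: many readers can hold the shared lock at once.
        // The read guard is released at the end of this statement.
        let cached = self.scores.read().unwrap().get(&key).copied();
        if let Some(score) = cached {
            return score;
        }

        // Compute without holding any lock. Two threads may compute the
        // same pair concurrently; the first one to insert wins.
        let score = compute();
        *self.scores.write().unwrap().entry(key).or_insert(score)
    }

    /// Number of cached pairs.
    pub fn len(&self) -> usize {
        self.scores.read().unwrap().len()
    }
}
```

A single `RwLock` keeps the sketch simple but serializes all writers. A sharded map or a concurrent map crate might scale better under heavy write load; that would need benchmarking.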

Constraints
For an Ontology with ~13,000 terms, the total number of possible term pairs (n = 13,000, k = 2) is
n! / (k! * (n - k)!)
--> 13,000! / (2! * (13,000 - 2)!)
==> 84,493,500
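The arithmetic above can be checked with a small helper. For k = 2 the binomial coefficient simplifies to n * (n - 1) / 2, which avoids the factorials entirely. `pair_count` is a hypothetical name used only for this illustration.

```rust
/// Number of unordered pairs of distinct terms: n * (n - 1) / 2.
/// `saturating_sub` keeps n = 0 from underflowing.
pub const fn pair_count(n: u64) -> u64 {
    n * n.saturating_sub(1) / 2
}
```

For n = 13,000 this gives 13,000 * 12,999 / 2 = 84,493,500, matching the figure above.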

For each combination we must store a 32-bit float similarity score plus a hash of the two 32-bit HpoTermIds. The cache could therefore become huge, and we might have to find a way to limit its overall size. For example, we could keep one HashSet containing all comparisons that result in 1 and another one for all that result in 0.
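A rough sketch of the two-HashSet idea, under the assumption that term IDs are plain `u32` values. `TieredScores` and `payload_bytes` are hypothetical names, not existing hpo types. Pairs scoring exactly 0 or 1 are stored as set members without a float; everything else goes into a regular map.

```rust
use std::collections::{HashMap, HashSet};

/// A pair of term IDs (assumed to be u32 for this sketch).
type Pair = (u32, u32);

/// Raw payload of a full cache: two u32 IDs plus one f32 per pair.
/// This ignores all hash table overhead, so it is only a lower bound.
pub const fn payload_bytes(pairs: u64) -> u64 {
    pairs * 12
}

/// Hypothetical storage that keeps scores of exactly 0 and 1 in sets.
#[derive(Default)]
pub struct TieredScores {
    zeros: HashSet<Pair>,
    ones: HashSet<Pair>,
    others: HashMap<Pair, f32>,
}

impl TieredScores {
    pub fn insert(&mut self, pair: Pair, score: f32) {
        // Remove any previous entry so a pair lives in one tier only.
        self.zeros.remove(&pair);
        self.ones.remove(&pair);
        self.others.remove(&pair);

        if score == 0.0 {
            self.zeros.insert(pair);
        } else if score == 1.0 {
            self.ones.insert(pair);
        } else {
            self.others.insert(pair, score);
        }
    }

    pub fn get(&self, pair: &Pair) -> Option<f32> {
        if self.zeros.contains(pair) {
            Some(0.0)
        } else if self.ones.contains(pair) {
            Some(1.0)
        } else {
            self.others.get(pair).copied()
        }
    }
}
```

With all 84,493,500 pairs cached, `payload_bytes` gives 1,013,922,000 bytes, roughly 1 GB before any hash table overhead. The set tiers save 4 bytes of payload per entry, so the benefit depends on how many comparisons actually score exactly 0 or 1. The struct itself is not thread-safe and would still need a lock or similar around it.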
