Open
Description
Desired functionality
hpo has a struct that caches the results of term-to-term similarity calculations. This cache should be safe to share across threads so that similarities can be calculated in parallel.
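One way the desired behaviour could look is sketched below. This is a minimal sketch using only the standard library, not the existing hpo struct: the names SimilarityCache, get_or_insert_with and fill_from_threads are hypothetical, and term IDs are assumed to be plain u32 values. The score is computed outside the write lock, so two threads may occasionally compute the same pair, but only the first stored value is kept.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical thread-safe similarity cache (not the actual `hpo` type).
#[derive(Default)]
pub struct SimilarityCache {
    scores: RwLock<HashMap<(u32, u32), f32>>,
}

impl SimilarityCache {
    /// Order the pair so that (a, b) and (b, a) share one entry.
    fn key(a: u32, b: u32) -> (u32, u32) {
        if a <= b { (a, b) } else { (b, a) }
    }

    /// Look up a cached score; takes only a shared read lock.
    pub fn get(&self, a: u32, b: u32) -> Option<f32> {
        self.scores.read().unwrap().get(&Self::key(a, b)).copied()
    }

    /// Return the cached score, or compute and store it on a miss.
    pub fn get_or_insert_with<F: FnOnce() -> f32>(&self, a: u32, b: u32, compute: F) -> f32 {
        if let Some(score) = self.get(a, b) {
            return score;
        }
        // Compute without holding any lock, then store under the write lock.
        let score = compute();
        let mut map = self.scores.write().unwrap();
        let stored = *map.entry(Self::key(a, b)).or_insert(score);
        stored
    }

    /// Number of cached pairs.
    pub fn len(&self) -> usize {
        self.scores.read().unwrap().len()
    }
}

/// Fill one shared cache from several threads and return the entry count.
pub fn fill_from_threads(n_threads: u32) -> usize {
    let cache = Arc::new(SimilarityCache::default());
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                for a in 0..10u32 {
                    for b in a..10u32 {
                        // Placeholder score; a real caller would compute the similarity here.
                        cache.get_or_insert_with(a, b, || 0.5);
                    }
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    cache.len()
}
```

A single RwLock is the simplest option. If lock contention turns out to matter, sharding the map across several locks would be a possible refinement.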
Constraints
For an ontology with ~13,000 terms, the total number of possible term pairs (n = 13,000, k = 2) is
n! / (k! * (n - k)!)
--> 13,000! / (2! * (13,000 - 2)!)
==> 84,493,500
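The figure above can be checked without evaluating any factorials, because for k = 2 the formula reduces to n * (n - 1) / 2. The function name pair_count is only illustrative:

```rust
/// Number of unordered pairs among `n` terms: n! / (2! * (n - 2)!) = n * (n - 1) / 2.
pub fn pair_count(n: u64) -> u64 {
    // `saturating_sub` keeps n = 0 from underflowing.
    n * n.saturating_sub(1) / 2
}
```

For n = 13,000 this gives 13,000 * 12,999 / 2 = 84,493,500.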
For each combination we must store a 32-bit float similarity score plus a hash key for the two 32-bit HpoTermIds. The cache could therefore become huge, and we might have to find a way to limit its overall size. For example, we could have one HashSet that contains all comparisons that result in 1 and another one for all that result in 0, so that no score needs to be stored for those pairs.
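A rough sketch of the two-HashSet idea follows. Everything here is an assumption for illustration: the names SplitCache, pack and payload_bytes are hypothetical, the two 32-bit IDs are packed into one u64 key, and scores that are neither 0 nor 1 go into an ordinary HashMap. payload_bytes counts only the raw key and score bytes and ignores the hash table's own overhead.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical cache that stores no float for scores of exactly 0 or 1.
#[derive(Default)]
pub struct SplitCache {
    zeros: HashSet<u64>,
    ones: HashSet<u64>,
    other: HashMap<u64, f32>,
}

/// Pack two 32-bit term IDs into one order-independent 64-bit key.
fn pack(a: u32, b: u32) -> u64 {
    let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
    (u64::from(lo) << 32) | u64::from(hi)
}

impl SplitCache {
    pub fn insert(&mut self, a: u32, b: u32, score: f32) {
        let key = pack(a, b);
        // Drop any earlier entry so a pair lives in exactly one container.
        self.zeros.remove(&key);
        self.ones.remove(&key);
        self.other.remove(&key);
        if score == 0.0 {
            self.zeros.insert(key);
        } else if score == 1.0 {
            self.ones.insert(key);
        } else {
            self.other.insert(key, score);
        }
    }

    pub fn get(&self, a: u32, b: u32) -> Option<f32> {
        let key = pack(a, b);
        if self.zeros.contains(&key) {
            Some(0.0)
        } else if self.ones.contains(&key) {
            Some(1.0)
        } else {
            self.other.get(&key).copied()
        }
    }
}

/// Raw payload if every pair stores an 8-byte key and a 4-byte score.
pub fn payload_bytes(pairs: u64) -> u64 {
    pairs * (8 + 4)
}
```

Under these assumptions, caching all 84,493,500 pairs with their scores needs at least 1,013,922,000 bytes (about 1 GB) of raw payload. Every pair that lands in one of the two sets saves the 4 bytes for its score.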