-
Notifications
You must be signed in to change notification settings - Fork 77
Description
For the purposes of assessing how the two-locus framework performs relative to the LD Calculator, I've performed a series of benchmarks. In our initial benchmarks, the two-locus framework was rather slow relative to the LD Calculator (up to 10x slower). We went ahead and improved some of the low-hanging inefficiencies in the code and now I'm happy to report that the two-locus framework is within 2x the speed of the LD Calculator, and in some cases faster. I have created a document that attempts to thoroughly detail the changes made and the reasoning behind them: benchmarks.pdf. For brevity, I've listed some of the highlights below for a high-level overview.
Here is a brief description of the changes that we made to increase performance:
- Base case: The original code that is currently in tskit (link).
- Malloc out of hot loop: Preallocate a structure of arrays used for
temporary calculations instead of allocating and freeing for every pair of
sites (link). - Refactor Bit Arrays: Refactor bit array interface, removing the need for
temporary arrays in many cases. All functions now take a row index as a
parameter (link). - Precompute Allele Counts and Biallelic Summary Function: Store precomputed
bit arrays for each sample set and allele and the count of samples for each
allele. Introduce a biallelic summary function that avoids multiple redundant
computations, leaving the original normalized summary function for
multiallelic sites (link).
Python and C tests are all passing for each of these patches.
We benchmark each change (in our benchmarks, each change is layered on the next) on a set of tree sequences with the following parameters:
parameter | value |
---|---|
r | 1e-8 |
1e4 | |
L | 2e6 |
With ploidy=1
and sample sizes ranging from

In these plots the Relative difference
is
Next, we generated a larger tree sequence with L=6e6
and sample sizes ranging from

I'm not sure that there's any desire to deprecate/remove the LD Calculator at this point, but at least this provides some real-world numbers on the relative performance between these two methods. I'd like to incorporate these patches into the codebase if at all possible.