Skip to content

Benchmarking the Two-Locus Framework and LD Calculator #3290

@lkirk

Description

@lkirk

For the purposes of assessing how the two-locus framework performs relative to the LD Calculator, I've performed a series of benchmarks. In our initial benchmarks, the two-locus framework was rather slow relative to the LD Calculator (up to 10x slower). We went ahead and improved some of the low-hanging inefficiencies in the code and now I'm happy to report that the two-locus framework is within 2x the speed of the LD Calculator, and in some cases faster. I have created a document that attempts to thoroughly detail the changes made and the reasoning behind them: benchmarks.pdf. For brevity, I've listed some of the highlights below for a high-level overview.

Here is a brief description of the changes that we made to increase performance:

  1. Base case: The original code that is currently in tskit (link).
  2. Malloc out of hot loop: Preallocate a structure of arrays used for
    temporary calculations instead of allocating and freeing for every pair of
    sites (link).
  3. Refactor Bit Arrays: Refactor bit array interface, removing the need for
    temporary arrays in many cases. All functions now take a row index as a
    parameter (link).
  4. Precompute Allele Counts and Biallelic Summary Function: Store precomputed
    bit arrays for each sample set and allele and the count of samples for each
    allele. Introduce a biallelic summary function that avoids multiple redundant
    computations, leaving the original normalized summary function for
    multiallelic sites (link).

Python and C tests are all passing for each of these patches.

We benchmark each change (in our benchmarks, each change is layered on the next) on a set of tree sequences with the following parameters:

parameter value
r 1e-8
$N_e$ 1e4
L 2e6

With ploidy=1 and sample sizes ranging from $10^2$ - $10^5$. We sample 15 replicates to obtain these results:

Image

In these plots the Relative difference is $(\textrm{tl} - \textrm{ldc}) / \textrm{ldc}$ where ldc is the LD Calculator and tl is the two-locus framework.

Next, we generated a larger tree sequence with L=6e6 and sample sizes ranging from $10^3$ - $10^6$. Comparing all optimizations on the smaller (panel A) and larger (panel B), we get the following results:

Image

I'm not sure that there's any desire to deprecate/remove the LD Calculator at this point, but at least this provides some real-world numbers on the relative performance between these two methods. I'd like to incorporate these patches into the codebase if at all possible.

cc @jeromekelleher @petrelharp @apragsdale

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions