Skip to content

10-fold manysearch performance slowdown between v0.9.5 and v0.9.6Β #463

@ctb

Description

@ctb

When benchmarking #430, I noticed a massive slowdown in manysearch. I took a few days to track it down using our conda-forge releases and https://github.com/dib-lab/2022-branchwater-benchmarking.

See results in a_vs_d.txt under benchmarks/ for the various branches of the benchmark repo - summarized below:

plugin version time memory
v0.8.6 814s 18.3 GB
v0.9.0 1332s 27.1 GB
v0.9.1 1361s 22.4 GB
v0.9.5 1534s 21.1 GB
v0.9.6 17523s 27.2 GB
unreleased w/#430 & #464 1428.0 18.6 GB
unreleased w/#464 1392.6 20.1 GB
unreleased w/#464 and --ignore-abundance 1388.6 20.5 GB

I think the max memory is variable because it depends on loading order and duration of the time the sketches are held in memory.

The bigger concern, of course, is the 10x increase in time that happened with release v0.9.6.

The culprit is #302, whichs add abundance-weighted information to manysearch and introduced a performance regression around sketch downsampling. #464 fixes the regression and also makes abundance calculations user-toggleable.

On a separate note, there has been a regular increase in execution time over the releases - from v0.8.6 to v0.9.0, and again between v0.9.1 and v0.9.5.

I think the first of these two slowdowns might have to do with the switch to use core manifest loading, and in particular the need to read .sig files when using pathlists - once to build a manifest, and then again when actually loading the sketch for search. This might be something that we can fix when dealing with #445.

Not sure what changed between v0.9.1 and v0.9.5, though. Maybe I should run some more benchmarks to nail down the exact release where the time increase occurred. Or maybe we should just profile the heck out of the plugin. Or both.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions