Skip to content

pack=True is slow #552

@hejamu

Description

@hejamu

We introduced the pack flag to make sure all compounds are located inside the box, so modules don't have to worry about weird trajectories. Is is especially important in combination with the refgroup flag, where we shift the whole system around.

At the time we assumed that this is important enough and the performance penalty is minimal, but this seems not to be the case.

I did a quick timing check and found the following:

%%timeit
for ts in u_EMD.trajectory[0:10]:
    u_EMD.atoms.unwrap(compound="residues")
>>> 746 ms ± 79.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
for ts in u_EMD.trajectory[0:10]:
    u_EMD.atoms.wrap()
>>> 11.8 ms ± 924 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
for ts in u_EMD.trajectory[0:10]:
    u_EMD.atoms.wrap(compound="residues")
>>> 668 ms ± 69.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As one can see, while wrapping into the box is fast, packing (i.e. wrapping compounds as a whole) is on the same order as unwrapping.

I looked through the code of MDAnalysis.core.groups.wrap() but I can't pinpoint exactly why this is the case. In principle there is a lot less computational effort than unwrapping. I did a trivial check how long the core components of packing (accumulate and manipulating the positions array) should take:

%%timeit
sol = u_EMD.select_atoms("resname SOL")
for ts in u_EMD.trajectory[0:10]:
    com = sol.center_of_mass(compound="residues")
    com = np.mod(com, u_EMD.dimensions[:3])
    pos_new = sol.positions - np.repeat(com,3, axis=0)
    sol.positions = pos_new.copy()
>>> 15 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

This example is of course very wrong but wrapping could be this fast.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions