
Memory spikes#374

Open
jvdoorss wants to merge 3 commits into jwohlwend:main from PUXANO:memory_spikes

Conversation

@jvdoorss

A number of changes intended to avoid short-lived memory allocations for intermediate variables.

More complicated one-liners can imply that several like-sized tensors are kept alive until the computation completes, even though a more sequential approach could reduce that number. For the multi-GB tensors in the trunk module, which contain $\mathcal{O}(N_{residue}^2)$ elements, this can prove a crucial optimization.
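As a rough back-of-envelope illustration of the scale involved (the residue count and channel dimension below are assumptions for illustration, not Boltz's actual sizes):

```python
# Peak-memory estimate for one pairwise ("z"-like) tensor.
# Shapes here are illustrative assumptions, not Boltz's real dimensions.
def pairwise_tensor_bytes(n_res: int, channels: int, itemsize: int = 4) -> int:
    """Bytes needed by one float32 tensor of shape (n_res, n_res, channels)."""
    return n_res * n_res * channels * itemsize

# A hypothetical 2000-residue complex with 128 pairwise channels:
gb = pairwise_tensor_bytes(2000, 128) / 1024**3  # ~1.9 GB
# Every extra like-sized temporary kept alive by a one-liner costs
# another ~1.9 GB of peak memory on the GPU.
```

This is why even two or three simultaneous temporaries of pairwise shape can exhaust a 40 GB card once activations from the rest of the network are accounted for.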

Predicting the structure of 9b9j on a 40 GB A100 GPU failed without these changes, but succeeds with them.

Typical changes:

  • alter `z = z + a` to `z += a` (or `z[:] = z + a`) to write directly into already-allocated memory
  • reuse memory already allocated in the outer scope by not binding a new variable when the shape is constant:
    def inner_scope(z):
      z[:] = shape_preserving_op(z)
      ...
    
    instead of
    def inner_scope(z):
      z = shape_preserving_op(z)  # now two z-sized tensors exist
      ...
    
  • delete redundant tensors
  • split up one-liners
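The patterns above can be sketched as follows (a minimal illustration, not actual Boltz code; `fuse_updates` and the ReLU are hypothetical stand-ins for the real ops):

```python
import torch

def fuse_updates(z: torch.Tensor, a: torch.Tensor, b: torch.Tensor) -> None:
    """Sketch of the in-place patterns listed above.

    The one-liner `z = z + a + b` allocates two temporaries the size of z;
    the in-place version writes into z's existing storage instead.
    """
    z += a  # accumulate without allocating a new z-sized tensor
    z += b
    # A shape-preserving op can also reuse z's storage via slice assignment:
    z[:] = torch.relu(z)

z = torch.randn(4, 4)
a = torch.randn(4, 4)
b = torch.randn(4, 4)
ptr = z.data_ptr()
fuse_updates(z, a, b)
assert z.data_ptr() == ptr  # same storage, mutated in place
```

Note that in-place ops like `+=` and `z[:] = ...` are only safe when autograd does not need the overwritten values; that caveat is part of the versatility trade-off discussed below.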

Clearly this approach has some drawbacks: it probably reduces speed, and methods that now mutate their arguments in the outer scope are less versatile. Still, this may be an acceptable trade-off.

NOTE: While I think changes like this will in general optimize memory movements, as argued e.g. here and as seen from the actual memory-consumption improvement in Boltz, it could be that not all proposed changes have an actual effect.

jvdoorss and others added 3 commits June 13, 2025 14:54
* try to have only 1 instance of the trunk pairwise embeddings at any time
* avoid summing more than 2 terms at a time
* update Tensors in-place by using '[:]' or '+='
* remove large Tensors asap
* avoid duplicating `n_res ** 3`-sized b-Tensor
* update in-place
The transition module handles large tensors. Computing multiple operations simultaneously in a one-liner with such tensors requires a lot of memory; a more sequential process reduces that load.
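The "sequential" rewrite described in this commit can be sketched like this (a hedged illustration with hypothetical names and weights, not the actual transition-module code):

```python
import torch

def transition_sequential(z: torch.Tensor,
                          w_in: torch.Tensor,
                          w_out: torch.Tensor) -> None:
    """Illustrative sequential form of a transition-style update.

    The one-liner  z = z + relu(z @ w_in) @ w_out  keeps z, the hidden
    activation, and the projected update alive simultaneously; here each
    intermediate is dropped as soon as it is no longer needed.
    """
    h = z @ w_in
    torch.relu_(h)      # in-place activation, no extra buffer
    update = h @ w_out
    del h               # free the (large) hidden activation ASAP
    z += update         # write into z's existing storage
    del update

z = torch.randn(8, 16)
w_in = torch.randn(16, 64)
w_out = torch.randn(64, 16)
transition_sequential(z, w_in, w_out)
```

The result is identical to the one-liner; only the number of simultaneously live buffers changes.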
