* try to have only 1 instance of the trunk pairwise embeddings at any time
* avoid summing more than 2 terms at a time
* update Tensors in-place by using `[:]` or `+=`
* remove large Tensors asap
* avoid duplicating `n_res ** 3`-sized b-Tensor
* update in-place
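The in-place patterns listed above can be sketched as follows. This is an illustration of the idea, not the actual Boltz code; NumPy is used as a stand-in for PyTorch tensors, where the semantics of `+=` and `[:]` are the same.

```python
import numpy as np

# Small stand-in sizes; the real trunk pair tensors are O(n_res**2) and multi-GB.
n_res, dim = 8, 4

z = np.zeros((n_res, n_res, dim), dtype=np.float32)
a = np.ones((n_res, n_res, dim), dtype=np.float32)

buf = z  # keep a handle to verify no reallocation happens

# Out-of-place (avoided): `z = z + a` allocates a fresh tensor, so two
# n_res**2-sized arrays coexist until the assignment completes.

# In-place (preferred): writes into z's existing storage.
z += a           # accumulate without a new allocation
z[:] = z * 0.5   # '[:]' assigns into the same buffer

assert z is buf  # same storage throughout

# Drop large temporaries as soon as they are no longer needed.
del a
```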
The transition module handles large tensors. Computing multiple operations simultaneously in a one-liner with such tensors requires a lot of memory; a more sequential process reduces that load.
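A minimal sketch of what "more sequential" means here (hypothetical function and weight names, NumPy standing in for PyTorch): the fused one-liner keeps the matmul result, its activated copy, and the final product alive at once, while the sequential version reuses a single intermediate buffer.

```python
import numpy as np

def transition_oneliner(z, w_in, w_out):
    # Fused expression: the matmul result, the activated copy, and the
    # final product are all live at the same time.
    return np.maximum(z @ w_in, 0.0) @ w_out

def transition_sequential(z, w_in, w_out):
    h = z @ w_in               # single intermediate
    np.maximum(h, 0.0, out=h)  # activation applied in-place into h
    out = h @ w_out
    del h                      # release the intermediate promptly
    return out

rng = np.random.default_rng(0)
z = rng.standard_normal((16, 16, 8)).astype(np.float32)
w_in = rng.standard_normal((8, 32)).astype(np.float32)
w_out = rng.standard_normal((32, 8)).astype(np.float32)

out1 = transition_oneliner(z, w_in, w_out)
out2 = transition_sequential(z, w_in, w_out)
```

Both versions compute the same result; only the peak number of simultaneously live intermediates differs.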
A number of changes intended to avoid short-lived memory allocations for intermediate variables.
More complicated one-liners can imply that several like-sized Tensors are stored until the computation completes, even though a more sequential approach could reduce that number. For multi-GB tensors in the trunk module, containing $\mathcal{O}(N_{residue}^2)$ elements, this can prove a crucial optimization.
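For a rough sense of scale (illustrative numbers, not taken from the PR), here is the back-of-the-envelope size of one such pairwise tensor. The residue count and channel width below are assumptions for illustration:

```python
n_res = 4_000     # residue count for a large complex (assumed)
channels = 128    # pair-embedding width (assumed)
bytes_per_el = 4  # float32

gb = n_res**2 * channels * bytes_per_el / 1e9
print(f"one pairwise tensor: {gb:.1f} GB")  # ~8.2 GB
```

Every extra simultaneously live copy adds that much again to peak memory, which is why avoiding even one duplicate matters on a 40GB card.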
Predicting the structure of 9b9j on a 40GB A100 GPU fails without these changes, but succeeds with them.
Typical changes:
* `z = z + a` to `z += a` (or `z[:] = z + a`) to write directly to already allocated memory

Clearly this approach has some drawbacks, reducing speed (probably) and the versatility of methods (which now modify the outer scope), but it might be an acceptable trade-off.
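The "modify the outer scope" drawback can be made concrete with a small sketch (hypothetical helper names, NumPy standing in for PyTorch tensors):

```python
import numpy as np

def add_pure(z, a):
    # Returns a fresh tensor; the caller's z is untouched.
    return z + a

def add_inplace(z, a):
    # Mutates the caller's tensor; saves an allocation but couples the
    # function to its caller's state. (PyTorch marks such methods with
    # a trailing underscore, e.g. `Tensor.add_`.)
    z += a
    return z

z = np.zeros(3)
out = add_inplace(z, np.ones(3))
assert out is z  # same storage: the outer-scope tensor was modified
```

A caller holding a reference to `z` (e.g. for a residual connection) would silently see the mutated values, which is the versatility cost mentioned above.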
NOTE: While I think changes like this will in general optimize memory movements, as e.g. argued here and as seen from the actual memory-consumption improvement of Boltz, it could be that not all proposed changes have an actual effect.