
carrascomj commented on Nov 6, 2025

Closes #24 (OOM error on NVIDIA A100)

Description

Running inference with --nsamples_per_protein greater than 1 blows up memory with the torch backend. This happens because inference_multiplicity was used to replicate every tensor (including the ESM embeddings) nsamples_per_protein times before moving them to the GPU, so multi-sample inference pushed huge duplicated feature tensors into VRAM and regularly OOM'd.

After this fix, taking 5 samples of a ~1000-amino-acid protein sequence stays at around 31 GiB of VRAM with the torch backend (instead of OOMing on an 81 GiB GPU).
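For reference, a rough sketch of the old pattern as I understand it (the function name, signature, and feature dict below are illustrative assumptions, not the project's actual code):

```python
import torch

def replicate_features(
    features: dict[str, torch.Tensor], multiplicity: int, device: str
) -> dict[str, torch.Tensor]:
    # Every feature tensor (including the large ESM embeddings) is materialised
    # `multiplicity` times on the batch dimension before being moved to the GPU,
    # so peak VRAM grows linearly with nsamples_per_protein.
    return {
        name: tensor.repeat_interleave(multiplicity, dim=0).to(device)
        for name, tensor in features.items()
    }
```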

Implementation

I fixed it by removing inference_multiplicity altogether and simply running inference in a loop over nsamples_per_protein.

This does slow down inference since the samples are no longer batched, so feel free to close the PR; it's mainly here for future reference.
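A minimal sketch of what the loop looks like, assuming a `model` callable and a `features` dict from the data pipeline (`sample_protein` and all parameter names here are hypothetical, not the repo's API):

```python
import torch

@torch.no_grad()
def sample_protein(model, features: dict[str, torch.Tensor],
                   nsamples_per_protein: int, device: str):
    # Move a single copy of the features to the GPU: peak VRAM no longer
    # scales with the number of samples, only runtime does.
    features = {name: tensor.to(device) for name, tensor in features.items()}
    # Run one forward pass per requested sample instead of batching replicas.
    return [model(features) for _ in range(nsamples_per_protein)]
```

The trade-off is one forward pass per sample instead of a single batched pass, which is the slowdown mentioned above.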

Remove inference_multiplicity altogether