Labels: question (Further information is requested)
Description
The smallest model (simplefold_100M) cannot generate even 5 samples at a time on an NVIDIA A100-SXM4-40GB.
simplefold --simplefold_model simplefold_100M \
--num_steps 500 --tau 0.01 --nsample_per_protein 5 \
--plddt --fasta_path test.fasta --output_dir testout \
--backend torch
[OOM ERROR]
File "ml-simplefold/src/simplefold/model/torch/layers.py", line 142, in forward
return self.w2(F.silu(self.w1(x)) * self.w3(x))
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 36.00 MiB. GPU 0 has a total capacity of 39.39 GiB of which 30.31 MiB is free. Including non-PyTorch memory, this process has 39.36 GiB memory in use. Of the allocated memory 36.93 GiB is allocated by PyTorch, and 1.94 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
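The error message itself suggests one possible mitigation for allocator fragmentation. A minimal sketch of trying it (untested here; whether it avoids the OOM depends on how fragmented the allocation actually is, and the command shown just repeats the invocation above):

```shell
# Assumption: enabling expandable segments, as suggested by the PyTorch
# error message, may reduce fragmentation in the CUDA caching allocator.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
echo "PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"

# Then rerun the same command, e.g.:
# simplefold --simplefold_model simplefold_100M \
#   --num_steps 500 --tau 0.01 --nsample_per_protein 5 \
#   --plddt --fasta_path test.fasta --output_dir testout \
#   --backend torch
```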
test.fasta contains only 1 protein, and the same command runs fine with nsample_per_protein=1.
Interestingly, the mlx backend can handle a higher nsample_per_protein on an M2 Pro machine (16 GB RAM). I can confirm that no other processes are running on the GPU.
Am I missing something obvious? Any suggestions would be appreciated.
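As a stopgap while debugging, one sketch of a workaround is to request the 5 samples one run at a time, keeping peak memory at the nsample_per_protein=1 level. This only reuses the CLI flags shown above; the per-run output subdirectories and the helper name are hypothetical, and the commands would still need to be executed (e.g. via subprocess.run):

```python
# Hypothetical sketch: build 5 separate simplefold invocations with
# nsample_per_protein=1, each writing to its own output subdirectory,
# instead of one invocation with nsample_per_protein=5.
def build_sample_commands(fasta_path, output_dir, n_samples=5):
    cmds = []
    for i in range(n_samples):
        cmds.append([
            "simplefold",
            "--simplefold_model", "simplefold_100M",
            "--num_steps", "500", "--tau", "0.01",
            "--nsample_per_protein", "1",   # one sample per run to cap memory
            "--plddt",
            "--fasta_path", fasta_path,
            "--output_dir", f"{output_dir}/sample_{i}",  # hypothetical layout
            "--backend", "torch",
        ])
    return cmds
```

This trades runtime (the model is reloaded per run) for a lower peak-memory footprint.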
Metadata
Assignees: mbautistamartin