Closed
Labels
bug: Something isn't working
Description
BioNeMo Framework Version
Bug Description
We used to be able to run on L4. Now peak memory usage is 32 GB for our test_evo2.py generation tasks and 50 GB for the 7b model, which we could previously run. The big change comes from following the new API that supports flash decode at inference time.
Changes to skip tests that now fail: #1000
NeMo diff that relates to the issue: NVIDIA-NeMo/NeMo@164d12b...b97e42b (see changes to all files that mention hyena in the path).
Steps to Reproduce
- test_evo2.py worked before "Hyena Inference Updates to support Flash Decode" (#1000) and the NeMo version bump to top of tree.
- Bump NeMo to main and you see the memory increase, leading to failures in test_evo2.py on L4.
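When comparing the two NeMo versions, it may help to record peak memory around the generation call. The sketch below uses PyTorch's `torch.cuda.reset_peak_memory_stats` and `torch.cuda.max_memory_allocated`. The wrapper name `run_with_peak_memory` is hypothetical, and the callable passed in stands in for whatever generation step the test runs. Note that this counter covers tensors allocated through PyTorch's caching allocator, so it can be lower than the process total shown by nvidia-smi.

```python
from typing import Any, Callable, Optional, Tuple

try:
    import torch
except ImportError:  # allow the sketch to load on machines without PyTorch
    torch = None


def run_with_peak_memory(fn: Callable[[], Any]) -> Tuple[Any, Optional[float]]:
    """Run fn and return (result, peak GiB allocated by PyTorch on the current device).

    The peak is None when PyTorch or CUDA is unavailable.
    """
    if torch is None or not torch.cuda.is_available():
        return fn(), None

    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()  # start a fresh peak measurement
    result = fn()
    torch.cuda.synchronize()  # make sure queued kernels have finished
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    return result, peak_gib


if __name__ == "__main__":
    # Placeholder workload; replace the lambda with the generation call under test.
    result, peak = run_with_peak_memory(lambda: sum(range(10)))
    print(result, peak)
```

Running the same wrapper before and after the version bump would give two directly comparable peak figures.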
Error Messages and Logs
Docker Image
No response
System Information
Environment Details:
- OS: [e.g., Ubuntu 20.04]
- CPU: [e.g., Intel i9-12900K]
- RAM: [e.g., 64GB]
GPU Details:
- GPU Model: [e.g., NVIDIA RTX 4090]
- GPU Memory: [e.g., 24GB]
- CUDA Version: [e.g., 12.1]
- CUDA Driver: [e.g., 525.85.05]
- cuDNN Version: [e.g., 8.9.0]
Additional Context
No response