
[BUG] NeMo upgrade supporting hyena flash decode requires more memory for generation #1013

@jstjohn

Description


BioNeMo Framework Version

103d76f

Bug Description

We used to be able to run generation on an L4 (24 GB). Peak memory usage is now 32 GB for our test_evo2.py generation tasks and 50 GB for the 7B model, both of which previously ran. The main change is adopting the new NeMo API that supports flash decode at inference time.

Changes to skip tests that now fail: #1000
Relevant NeMo diff: NVIDIA-NeMo/NeMo@164d12b...b97e42b (see the changes to all files with "hyena" in the path).

Steps to Reproduce

  1. test_evo2.py worked before "Hyena Inference Updates to support Flash Decode" (#1000) and the NeMo version bump to top of tree.
  2. Bump NeMo to main; the increased memory usage causes test_evo2.py to fail on L4.
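To quantify the regression when reproducing, one option is to record peak GPU memory around the generation call, once on the old NeMo pin and once after the bump. The following is a minimal sketch, assuming PyTorch is the backend; the helper names are hypothetical and not part of BioNeMo or NeMo.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def measure_peak_gpu_memory(fn: Callable[[], T]) -> tuple[T, int]:
    """Run fn() and return (result, peak bytes allocated by PyTorch tensors).

    Hypothetical helper. Returns a peak of 0 when PyTorch or CUDA is unavailable,
    so the same code can run on CPU-only machines.
    """
    try:
        import torch
    except ImportError:
        return fn(), 0
    if not torch.cuda.is_available():
        return fn(), 0
    # Make sure earlier work is finished before resetting the high-water mark.
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    result = fn()
    # Wait for the generation kernels to finish before reading the peak.
    torch.cuda.synchronize()
    return result, torch.cuda.max_memory_allocated()


def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / 2**30
```

Usage: `out, peak = measure_peak_gpu_memory(lambda: run_generation(...))`, where `run_generation` stands in (hypothetically) for whatever test_evo2.py calls, then compare `gib(peak)` across the two NeMo versions. `max_memory_allocated` counts only tensor allocations made through PyTorch's caching allocator. `torch.cuda.max_memory_reserved` is closer to what the allocator holds, and `nvidia-smi` will show more still because it includes the CUDA context.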

Error Messages and Logs

Docker Image

No response

System Information

Environment Details:

  • OS: [e.g., Ubuntu 20.04]
  • CPU: [e.g., Intel i9-12900K]
  • RAM: [e.g., 64GB]

GPU Details:

  • GPU Model: [e.g., NVIDIA RTX 4090]
  • GPU Memory: [e.g., 24GB]
  • CUDA Version: [e.g., 12.1]
  • CUDA Driver: [e.g., 525.85.05]
  • cuDNN Version: [e.g., 8.9.0]

Additional Context

No response

Metadata

  • Assignees: none
  • Labels: bug (Something isn't working)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
