Hyena Inference Updates to support Flash Decode #1000

Merged: jstjohn merged 21 commits into main from jstjohn/hyena-inference-update on Jul 31, 2025
jstjohn (Collaborator) commented Jul 23, 2025

Description

  • Adds support for Flash Decode at inference time in evo2. The corresponding changes were previously made upstream in NeMo (Hyena support for flash decode API, NVIDIA-NeMo/NeMo#14315).
  • Does not yet support cudagraph, which is another potential performance gain at inference time.
  • Adds a temporary workaround for the increased memory usage that occurs whenever an inference_context is passed to hyena, an issue introduced by the flash decode API change: [BUG] NeMo upgrade supporting hyena flash decode requires more memory for generation #1013. This unblocks NeMo bump PRs such as Bump NeMo #1012.
  • All of this is related to supporting a future NIM backed by bionemo. Unfortunately, it demonstrates how much code overhead this kind of support adds, and given the memory usage issues, the speed improvement may not be worth it.
  • Training performance and the performance of forward passes without an inference context are not impacted.

jstjohn added 7 commits July 23, 2025 00:01
…pdates

Signed-off-by: John St John <jstjohn@nvidia.com>
codecov-commenter commented Jul 23, 2025

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.34%. Comparing base (c03dc80) to head (103d76f).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ackages/bionemo-evo2/src/bionemo/evo2/run/infer.py | 90.90% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1000      +/-   ##
==========================================
+ Coverage   83.31%   83.34%   +0.02%     
==========================================
  Files         148      148              
  Lines        9758     9766       +8     
==========================================
+ Hits         8130     8139       +9     
+ Misses       1628     1627       -1     

| Files with missing lines | Coverage Δ |
|---|---|
| ...ackages/bionemo-evo2/src/bionemo/evo2/run/infer.py | 54.90% <90.90%> (+8.39%) ⬆️ |

... and 1 file with indirect coverage changes

jstjohn added 2 commits July 24, 2025 17:12
Signed-off-by: John St John <jstjohn@nvidia.com>
jstjohn and others added 6 commits July 24, 2025 17:16
Signed-off-by: John St John <jstjohn@nvidia.com>
jstjohn (Collaborator, Author) commented Jul 29, 2025

TODO: wait for NVIDIA-NeMo/NeMo#14315 to merge and be upstreamed to NeMo main, which also has the change from NVIDIA-NeMo/NeMo#14359.

Update: This is now done.

Signed-off-by: John St John <jstjohn@nvidia.com>
jstjohn self-assigned this Jul 30, 2025
Signed-off-by: John St John <jstjohn@nvidia.com>
jstjohn added the enhancement (New feature or request) label Jul 30, 2025
jstjohn enabled auto-merge July 30, 2025 16:32
jstjohn added 3 commits July 30, 2025 20:36
… to use flash decode

Signed-off-by: John St John <jstjohn@nvidia.com>

skothenhill-nv (Collaborator) left a comment

I didn't see anything glaring. What would be helpful is more detail in the PR description. I'm missing a lot of context: why did we make these changes? How should we think about this if someone encounters an unrelated bug from this PR in the future?

farhadrgh (Collaborator) left a comment

LGTM

jstjohn added this pull request to the merge queue Jul 31, 2025
Merged via the queue into main with commit 962d141 Jul 31, 2025
18 checks passed
jstjohn deleted the jstjohn/hyena-inference-update branch July 31, 2025 20:36
Labels

enhancement New feature or request

4 participants