multi-gpu inference. Adds 'batch index' to the resulting prediction by skothenhill-nv · Pull Request #854 · NVIDIA/bionemo-framework

skothenhill-nv · 2025-04-30T20:46:42Z

Description

Adds a 'batch index' to the resulting prediction dictionary. This allows users to reconstruct the original ordering of predictions after multi-GPU inference. The change also prevents users from using the 'epoch' mode of inference with multiple GPUs. This addresses a known issue: https://nvbugswb.nvidia.com/NvBugs5/SWBug.aspx?bugid=4717442&cmtNo=
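The PR text does not show how the 'epoch' restriction is enforced. The following is a minimal sketch of such a guard. It assumes the writer takes a Lightning-style `write_interval` (`"batch"`, `"epoch"` or `"batch_and_epoch"`, the values Lightning's `BasePredictionWriter` accepts) and knows the number of GPUs in use. The helper name `check_write_interval` and its signature are hypothetical, not the bionemo API. Only `"epoch"` is rejected, because that is the only mode the PR names.

```python
# Write intervals accepted by Lightning's BasePredictionWriter.
VALID_WRITE_INTERVALS = ("batch", "epoch", "batch_and_epoch")


def check_write_interval(write_interval: str, world_size: int) -> None:
    """Hypothetical guard: refuse 'epoch' writing when more than one GPU is used.

    The name and signature are illustrative only; they are not bionemo's API.
    """
    if write_interval not in VALID_WRITE_INTERVALS:
        raise ValueError(
            f"write_interval must be one of {VALID_WRITE_INTERVALS}, got {write_interval!r}"
        )
    if write_interval == "epoch" and world_size > 1:
        raise ValueError(
            "The 'epoch' write interval is not supported with multiple GPUs; "
            "use write_interval='batch' instead."
        )


check_write_interval("batch", world_size=8)  # accepted
check_write_interval("epoch", world_size=1)  # accepted: single GPU
```

Failing early at configuration time means a multi-GPU run with an unsupported interval stops before any predictions are computed, instead of producing output that cannot be reordered later.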

Type of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

Pre-submit Checklist

- [x] I have tested these changes locally
- [x] I have updated the documentation accordingly
- [x] All existing tests pass successfully


codecov-commenter · 2025-04-30T21:52:01Z

Codecov Report

Attention: Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 84.40%. Comparing base (29ce3bc) to head (924d426).

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ges/bionemo-llm/src/bionemo/llm/utils/callbacks.py | 83.33% | 1 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #854      +/-   ##
==========================================
- Coverage   84.40%   84.40%   -0.01%
==========================================
  Files         138      138
  Lines        8685     8690       +5
==========================================
+ Hits         7331     7335       +4
- Misses       1354     1355       +1
```

| Files with missing lines | Coverage Δ |
|---|---|
| ...ges/bionemo-llm/src/bionemo/llm/utils/callbacks.py | `94.44% <83.33%> (-2.33%)` ⬇️ |




Commit 1267264: multi-gpu inference. Adds 'batch index' to the resulting prediction dictionary. this allows users to reconstruct the original ordering of predictions with multi-gpu inference.
Signed-off-by: Steven <skothenhill@nvidia.com>
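Reordering depends on how the sampler split the dataset across ranks, which the PR does not describe. The following is a sketch that restores dataset order from per-rank outputs. It is based on these assumptions:

* Each saved batch carries its batch index under the key `"batch_idx"` and its per-sample outputs under `"predictions"`. Both key names are assumptions.
* The sampler is Megatron-style. Global batch `b` is cut into contiguous micro-batches, and data-parallel rank `r` receives slice `r`. Under this layout, dataset order equals sorting on `(batch_idx, rank)`.

The function `restore_global_order` is hypothetical.

```python
from typing import Any


def restore_global_order(per_rank_outputs: dict[int, list[dict[str, Any]]]) -> list[Any]:
    """Flatten per-rank prediction batches back into dataset order.

    Hypothetical helper. Assumes local batch ``b`` on data-parallel rank ``r``
    holds the r-th contiguous micro-batch of global batch ``b``.
    """
    tagged = []
    for rank, batches in per_rank_outputs.items():
        for batch in batches:
            # "batch_idx" is an assumed key name for the index added by this PR.
            tagged.append(((batch["batch_idx"], rank), batch["predictions"]))
    # Sort by (batch index, rank); keys are unique, so predictions are never compared.
    tagged.sort(key=lambda item: item[0])
    ordered: list[Any] = []
    for _, predictions in tagged:
        ordered.extend(predictions)
    return ordered


# Two ranks, micro-batch size 2, eight samples; outputs arrive out of order.
per_rank = {
    1: [{"batch_idx": 1, "predictions": [6, 7]}, {"batch_idx": 0, "predictions": [2, 3]}],
    0: [{"batch_idx": 1, "predictions": [4, 5]}, {"batch_idx": 0, "predictions": [0, 1]}],
}
print(restore_global_order(per_rank))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The sort key only matches dataset order for the assumed layout. A sampler that deals out individual samples round-robin needs a different key. PyTorch's `DistributedSampler`, which gives rank `r` every `world_size`-th sample starting at `r`, is one example.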

skothenhill-nv requested review from dorotat-nv, farhadrgh, jstjohn, malcolmgreaves and pstjohn as code owners April 30, 2025 20:46

polinabinder1 approved these changes Apr 30, 2025

farhadrgh approved these changes Apr 30, 2025

Commit 924d426: wraps the batch_idx value in a shape [1] torch tensor to be compatible with batch_collator
Signed-off-by: Steven <skothenhill@nvidia.com>
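The commit message does not explain why `batch_idx` needs shape [1]. One plausible reason is shown in the sketch below: a collator that joins per-batch tensors with `torch.cat` along dim 0 cannot concatenate 0-d tensors, while shape-[1] tensors concatenate cleanly.

`collate_tensors` is a hypothetical stand-in, not bionemo's `batch_collator`. The `"predictions"` key is also an assumed name.

```python
import torch


def collate_tensors(batches: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """Concatenate per-batch dicts of tensors along dim 0 (hypothetical collator)."""
    return {key: torch.cat([batch[key] for batch in batches], dim=0) for key in batches[0]}


# Per-batch outputs, with batch_idx stored as a shape-[1] tensor.
outputs = [
    {"predictions": torch.zeros(2, 3), "batch_idx": torch.tensor([0])},
    {"predictions": torch.ones(2, 3), "batch_idx": torch.tensor([1])},
]
merged = collate_tensors(outputs)
print(merged["predictions"].shape)  # torch.Size([4, 3])
print(merged["batch_idx"])  # tensor([0, 1])

# By contrast, 0-d tensors cannot be concatenated:
# torch.cat([torch.tensor(0), torch.tensor(1)]) raises a RuntimeError.
```

After collation, `batch_idx` has one entry per batch, while `predictions` has one row per sample.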

skothenhill-nv enabled auto-merge May 1, 2025 15:58

skothenhill-nv added this pull request to the merge queue May 1, 2025

Merged via the queue into main with commit 415f51f May 1, 2025
10 checks passed

skothenhill-nv deleted the hillst/inference-writer-guards-2 branch May 1, 2025 18:27

skothenhill-nv merged 2 commits into main from hillst/inference-writer-guards-2
