multi-gpu inference. Adds 'batch index' to the resulting prediction #854

Merged
skothenhill-nv merged 2 commits into main from hillst/inference-writer-guards-2
May 1, 2025

@skothenhill-nv
Collaborator

Description

Adds a 'batch index' entry to the resulting prediction dictionary, which lets users reconstruct the original ordering of predictions after multi-GPU inference. Also prevents users from using the 'epoch' mode of inference with multiple GPUs. This addresses a known issue: https://nvbugswb.nvidia.com/NvBugs5/SWBug.aspx?bugid=4717442&cmtNo=
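For readers who want to picture the two behaviors described above, here is a minimal sketch in plain Python. It is not the code from this PR. The function names `check_write_interval` and `restore_order`, the dictionary keys `"batch_idx"` and `"predictions"`, and the assumption that each batch index is unique across ranks are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List, Sequence


def check_write_interval(write_interval: str, num_devices: int) -> None:
    """Hypothetical guard: reject 'epoch' mode when more than one device is used."""
    if write_interval not in ("batch", "epoch"):
        raise ValueError(f"Unknown write interval: {write_interval!r}")
    if write_interval == "epoch" and num_devices > 1:
        raise ValueError(
            "'epoch' mode is not supported with multiple GPUs; use 'batch' mode instead."
        )


def restore_order(rank_outputs: Sequence[Sequence[Dict[str, Any]]]) -> List[Any]:
    """Merge per-rank batch outputs back into a single ordered list.

    Assumption: every batch dict carries a hypothetical "batch_idx" key that is
    unique across ranks and reflects the batch's position in the original data,
    plus a "predictions" list holding that batch's outputs.
    """
    # Flatten the per-rank lists into one list of batches.
    all_batches = [batch for rank in rank_outputs for batch in rank]
    # Sort by the stored batch index to recover the original batch order.
    all_batches.sort(key=lambda batch: batch["batch_idx"])
    # Concatenate the predictions of each batch in that order.
    return [pred for batch in all_batches for pred in batch["predictions"]]


# Example: two ranks that each processed alternating batches.
rank0 = [
    {"batch_idx": 0, "predictions": ["a", "b"]},
    {"batch_idx": 2, "predictions": ["e", "f"]},
]
rank1 = [
    {"batch_idx": 1, "predictions": ["c", "d"]},
    {"batch_idx": 3, "predictions": ["g"]},
]

check_write_interval("batch", num_devices=2)  # passes silently
print(restore_order([rank0, rank1]))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

If the stored index is only unique within a rank, the sort key would also need the rank; the sketch above does not cover that case.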

Type of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Refactor
  • [ ] Documentation update
  • [ ] Other (please describe):

Pre-submit Checklist

  • [x] I have tested these changes locally
  • [x] I have updated the documentation accordingly
  • [x] All existing tests pass successfully

dictionary. This allows users to reconstruct the original ordering of
predictions with multi-GPU inference.

Signed-off-by: Steven <skothenhill@nvidia.com>
@codecov-commenter

codecov-commenter commented Apr 30, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 84.40%. Comparing base (29ce3bc) to head (924d426).

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ges/bionemo-llm/src/bionemo/llm/utils/callbacks.py | 83.33% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #854      +/-   ##
==========================================
- Coverage   84.40%   84.40%   -0.01%     
==========================================
  Files         138      138              
  Lines        8685     8690       +5     
==========================================
+ Hits         7331     7335       +4     
- Misses       1354     1355       +1     

| Files with missing lines | Coverage Δ |
|---|---|
| ...ges/bionemo-llm/src/bionemo/llm/utils/callbacks.py | 94.44% <83.33%> (-2.33%) ⬇️ |

…e with batch_collator

Signed-off-by: Steven <skothenhill@nvidia.com>
@skothenhill-nv skothenhill-nv enabled auto-merge May 1, 2025 15:58
@skothenhill-nv skothenhill-nv added this pull request to the merge queue May 1, 2025
Merged via the queue into main with commit 415f51f May 1, 2025
10 checks passed
@skothenhill-nv skothenhill-nv deleted the hillst/inference-writer-guards-2 branch May 1, 2025 18:27
cspades pushed a commit that referenced this pull request May 4, 2025
…854)

### Description
Adds a 'batch index' entry to the resulting prediction dictionary, which
lets users reconstruct the original ordering of predictions after
multi-GPU inference. Also prevents users from using the 'epoch' mode of
inference with multiple GPUs. This addresses a known issue:
https://nvbugswb.nvidia.com/NvBugs5/SWBug.aspx?bugid=4717442&cmtNo=

### Type of changes
- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):



### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [x] I have tested these changes locally
 - [x] I have updated the documentation accordingly
 - [x] All existing tests pass successfully

---------

Signed-off-by: Steven <skothenhill@nvidia.com>
cspades pushed a commit that referenced this pull request May 4, 2025
Signed-off-by: Steven <skothenhill@nvidia.com>
Signed-off-by: Cory Ye <cye@nvidia.com>
trvachov pushed a commit that referenced this pull request May 16, 2025
Signed-off-by: Steven <skothenhill@nvidia.com>
camirr-nv pushed a commit that referenced this pull request Jun 26, 2025
Signed-off-by: Steven <skothenhill@nvidia.com>
Signed-off-by: Ubuntu <camirr@nvidia.com>