Commit c1c5ca0
KD example fix for new torch/hf causing FSDP save error (NVIDIA#645)
## What does this PR do?
**Type of change:** Bug fix
**Overview:** The `llm_distill` example was hanging during save because,
with newer torch/HF versions, the weights on other ranks now appear to be
deleted too early during `model.export()`. Fixed by synchronizing the
processes beforehand.
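
The diff body is not reproduced on this page, so the following is only a minimal sketch of the general pattern the overview describes: make every rank wait at a barrier before any rank starts exporting or saving. The helper names `save_after_sync` and `_default_barrier`, and the injectable `barrier` argument, are hypothetical and not part of ModelOpt. The default barrier assumes `torch.distributed` is the process-group backend in use.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def _default_barrier() -> None:
    """Block until all ranks arrive; no-op outside a distributed run."""
    try:
        import torch.distributed as dist
    except ImportError:
        return  # torch not installed: treat as a single-process run
    if dist.is_available() and dist.is_initialized():
        dist.barrier()


def save_after_sync(
    save_fn: Callable[[], T],
    barrier: Optional[Callable[[], None]] = None,
) -> T:
    """Synchronize all ranks, then run the save/export step.

    Hypothetical helper: `save_fn` stands in for whatever performs the
    export and save (for example a closure around `model.export()`).
    """
    (barrier or _default_barrier)()  # no rank proceeds until all ranks are here
    return save_fn()
```

In a training script this would be called on every rank, for example `save_after_sync(lambda: trainer.save_model(output_dir))`, where `trainer` and `output_dir` are placeholders for the script's own objects. Making the barrier injectable is only a convenience for testing the ordering without a process group.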
## Usage
<!-- You can potentially add a usage example below. -->
```python
# Add a code snippet demonstrating how to use this
```
## Testing
<!-- Mention how you have tested your change, if applicable. -->
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
a `Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes/No <!--- If No, explain
why. -->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update
[Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes/No <!--- Only for new features, API changes, critical bug fixes or
bw breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
Signed-off-by: Asha Anoosheh <[email protected]>

1 parent 9409412 commit c1c5ca0
2 files changed (+5, -4) under:
- examples/llm_distill
- modelopt/torch/distill/plugins

[Diff bodies not preserved. First file: one line added (new line 2). Second file: four lines removed (original lines 67 and 71-73) and four lines added (new lines 67-68 and 71-72).]