
Commit c1c5ca0

KD example fix for new torch/hf causing FSDP save error (NVIDIA#645)
## What does this PR do?

**Type of change:** Bug Fix

**Overview:** The `llm_distill` example was hanging during save because, with newer torch/HF versions, the weights on non-zero ranks are now deleted during `model.export()` too early. Fixed by synchronizing the processes beforehand.

## Usage

<!-- You can potentially add a usage example below. -->

```python
# Add a code snippet demonstrating how to use this
```

## Testing

<!-- Mention how have you tested your change if applicable. -->

## Before your PR is "*Ready for review*"

<!-- If you haven't finished some of the above items you can still open `Draft` PR. -->

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No <!--- If No, explain why. -->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No <!--- Only for new features, API changes, critical bug fixes or bw breaking changes. -->

## Additional Information

<!-- E.g. related issue. -->

Signed-off-by: Asha Anoosheh <[email protected]>
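
For reference, a minimal sketch of the save ordering this fix enforces, written with Hugging Face `accelerate`. The helper name `save_fsdp_checkpoint` and the `export_fn` hook are illustrative, not part of ModelOpt; the point is that every rank gathers the FSDP state dict and then hits a barrier before any rank runs a step that may free its local shards.

```python
# Illustrative sketch only; not the actual ModelOpt KDTrainer code.
from accelerate import Accelerator


def save_fsdp_checkpoint(accelerator: Accelerator, model, output_dir: str, export_fn=None):
    # Gathering the full state dict under FSDP is a collective operation,
    # so every rank must reach this call.
    state_dict = accelerator.get_state_dict(model)

    # Barrier: wait until all ranks have finished gathering before any rank
    # runs a step (e.g. ModelOpt's `model.export()`) that may delete or
    # rewrite its local FSDP shards.
    accelerator.wait_for_everyone()

    if export_fn is not None:
        model = export_fn(model)  # hypothetical destructive post-processing step

    # Only rank 0 writes the checkpoint to disk, using the gathered weights.
    if accelerator.is_main_process:
        accelerator.unwrap_model(model).save_pretrained(output_dir, state_dict=state_dict)
```
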
1 parent 9409412 · commit c1c5ca0

2 files changed (+5, −4 lines)
Lines changed: 1 addition & 0 deletions

```diff
@@ -1,3 +1,4 @@
 pyarrow
+torchao>=0.14.1
 transformers<5.0
 trl>=0.23.0
```
modelopt/torch/distill/plugins/huggingface.py

Lines changed: 4 additions & 4 deletions

```diff
@@ -64,13 +64,13 @@ def save_model(
         output_dir = self.args.output_dir
         model = self.accelerator.unwrap_model(self.model)
         if not _internal_call and self.is_fsdp_enabled:
-            state_dict = self.accelerator.get_state_dict(self.model)
+            with model.hide_teacher_model(enable=export_student):
+                state_dict = self.accelerator.get_state_dict(self.model)
             modelopt_state = mto.modelopt_state(model)
             if export_student:
+                # Need to wait, otherwise FSDP weights may be deleted before rank 0 can gather them
+                self.accelerator.wait_for_everyone()
                 model = model.export()
-                # remove teacher model from state dict since FSDP forces
-                # expose_minimal_state_dict to be False
-                state_dict = {k: v for k, v in state_dict.items() if "_teacher_model" not in k}

             if self.accelerator.is_main_process:
                 model.save_pretrained(
```
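
Two things change in this hunk. First, `hide_teacher_model(enable=export_student)` wraps the state-dict gathering, so the teacher weights are left out of the gathered state dict, which makes the previous post-hoc filter on `"_teacher_model"` keys unnecessary. Second, `self.accelerator.wait_for_everyone()` is inserted before `model.export()`: export may delete FSDP shards on non-zero ranks, and the barrier guarantees rank 0 has finished gathering the full state dict first.
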
