fix: Fix auto quantize multi-gpu tests (NVIDIA#629)

Fridah-nv · web-flow · commit 255eb1af2a10 · 2025-12-02T13:11:06.000-08:00
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Move the `importance` tensor to CPU before gathering it across the tensor-parallel and data-parallel groups, instead of moving the gathered result to CPU afterwards.

## Testing

```
pytest tests/gpu/torch/quantization/plugins/test_megatron.py::test_auto_quantize
```

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
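## Illustration

The ordering change can be illustrated with a small standalone sketch. It is not ModelOpt code and does not use PyTorch: `FakeTensor` and `simulated_gather` are hypothetical stand-ins, and the sketch assumes (this is not confirmed by the diff) that the sync helper gathers one Python object per rank and then applies the reduction op to the gathered list. Under that assumption, one plausible failure mode is that per-rank tensors living on different GPUs cannot be summed directly, while CPU copies can.

```python
import pickle
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a scalar tensor: a value plus a device tag."""

    value: float
    device: str = "cuda:0"

    def cpu(self) -> "FakeTensor":
        # Return a copy tagged as living on the host.
        return FakeTensor(self.value, "cpu")

    def item(self) -> float:
        return self.value

    def __add__(self, other: "FakeTensor") -> "FakeTensor":
        # Mimic the rule that operands must share a device.
        if self.device != other.device:
            raise RuntimeError(
                f"device mismatch: {self.device} vs {other.device}"
            )
        return FakeTensor(self.value + other.value, self.device)

    def __radd__(self, other):
        # Built-in sum() starts from the integer 0.
        if other == 0:
            return self
        return NotImplemented


def simulated_gather(per_rank_objs, op):
    """Assumed behavior: serialize each rank's object, collect all, reduce."""
    received = [pickle.loads(pickle.dumps(obj)) for obj in per_rank_objs]
    return op(received)


# One importance score per rank, each on that rank's own GPU.
per_rank = [FakeTensor(1.5, "cuda:0"), FakeTensor(2.5, "cuda:1")]

# Ordering after this change: move to CPU first, gather and sum, then read
# the Python scalar from the already-CPU result.
total_score = simulated_gather([t.cpu() for t in per_rank], sum).item()
print(total_score)
```

In this sketch, calling `simulated_gather(per_rank, sum)` without the `.cpu()` step raises the device-mismatch error, which is why the conversion has to happen before the gather rather than after it.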
diff --git a/modelopt/torch/quantization/algorithms.py b/modelopt/torch/quantization/algorithms.py
@@ -273,12 +273,13 @@ def get_score(self, recipe: QuantRecipe) -> float:
             if parallel_state.expert_model_parallel_group.is_initialized():
                 # TODO: Support expert model parallelism for score estimation
                 warnings.warn("AutoQuantize does not support expert model parallelism yet.")
+            importance = importance.cpu()
             importance = DistributedProcessGroup.get_dist_syncd_obj(
                 importance,
                 [parallel_state.tensor_parallel_group, parallel_state.data_parallel_group],
                 sum,
             )
-            total_score += importance.cpu().item()
+            total_score += importance.item()
         return total_score
 
     def get_cost(self, recipe: QuantRecipe) -> float: