[TRTLLM-8734][feat] AutoDeploy: Enable the nvfp4 for Nemotron MOE (#8737)

nvchenghaoz · suyoggupta · web-flow · commit 71c5576a44ce · 2025-10-30T12:33:08.000-07:00
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
diff --git a/tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py b/tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py
@@ -375,12 +375,10 @@ def load_hook(self, state_dict, prefix, *args, weight_name):
                     )
                     state_dict[input_scale_name] = 1 / state_dict[input_scale_name]
                     weight_scale = state_dict[weight_name + "_scale"].view(float4_sf_dtype)
-                    ori_shape = weight_scale.shape
                     state_dict[weight_name + "_scale"] = (
                         torch.ops.trtllm.block_scale_interleave(
                             weight_scale.view(torch.uint8).cpu().contiguous()
                         )
-                        .reshape(ori_shape)
                         .view(float4_sf_dtype)
                         .reshape(-1)
                     )
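
Note on the hunk above: the commit does not say why `.reshape(ori_shape)` was dropped. Since the result is flattened with `.reshape(-1)` anyway, the intermediate reshape is at best redundant. It would also raise an error if the interleave op returned more elements than it was given. The sketch below illustrates that second case under an assumption that is not confirmed by this diff: that the interleaved layout pads rows to a multiple of 128 and columns to a multiple of 4. The helper names and alignment values are hypothetical and are not part of TensorRT-LLM.

```python
import math


def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple."""
    return math.ceil(value / multiple) * multiple


def interleaved_numel(rows: int, cols: int, row_align: int = 128, col_align: int = 4) -> int:
    """Element count after a padded interleave (alignment values are assumptions)."""
    return round_up(rows, row_align) * round_up(cols, col_align)


def can_reshape_back(rows: int, cols: int) -> bool:
    """A reshape to (rows, cols) is only legal if the element count is unchanged."""
    return interleaved_numel(rows, cols) == rows * cols


# Already aligned: padding adds nothing, so reshaping back would succeed.
print(can_reshape_back(256, 8))
# Unaligned row count: padding adds elements, so reshaping back would fail.
print(can_reshape_back(100, 8))
```

Under these assumptions, flattening directly with `.reshape(-1)` works for both cases, which matches the shape of the code after this change.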
