Commit a21edb3

yewentao256 authored and googlercolin committed
[Bug] Fix B200 DeepGEMM E8M0 Accuracy Issue (vllm-project#22399)
Signed-off-by: yewentao256 <[email protected]>
1 parent db3e239 commit a21edb3

1 file changed: +2 -1 lines changed

vllm/model_executor/layers/quantization/utils/fp8_utils.py

Lines changed: 2 additions & 1 deletion
@@ -799,7 +799,8 @@ def requant_weight_ue8m0_inplace(
         s_exp = s_exp[:m_cur, :k_cur]
         w_dq = w_q.to(torch.float32) * s_exp
         # Re-quantise using power-of-two scaling (UE8M0).
-        w_requant, s_requant = per_block_cast_to_fp8(w_dq, [block_m, block_k])
+        w_requant, s_requant = per_block_cast_to_fp8(w_dq, [block_m, block_k],
+                                                     use_ue8m0=True)
 
         # Write back the results in-place.
         w_q.copy_(w_requant)
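
For context on the "power-of-two scaling (UE8M0)" comment in the hunk above, here is a minimal pure-Python sketch of what rounding a per-block scale up to a power of two can look like. It is an illustration under stated assumptions, not the vLLM or DeepGEMM implementation: the helper names `ceil_to_power_of_two` and `block_scale`, the `1e-4` clamp, and the use of 448 as the FP8 E4M3 maximum are all chosen here for the example, and the `use_ue8m0` flag only mimics the keyword seen in the diff.

```python
import math
from typing import Sequence

# Assumption: the target format is float8 E4M3 (finite-only variant), max 448.
FP8_E4M3_MAX = 448.0


def ceil_to_power_of_two(x: float) -> float:
    """Round a positive value up to the nearest power of two (hypothetical helper)."""
    if x <= 0.0:
        raise ValueError("scale must be positive")
    # frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1.
    mantissa, exponent = math.frexp(x)
    # An exact power of two has mantissa 0.5 and must stay unchanged.
    return math.ldexp(1.0, exponent - 1 if mantissa == 0.5 else exponent)


def block_scale(block: Sequence[float], use_ue8m0: bool) -> float:
    """Compute one scale for a block of values (hypothetical helper)."""
    amax = max(abs(v) for v in block)
    amax = max(amax, 1e-4)  # assumed clamp so an all-zero block gets a nonzero scale
    scale = amax / FP8_E4M3_MAX
    # With use_ue8m0 the scale carries no mantissa bits: it is a pure power of two.
    return ceil_to_power_of_two(scale) if use_ue8m0 else scale


if __name__ == "__main__":
    block = [1.0, -3.0, 2.0]
    print(block_scale(block, use_ue8m0=False))
    print(block_scale(block, use_ue8m0=True))
```

In this sketch the two modes generally return different scales for the same block, which illustrates why the re-quantisation call needs to request the power-of-two variant explicitly when the downstream kernel expects UE8M0 scales.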
