Commit 4e77e95

jiawenliu64 authored and facebook-github-bot committed
Make FP8 BMM output contiguous (pytorch#370)
Summary:
X-link: pytorch#3270
Pull Request resolved: facebookresearch/FBGEMM#370

Make the FP8 BMM output contiguous, as [silu_mul](https://fburl.com/code/sa1faq0w) requires the output tensor of the FP8 BMM to have stride(-1) equal to 1. This diff fixes the issue.

Reviewed By: jspark1105

Differential Revision: D64811808

fbshipit-source-id: e0f213f24fbf8bf989576371af1e2ada4cafbfb1
1 parent 0259700 commit 4e77e95

File tree

1 file changed (+1, -1 lines)

fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched.cu

Lines changed: 1 addition & 1 deletion
@@ -448,7 +448,7 @@ at::Tensor handle_transposition(
           BIAS_DTYPE>(
           WQ.transpose(1, 2), XQ.transpose(1, 2), w_scale, x_scale, bias, out);
   }
-  return out_.transpose(1, 2);
+  return out_.transpose(1, 2).contiguous();
 }
 }
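The fix above works because `transpose` returns a strided view rather than moving data, so the transposed output no longer has stride(-1) == 1, which the downstream silu_mul kernel requires. A minimal PyTorch sketch of the effect (the tensor shapes here are illustrative stand-ins, not the actual kernel shapes):

```python
import torch

# Stand-in for the batched BMM output; shapes are illustrative only.
out = torch.randn(4, 8, 16)

# transpose(1, 2) is a view: data is not moved, only strides change,
# so the last-dimension stride is no longer 1.
out_t = out.transpose(1, 2)
assert out_t.stride(-1) != 1          # stride(-1) is 16 here
assert not out_t.is_contiguous()

# .contiguous() copies into a freshly packed tensor, restoring
# stride(-1) == 1 -- this is exactly what the one-line fix adds.
out_fixed = out_t.contiguous()
assert out_fixed.stride(-1) == 1
assert out_fixed.is_contiguous()
```

Note the trade-off: `.contiguous()` materializes a copy of the output, but it guarantees the stride layout that consumers such as silu_mul expect.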

0 commit comments

Comments
 (0)