
Commit 382d04a

jananisriram authored and pytorchmergebot committed
[Inductor][ATen][FP8] Add note for supported blockwise scaling strategy pairs (pytorch#165450)
Summary: Add note mentioning which scaling type pairs are supported in Inductor ATen, since this was a source of confusion and also informs which scaling strategies we choose to support for other backends, like Triton.

Test Plan: n/a

Reviewed By: lw

Differential Revision: D84522373

Pull Request resolved: pytorch#165450
Approved by: https://github.com/NikhilAPatel
1 parent 1ec0755 commit 382d04a

File tree

1 file changed: +4 −0 lines changed


aten/src/ATen/native/cuda/Blas.cpp

Lines changed: 4 additions & 0 deletions
@@ -1273,6 +1273,10 @@ _scaled_mm_out_cuda(const Tensor& mat1, const Tensor& mat2,
   // by decreasing priority. We prefer "simpler" schemes as they are supported
   // more broadly (more GPU archs, more CUDA versions) and because they are more
   // efficient. This tends to matter only for small matmuls (e.g., 1x1x128).
+
+  // List of supported BlockWise pairs for FP8:
+  // https://docs.nvidia.com/cuda/cublas/#element-1d-and-128x128-2d-block-scaling-for-fp8-data-types
+
   auto [scaling_choice_a, scaling_choice_b] = get_joint_scaling(
     {
       std::make_pair(ScalingType::TensorWise, ScalingType::TensorWise),
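
For context on the code surrounding this diff: get_joint_scaling receives a priority-ordered list of (ScalingType, ScalingType) pairs and, per the comment above, prefers the simplest pair both operands can use. The following is a minimal, hypothetical C++ sketch of that selection pattern; the Operand type, the accepts predicate, and the enum values shown are assumptions for illustration, not the actual ATen implementation (which inspects scale-tensor shapes and dtypes).

#include <cstdio>
#include <optional>
#include <utility>
#include <vector>

// Illustrative scaling schemes; the real ATen enum differs.
enum class ScalingType { TensorWise, RowWise, BlockWise1x128, BlockWise128x128 };

// Stand-in for an operand plus its scale tensor: here it simply
// reports which scaling schemes it could satisfy.
struct Operand {
  std::vector<ScalingType> supported;
  bool accepts(ScalingType t) const {
    for (auto s : supported) {
      if (s == t) return true;
    }
    return false;
  }
};

// Walk the priority-ordered pair list and return the first pair that
// both operands accept, mirroring the "prefer simpler schemes" comment.
std::optional<std::pair<ScalingType, ScalingType>> get_joint_scaling_sketch(
    const std::vector<std::pair<ScalingType, ScalingType>>& priority,
    const Operand& a, const Operand& b) {
  for (const auto& p : priority) {
    if (a.accepts(p.first) && b.accepts(p.second)) return p;
  }
  return std::nullopt;  // no supported combination: the caller raises an error
}

int main() {
  // Simplest (most broadly supported) schemes first, as in Blas.cpp.
  std::vector<std::pair<ScalingType, ScalingType>> priority = {
      {ScalingType::TensorWise, ScalingType::TensorWise},
      {ScalingType::RowWise, ScalingType::RowWise},
      {ScalingType::BlockWise1x128, ScalingType::BlockWise128x128},
  };
  Operand a{{ScalingType::BlockWise1x128}};
  Operand b{{ScalingType::BlockWise128x128}};
  if (auto choice = get_joint_scaling_sketch(priority, a, b)) {
    std::printf("picked pair (%d, %d)\n",
                static_cast<int>(choice->first),
                static_cast<int>(choice->second));
  }
  return 0;
}

The key design point the diff's new comment documents: not every (ScalingType, ScalingType) combination is valid for blockwise FP8, so the pair list, rather than each operand's scaling type independently, is the unit of support, and the linked cuBLAS page is the reference for which pairs exist.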
