
Commit 1bb480f

Peiying Hua authored and meta-codesync[bot] committed
Allow specifying the use of persistent kernel (#5129)
Summary:
X-link: meta-pytorch/tritonbench#654
Pull Request resolved: #5129
X-link: https://github.com/facebookresearch/FBGEMM/pull/2130

Added the argument "use_persistent" (default is False) to explicitly turn off the non-persistent kernel and use the persistent kernel. An error is thrown when both "use_persistent" and "no_use_persistent" are specified in the arguments.

Example usage:

Persistent kernel:
buck2 run mode/{dev-nosan,amd-gpu} -c xlog.level=WARNING -m ovr_config//triton:trunk -m rocm7 -c fbcode.nvcc_arch=mi350 -c fbcode.enable_gpu_sections=true pytorch/tritonbench:run -- --op fp8_gemm_rowwise --no_use_tma --use_persistent

Non-persistent kernel:
buck2 run mode/{dev-nosan,amd-gpu} -c xlog.level=WARNING -m ovr_config//triton:trunk -m rocm7 -c fbcode.nvcc_arch=mi350 -c fbcode.enable_gpu_sections=true pytorch/tritonbench:run -- --op fp8_gemm_rowwise --no_use_tma --no_use_persistent

When both are specified:
buck2 run mode/{dev-nosan,amd-gpu} -c xlog.level=WARNING -m ovr_config//triton:trunk -m rocm7 -c fbcode.nvcc_arch=mi350 -c fbcode.enable_gpu_sections=true pytorch/tritonbench:run -- --op fp8_gemm_rowwise --no_use_tma --use_persistent --no_use_persistent

it throws an error: Cannot specify both '--use_persistent' and '--no_use_persistent' at the same time. These options are mutually exclusive. Please use only one.

Reviewed By: adamomainz, njriasan, jwfromm
Differential Revision: D86579911
fbshipit-source-id: cb79900a1b641ae86b3935b2ed1523a3f186ac4e
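The mutual-exclusion check lives on the tritonbench side rather than in the diff below. A minimal sketch of how such a guard could look, assuming argparse-style flag handling (only the flag names and the error text come from the commit message; the parser wiring here is illustrative, not the benchmark's actual code):

import argparse

# Illustrative parser (assumption): tritonbench's real operator argument
# handling is not part of this commit; only the flag names and the error
# message are taken from the summary above.
parser = argparse.ArgumentParser()
parser.add_argument("--use_persistent", action="store_true",
                    help="Explicitly use the persistent kernel.")
parser.add_argument("--no_use_persistent", action="store_true",
                    help="Explicitly use the non-persistent kernel.")
args, _ = parser.parse_known_args()

if args.use_persistent and args.no_use_persistent:
    raise ValueError(
        "Cannot specify both '--use_persistent' and '--no_use_persistent' "
        "at the same time. These options are mutually exclusive. "
        "Please use only one."
    )

# Map the CLI flags onto the matmul_fp8_row kwargs: pass True only when a
# flag was explicitly given, otherwise leave the platform default (None).
use_persistent = True if args.use_persistent else None
no_use_persistent = True if args.no_use_persistent else None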
1 parent f9c6156 · commit 1bb480f

File tree

1 file changed: +7 −1 lines


fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py

Lines changed: 7 additions & 1 deletion
@@ -1212,6 +1212,8 @@ def matmul_fp8_row(
     imprecise_acc: bool = False,
     tma_persistent: bool = True,
     no_use_persistent: Optional[bool] = None,
+    # add an option to explicitly require the use of the persistent kernel
+    use_persistent: Optional[bool] = None,
     use_warp_specialization: bool = False,
 ) -> torch.Tensor:
     """
@@ -1232,12 +1234,16 @@ def matmul_fp8_row(
     Returns:
         torch.Tensor: [M, N] Output tensor a @ b / (a_scale[:, None] * b_scale[None, :])
     """
-    if no_use_persistent is None:
+    if use_persistent:
+        no_use_persistent = False
+    elif no_use_persistent is None:
         # Default True for AMD and False for Nvidia.
         if torch.version.hip is not None:
             no_use_persistent = True
         else:
             no_use_persistent = False
+    # if use_persistent is explicitly requested, no_use_persistent is set to False
+
     # Get datatypes and constants to use.
     pt_fp8_dtype, _, _, _ = get_fp8_constants()
     # Handle 3D+ a shape
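For context, a minimal sketch of exercising the new kwarg directly. Assumptions for illustration only: quantize_fp8_row from the same module is used to produce row-wise fp8 inputs and per-row scales, and an fp8-capable GPU is available; this is not the benchmark's actual call site.

import torch
from fbgemm_gpu.experimental.gemm.triton_gemm.fp8_gemm import (
    matmul_fp8_row,
    quantize_fp8_row,
)

M, N, K = 1024, 2048, 512
a = torch.randn(M, K, device="cuda", dtype=torch.bfloat16)
b = torch.randn(N, K, device="cuda", dtype=torch.bfloat16)

# Row-wise quantization to fp8 with per-row scales (a_scale: [M], b_scale: [N]).
a_fp8, a_scale = quantize_fp8_row(a)
b_fp8, b_scale = quantize_fp8_row(b)

# use_persistent=True forces the persistent kernel regardless of the
# AMD/Nvidia default chosen in the diff above.
out = matmul_fp8_row(a_fp8, b_fp8, a_scale, b_scale, use_persistent=True)
print(out.shape)  # torch.Size([1024, 2048])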
