
Commit ba94c21

ptorrunv and ptorru authored
Enabling support for enhanced customization of Nvidia ptxas options (#6993)
# Enabling enhanced `ptxas` customization

This PR enables broader support for `ptxas` customization via the following functionality:

* Ability to pass specific `ptxas` options. The available options are documented [here](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#ptxas-options).
* Ability to pass these options for specific kernel calls.

Benefits:

* Enables parameters to be passed through to `ptxas`.
* Enables targeted customization of the compilation behavior of each individual kernel call.

Usage: pass a string with `ptxas` options as the function parameter `ptx_options` in any given kernel call.

Example: for tutorial `03-matrix-multiplication.py`, one can enable `opt-level 3` for `leaky_relu` and `opt-level 0` for `matmul_kernel` like so:

```python
...
if ACTIVATION == "leaky_relu":
    accumulator = leaky_relu(accumulator, ptx_options="--opt-level=3")
...
matmul_kernel[grid](
    a, b, c,  #
    M, N, K,  #
    a.stride(0), a.stride(1),  #
    b.stride(0), b.stride(1),  #
    c.stride(0), c.stride(1),  #
    ACTIVATION=activation,  #
    ptx_options="--opt-level=0")
```

Testing done: this was tested by modifying the following Python tutorials:

* `02-fused-softmax`
* `03-matrix-multiplication`

I also checked the behavior of cached compiles and can confirm the cache works as expected for different options on a given kernel.

---------

Co-authored-by: Pedro Torruella <[email protected]>
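For a self-contained illustration, here is a minimal sketch of passing `ptx_options` at a kernel launch, assuming a Triton build that includes this change. The kernel and tensor names are illustrative (not from the tutorials above); the flags shown are standard `ptxas` options from the documentation linked in the description.

```python
# Minimal sketch (assumes a Triton build with this change): a trivial
# elementwise kernel whose ptxas invocation is customized per launch.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024), )
# Multiple ptxas flags go in one space-separated string; they are split on
# spaces and appended to the ptxas command line.
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024,
                 ptx_options="--opt-level=3 --allow-expensive-optimizations=true")
```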
1 parent 5c9e545 commit ba94c21

File tree

1 file changed: +12 −2 lines changed


third_party/nvidia/backend/compiler.py

Lines changed: 12 additions & 2 deletions

```diff
@@ -106,6 +106,7 @@ class CUDAOptions:
     maxnreg: Optional[int] = None
     cluster_dims: tuple = (1, 1, 1)
     ptx_version: int = None
+    ptx_options: str = None
     ir_override: Optional[str] = None  # filename of a user-defined IR (*.{ttir|ttgir|llir|ptx})
     enable_fp_fusion: bool = True
     launch_cooperative_grid: bool = False
@@ -407,8 +408,17 @@ def make_cubin(self, src, metadata, opt, capability):
         line_info = ["-lineinfo", "-suppress-debug-info"] if knobs.compilation.disable_line_info else ["-lineinfo"]
         fmad = [] if opt.enable_fp_fusion else ['--fmad=false']
         arch = sm_arch_from_capability(capability)
-        opt_level = ['--opt-level', '0'] if knobs.nvidia.disable_ptxas_opt else []
-        ptxas_cmd = [ptxas, *line_info, *fmad, '-v', *opt_level, f'--gpu-name={arch}', fsrc.name, '-o', fbin]
+
+        # Disable ptxas optimizations if requested
+        disable_opt = ['--opt-level', '0'] if knobs.nvidia.disable_ptxas_opt else []
+
+        # Accept more ptxas options if provided
+        ptx_extra_options = opt.ptx_options.split(" ") if opt.ptx_options else []
+
+        ptxas_cmd = [
+            ptxas, *line_info, *fmad, '-v', *disable_opt, *ptx_extra_options, f'--gpu-name={arch}', fsrc.name, '-o',
+            fbin
+        ]
         try:
             subprocess.run(ptxas_cmd, check=True, close_fds=False, stderr=flog)
             if os.path.exists(fsrc.name):
```
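For reference, here is a standalone sketch of how the extra options end up on the command line. `build_ptxas_cmd` is an illustrative helper, not a real Triton function; only the split-on-spaces behavior is taken from the diff above, and the remaining flags are simplified.

```python
from typing import Optional


def build_ptxas_cmd(ptxas: str, arch: str, src: str, out: str,
                    ptx_options: Optional[str] = None) -> list:
    """Illustrative reimplementation of the command assembly in make_cubin."""
    # A single space-separated option string becomes individual argv entries.
    extra = ptx_options.split(" ") if ptx_options else []
    return [ptxas, '-lineinfo', '-v', *extra, f'--gpu-name={arch}', src, '-o', out]


print(build_ptxas_cmd('ptxas', 'sm_90', 'kernel.ptx', 'kernel.cubin',
                      ptx_options="--opt-level=3"))
# ['ptxas', '-lineinfo', '-v', '--opt-level=3', '--gpu-name=sm_90',
#  'kernel.ptx', '-o', 'kernel.cubin']
```

One consequence of the plain `split(" ")` is that repeated spaces yield empty argv entries and shell-style quoting is not honored, so the option string should be simple and single-spaced.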
