Enabling support for enhanced customization of Nvidia ptxas options (#6993)
# Enabling enhanced `ptxas` customization
This PR enables broader support for `ptxas` customization via the
following functionality:
* Ability to pass specific `ptxas` options. Available options are
documented
[here](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#ptxas-options)
* Ability to pass these options for specific kernel calls
Benefits:
* Enables arbitrary parameters to be passed through to `ptxas`.
* Enables targeted customization of compilation behavior for each
specific kernel call.
Usage:
Pass a string of `ptxas` options via the `ptx_options` parameter of any
given kernel call.
Example:
For tutorial `03-matrix-multiplication.py`, one can enable `--opt-level=3`
for `leaky_relu` and `--opt-level=0` for `matmul_kernel` like so:
```python
...
if ACTIVATION == "leaky_relu":
    accumulator = leaky_relu(accumulator, ptx_options="--opt-level=3")
...
matmul_kernel[grid](
    a, b, c,  #
    M, N, K,  #
    a.stride(0), a.stride(1),  #
    b.stride(0), b.stride(1),  #
    c.stride(0), c.stride(1),  #
    ACTIVATION=activation,  #
    ptx_options="--opt-level=0",
)
```
Testing done:
This was tested by modifying the following Python tutorials:
* `02-fused-softmax`
* `03-matrix-multiplication`
I also checked the behavior of cached compiles and can confirm that the
cache works as expected when different options are used for a given
kernel.
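
To illustrate the cache behavior described above, here is a minimal sketch (not taken from this PR's tests): the same kernel is launched twice with different option strings, and, assuming the `ptx_options` launch parameter introduced here, each option string should trigger its own compile and its own cache entry. The `add_kernel` below is a hypothetical example kernel, not part of the tutorials:
```python
# Minimal sketch: same kernel, two different ptxas option strings.
# `add_kernel` is a hypothetical example; `ptx_options` is the launch
# parameter introduced by this PR.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


x = torch.rand(1024, device="cuda")
y = torch.rand(1024, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)

# Two launches with different options: the second must not reuse the
# binary compiled for the first, so a separate compile (and a separate
# cache entry) is expected for each option string.
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=256, ptx_options="--opt-level=0")
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=256, ptx_options="--opt-level=3")
```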
---------
Co-authored-by: Pedro Torruella <[email protected]>