2 files changed: +8 −8

# FLA Environment Variables

- | Variable | Default | Options | Description |
- | --- | --- | --- | --- |
- | `FLA_NO_USE_TMA` | `0` | `0` or `1` | Set to `1` to disable Tensor Memory Accelerator (TMA) on Hopper or Blackwell GPUs. |
- | `FLA_CONV_BACKEND` | `cuda` | `triton` or `cuda` | Choose the convolution backend. `cuda` is the default and preferred for most cases. |
- | `FLA_USE_FAST_OPS` | `0` | `0` or `1` | Enable faster, but potentially less accurate, operations when set to `1`. |
- | `FLA_CACHE_RESULTS` | `1` | `0` or `1` | Whether to cache autotune timings to disk. Defaults to `1` (enabled). |
- | `FLA_TRIL_PRECISION` | `ieee` | `ieee`, `tf32`, `tf32x3` | Controls the precision for triangular operations. `tf32x3` is only available on NV GPUs. |
+ | Variable | Default | Options | Description |
+ | --- | --- | --- | --- |
+ | `FLA_CONV_BACKEND` | `cuda` | `triton` or `cuda` | Choose the convolution backend. `cuda` is the default and preferred for most cases. |
+ | `FLA_USE_TMA` | `0` | `0` or `1` | Set to `1` to enable Tensor Memory Accelerator (TMA) on Hopper or Blackwell GPUs. |
+ | `FLA_USE_FAST_OPS` | `0` | `0` or `1` | Enable faster, but potentially less accurate, operations when set to `1`. |
+ | `FLA_CACHE_RESULTS` | `1` | `0` or `1` | Whether to cache autotune timings to disk. Defaults to `1` (enabled). |
+ | `FLA_TRIL_PRECISION` | `ieee` | `ieee`, `tf32`, `tf32x3` | Controls the precision for triangular operations. `tf32x3` is only available on NV GPUs. |
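The boolean-style flags in the new table can be read with `os.environ.get` and their documented defaults. A minimal sketch (the helper name `env_flag` is hypothetical, not part of FLA):

```python
import os

# Hypothetical helper (not an FLA function): read a 0/1-style flag with its
# documented default, returning True only when the variable is set to '1'.
def env_flag(name: str, default: str = '0') -> bool:
    return os.environ.get(name, default) == '1'

use_fast_ops = env_flag('FLA_USE_FAST_OPS')               # documented default 0
cache_results = env_flag('FLA_CACHE_RESULTS', '1')        # documented default 1
conv_backend = os.environ.get('FLA_CONV_BACKEND', 'cuda')  # string-valued option
```

Note that only the exact string `'1'` enables a flag here; any other value falls through to disabled, which matches the `0` or `1` options the table documents.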
@@ -399,7 +399,7 @@ def map_triton_backend_to_torch_device() -> str:
  is_tf32_supported = (is_nvidia and torch.cuda.get_device_capability(0)[0] >= 8)
  is_gather_supported = hasattr(triton.language, 'gather')
  is_tma_supported = (is_nvidia and torch.cuda.get_device_capability(0)[0] >= 9) \
-     and os.environ.get('FLA_NO_USE_TMA', '0') != '1' and \
+     and os.environ.get('FLA_USE_TMA', '0') == '1' and \
      (hasattr(triton.language, '_experimental_make_tensor_descriptor') or hasattr(triton.language, 'make_tensor_descriptor'))

  if is_nvidia and not is_tf32_supported:
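The gate in the hunk above combines three conditions: an NVIDIA GPU with compute capability major version ≥ 9 (Hopper 9.x, Blackwell 10.x), an explicit opt-in via the environment variable, and Triton tensor-descriptor support. A standalone sketch of the capability logic (the function name `supports_tma` is mine, and it assumes `FLA_USE_TMA=1` enables TMA as the table documents):

```python
# Hypothetical standalone version of the gate (not an FLA function): TMA needs
# an NVIDIA GPU whose compute capability major version is >= 9, plus an
# explicit opt-in through the FLA_USE_TMA environment variable.
def supports_tma(is_nvidia: bool, cc_major: int, fla_use_tma: str = '0') -> bool:
    return is_nvidia and cc_major >= 9 and fla_use_tma == '1'
```

For example, an Ampere GPU (compute capability 8.x) never qualifies, while a Hopper GPU (9.x) qualifies only when the opt-in is set.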