
Commit 2eade97: Disable TMA by default (#607)
1 parent: 0b8be89

2 files changed: 8 additions, 8 deletions


ENVs.md

Lines changed: 7 additions & 7 deletions

@@ -1,9 +1,9 @@
 # FLA Environment Variables
 
-| Variable | Default | Options | Description |
-| --- | --- | --- | --- |
-| `FLA_NO_USE_TMA` | `0` | `0` or `1` | Set to `1` to disable Tensor Memory Accelerator (TMA) on Hopper or Blackwell GPUs. |
-| `FLA_CONV_BACKEND` | `cuda` | `triton` or `cuda` | Choose the convolution backend. `cuda` is the default and preferred for most cases. |
-| `FLA_USE_FAST_OPS` | `0` | `0` or `1` | Enable faster, but potentially less accurate, operations when set to `1`. |
-| `FLA_CACHE_RESULTS` | `1` | `0` or `1` | Whether to cache autotune timings to disk. Defaults to `1` (enabled). |
-| `FLA_TRIL_PRECISION` | `ieee` | `ieee`, `tf32`, `tf32x3` | Controls the precision for triangular operations. `tf32x3` is only available on NV GPUs. |
+| Variable             | Default | Options                  | Description                                                                                |
+| -------------------- | ------- | ------------------------ | ------------------------------------------------------------------------------------------ |
+| `FLA_CONV_BACKEND`   | `cuda`  | `triton` or `cuda`       | Choose the convolution backend. `cuda` is the default and preferred for most cases.         |
+| `FLA_USE_TMA`        | `0`     | `0` or `1`               | Set to `1` to enable Tensor Memory Accelerator (TMA) on Hopper or Blackwell GPUs.           |
+| `FLA_USE_FAST_OPS`   | `0`     | `0` or `1`               | Enable faster, but potentially less accurate, operations when set to `1`.                   |
+| `FLA_CACHE_RESULTS`  | `1`     | `0` or `1`               | Whether to cache autotune timings to disk. Defaults to `1` (enabled).                       |
+| `FLA_TRIL_PRECISION` | `ieee`  | `ieee`, `tf32`, `tf32x3` | Controls the precision for triangular operations. `tf32x3` is only available on NV GPUs.    |
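The boolean flags in the table above all follow the same convention: the variable holds the string `'0'` or `'1'`, and only `'1'` turns the feature on. A minimal sketch of reading them (`env_flag` is a hypothetical helper name, not part of FLA's API):

```python
import os

def env_flag(name: str, default: str = '0') -> bool:
    """Read a 0/1-style environment variable as a boolean."""
    return os.environ.get(name, default) == '1'

# Mirrors the documented defaults: TMA is opt-in, result caching is opt-out.
use_tma = env_flag('FLA_USE_TMA')                    # default '0': disabled
cache_results = env_flag('FLA_CACHE_RESULTS', '1')   # default '1': enabled
conv_backend = os.environ.get('FLA_CONV_BACKEND', 'cuda')
```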

fla/utils.py

Lines changed: 1 addition & 1 deletion

@@ -399,7 +399,7 @@ def map_triton_backend_to_torch_device() -> str:
 is_tf32_supported = (is_nvidia and torch.cuda.get_device_capability(0)[0] >= 8)
 is_gather_supported = hasattr(triton.language, 'gather')
 is_tma_supported = (is_nvidia and torch.cuda.get_device_capability(0)[0] >= 9) \
-    and os.environ.get('FLA_NO_USE_TMA', '0') != '1' and \
+    and os.environ.get('FLA_USE_TMA', '0') == '1' and \
     (hasattr(triton.language, '_experimental_make_tensor_descriptor') or hasattr(triton.language, 'make_tensor_descriptor'))
 
 if is_nvidia and not is_tf32_supported:
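The gate above combines three conditions: an NVIDIA GPU of compute capability 9 or higher (Hopper/Blackwell), an explicit `FLA_USE_TMA=1` opt-in, and Triton tensor-descriptor support. A self-contained sketch of the same logic with the torch/triton probes replaced by plain parameters (`tma_supported` is a hypothetical name for illustration):

```python
import os

def tma_supported(is_nvidia: bool, compute_capability: int,
                  has_tensor_descriptor: bool) -> bool:
    """Sketch of the post-commit gating in fla/utils.py: TMA is used
    only on NVIDIA compute capability >= 9, when FLA_USE_TMA=1 is set,
    and when Triton exposes a tensor-descriptor API."""
    return (is_nvidia
            and compute_capability >= 9
            and os.environ.get('FLA_USE_TMA', '0') == '1'
            and has_tensor_descriptor)
```

With the environment variable unset, the function is false even on a Hopper-class GPU, which is what "disable TMA by default" means here.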
