docs: update low-precision training docs to reflect MS-AMP deprecation (#3929)
- Add prominent deprecation warnings for MS-AMP in both usage and concept guides
- Note specific compatibility issues (CUDA 12.x+, modern NCCL, PyTorch 2.2+)
- Recommend TransformersEngine and torchao as actively maintained alternatives
- Update code examples to prefer TE/torchao over MS-AMP
- Reorder Further Reading links to de-emphasize MS-AMP
Closes #3639
docs/source/concept_guides/low_precision_training.md (+8, −2)
@@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
# Low precision training methods
The release of new kinds of hardware led to the emergence of new training paradigms that better utilize them. Currently, this is in the form of training
-in 8-bit precision using packages such as [TransformersEngine](https://github.com/NVIDIA/TransformerEngine) (TE) or [MS-AMP](https://github.com/Azure/MS-AMP/tree/main).
+in 8-bit precision using packages such as [TransformersEngine](https://github.com/NVIDIA/TransformerEngine) (TE), [torchao](https://github.com/pytorch/ao) (native PyTorch FP8), or the legacy [MS-AMP](https://github.com/Azure/MS-AMP/tree/main) (no longer maintained, see warning below).
For an introduction to the topics discussed today, we recommend reviewing the [low-precision usage guide](../usage_guides/low_precision_training) as this documentation will reference it regularly.
@@ -63,7 +63,7 @@ If we notice in the chart mentioned earlier, TE simply casts the computation lay
<Tip warning={true}>
-MS-AMP is no longer actively maintained and has known compatibility issues with newer CUDA versions (12.x+) and PyTorch builds. We recommend using `TransformersEngine` or `torchao` instead for FP8 training.
+**⚠️ Deprecated / Unmaintained:** MS-AMP is no longer actively maintained by Microsoft. The repository has not seen updates since 2023 and has known compatibility issues with CUDA 12.x+, modern NCCL versions, and recent PyTorch releases (2.2+). **We strongly recommend using `TransformersEngine` or `torchao` instead.** See the [usage guide](../usage_guides/low_precision_training) for migration instructions.
</Tip>
@@ -77,4 +77,10 @@ MS-AMP takes a different approach to `TransformersEngine` by providing three dif
## Combining the two
+<Tip warning={true}>
+
+Since MS-AMP is no longer maintained, this combination is not recommended for new projects.
+
+</Tip>
+
More experiments need to be performed but it's been noted that combining both MS-AMP and TransformersEngine can lead to the highest throughput by relying on NVIDIA's optimized FP8 operators and utilizing how MS-AMP reduces the memory overhead.
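The diff above repeatedly refers to "8-bit precision" training. As background (not part of this PR): the FP8 formats these backends use, E4M3 and E5M2, have far narrower dynamic range than FP16/BF16, which is why every backend pairs them with scaling machinery. A stdlib-only sketch deriving both ranges from their bit layouts:

```python
# Illustrative sketch (editor-added, not from the PR): dynamic range of the
# two FP8 formats used in FP8 training, derived from their bit layouts.

# E4M3: 1 sign, 4 exponent bits (bias 7), 3 mantissa bits.
# Only the all-ones pattern S.1111.111 is reserved (NaN), so the largest
# finite value is S.1111.110 -> (1 + 6/8) * 2**(15 - 7).
e4m3_max = (1 + 6 / 8) * 2 ** (15 - 7)

# E5M2: 1 sign, 5 exponent bits (bias 15), 2 mantissa bits, IEEE-style
# (all-ones exponent reserved for inf/NaN), so the largest exponent field
# is 30 -> (1 + 3/4) * 2**(30 - 15).
e5m2_max = (1 + 3 / 4) * 2 ** (30 - 15)

print(e4m3_max)  # 448.0
print(e5m2_max)  # 57344.0
```

E4M3 trades range for precision and E5M2 the reverse, which is why FP8 recipes typically keep activations/weights in E4M3 and gradients in E5M2.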
docs/source/usage_guides/low_precision_training.md (+16, −9)
@@ -30,7 +30,7 @@ What this will result in is some reduction in the memory used (as we've cut the
## Configuring the Accelerator
-Currently three different backends for FP8 are supported (`TransformersEngine`, `torchao`, and `MS-AMP`), each with different capabilities and configurations.
+Currently two actively maintained backends for FP8 are supported (`TransformersEngine` and `torchao`), each with different capabilities and configurations. A legacy `MS-AMP` backend also exists but is no longer recommended (see [below](#configuring-ms-amp) for details).
To use either, the same core API is used. Just pass `mixed_precision="fp8"` to either the [`Accelerator`], during `accelerate config` when prompted about mixed precision, or as part of your `config.yaml` file in the `mixed_precision` key:
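To make the `config.yaml` route concrete, here is a minimal illustrative excerpt (editor-added; only the `mixed_precision` key is the subject of this guide — the other keys are typical `accelerate config` defaults, shown for context):

```yaml
# Illustrative config.yaml excerpt; `mixed_precision: fp8` is the relevant key.
compute_environment: LOCAL_MACHINE
distributed_type: "NO"
mixed_precision: fp8
```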
@@ -43,10 +43,9 @@ To specify a backend (and customize other parts of the FP8 mixed precision setup
```{python}
from accelerate import Accelerator
-from accelerate.utils import MSAMPRecipeKwargs
-kwargs = [MSAMPRecipeKwargs()]
-# Or to specify the backend as `TransformersEngine` even if MS-AMP is installed
-# kwargs = [TERecipeKwargs()]
+from accelerate.utils import TERecipeKwargs, AORecipeKwargs
-MS-AMP is no longer actively maintained and has known compatibility issues with newer CUDA versions (12.x+) and PyTorch builds. We recommend using `TransformersEngine` or `torchao` instead for FP8 training.
+**⚠️ Deprecated / Unmaintained:** MS-AMP is no longer actively maintained by Microsoft. The [MS-AMP repository](https://github.com/Azure/MS-AMP) has not received updates since 2023 and has known compatibility issues:
+
+- Requires CUDA 11.x (does not support CUDA 12.x+)
+- Requires older NCCL versions incompatible with recent PyTorch releases
+- Does not support recent PyTorch versions (2.2+)
+
+**We strongly recommend using [`TransformersEngine`](#configuring-transformersengine) or [`torchao`](#configuring-torchao) instead for all new and existing FP8 training workflows.** Both are actively maintained and support modern CUDA/PyTorch versions. Native PyTorch FP8 support via `torchao` is particularly promising as a vendor-neutral solution.
+
+The MS-AMP backend is retained in Accelerate for legacy compatibility but may be removed in a future release.
</Tip>
-Of the two, `MS-AMP` is traditionally the easier one to configure as there is only a single argument: the optimization level.
+`MS-AMP` has a single configuration argument: the optimization level.
Currently two levels of optimization are supported in the Accelerate integration, `"O1"` and `"O2"` (using the letter 'o', not zero).
@@ -205,7 +212,7 @@ Find out more [here](https://github.com/huggingface/accelerate/tree/main/benchma
To learn more about training in FP8 please check out the following resources:
-* [Our concept guide](../concept_guides/low_precision_training) detailing more about both TransformersEngine and MS-AMP
+* [Our concept guide](../concept_guides/low_precision_training) detailing more about TransformersEngine, torchao, and MS-AMP