
Commit 58c3605

Author: Manas Vardhan

docs: update low-precision training docs to reflect MS-AMP deprecation (#3929)

- Add prominent deprecation warnings for MS-AMP in both usage and concept guides
- Note specific compatibility issues (CUDA 12.x+, modern NCCL, PyTorch 2.2+)
- Recommend TransformersEngine and torchao as actively maintained alternatives
- Update code examples to prefer TE/torchao over MS-AMP
- Reorder Further Reading links to de-emphasize MS-AMP

Closes #3639

1 parent 2d2b440

2 files changed: 24 additions, 11 deletions


docs/source/concept_guides/low_precision_training.md (8 additions, 2 deletions)

```diff
@@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
 # Low precision training methods
 
 The release of new kinds of hardware led to the emergence of new training paradigms that better utilize them. Currently, this is in the form of training
-in 8-bit precision using packages such as [TransformersEngine](https://github.com/NVIDIA/TransformerEngine) (TE) or [MS-AMP](https://github.com/Azure/MS-AMP/tree/main).
+in 8-bit precision using packages such as [TransformersEngine](https://github.com/NVIDIA/TransformerEngine) (TE), [torchao](https://github.com/pytorch/ao) (native PyTorch FP8), or the legacy [MS-AMP](https://github.com/Azure/MS-AMP/tree/main) (no longer maintained, see warning below).
 
 For an introduction to the topics discussed today, we recommend reviewing the [low-precision usage guide](../usage_guides/low_precision_training) as this documentation will reference it regularly.
 
@@ -63,7 +63,7 @@ If we notice in the chart mentioned earlier, TE simply casts the computation lay
 
 <Tip warning={true}>
 
-MS-AMP is no longer actively maintained and has known compatibility issues with newer CUDA versions (12.x+) and PyTorch builds. We recommend using `TransformersEngine` or `torchao` instead for FP8 training.
+**⚠️ Deprecated / Unmaintained:** MS-AMP is no longer actively maintained by Microsoft. The repository has not seen updates since 2023 and has known compatibility issues with CUDA 12.x+, modern NCCL versions, and recent PyTorch releases (2.2+). **We strongly recommend using `TransformersEngine` or `torchao` instead.** See the [usage guide](../usage_guides/low_precision_training) for migration instructions.
 
 </Tip>
 
@@ -77,4 +77,10 @@ MS-AMP takes a different approach to `TransformersEngine` by providing three dif
 
 ## Combining the two
 
+<Tip warning={true}>
+
+Since MS-AMP is no longer maintained, this combination is not recommended for new projects.
+
+</Tip>
+
 More experiments need to be performed but it's been noted that combining both MS-AMP and TransformersEngine can lead to the highest throughput by relying on NVIDIA's optimized FP8 operators and utilizing how MS-AMP reduces the memory overhead.
```

docs/source/usage_guides/low_precision_training.md (16 additions, 9 deletions)

```diff
@@ -30,7 +30,7 @@ What this will result in is some reduction in the memory used (as we've cut the
 
 ## Configuring the Accelerator
 
-Currently three different backends for FP8 are supported (`TransformersEngine`, `torchao`, and `MS-AMP`), each with different capabilities and configurations.
+Currently two actively maintained backends for FP8 are supported (`TransformersEngine` and `torchao`), each with different capabilities and configurations. A legacy `MS-AMP` backend also exists but is no longer recommended (see [below](#configuring-ms-amp) for details).
 
 To use either, the same core API is used. Just pass `mixed_precision="fp8"` to either the [`Accelerator`], during `accelerate config` when prompted about mixed precision, or as part of your `config.yaml` file in the `mixed_precision` key:
 
```
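For reference, the `config.yaml` route mentioned in the changed text could look roughly like the sketch below. The top-level `mixed_precision` and `fp8_config` keys appear in the docs being diffed; the nested keys shown here are illustrative assumptions, not the authoritative schema, so prefer the file generated by `accelerate config`.

```yaml
# Hypothetical sketch of an accelerate config.yaml enabling FP8.
# `mixed_precision` and `fp8_config` are taken from the docs above;
# the nested keys are assumptions for illustration only.
mixed_precision: fp8
fp8_config:
  backend: TE   # assumed value; AO would select torchao
```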

````diff
@@ -43,10 +43,9 @@ To specify a backend (and customize other parts of the FP8 mixed precision setup
 
 ```{python}
 from accelerate import Accelerator
-from accelerate.utils import MSAMPRecipeKwargs
-kwargs = [MSAMPRecipeKwargs()]
-# Or to specify the backend as `TransformersEngine` even if MS-AMP is installed
-# kwargs = [TERecipeKwargs()]
+from accelerate.utils import TERecipeKwargs, AORecipeKwargs
+# Use TransformersEngine
+kwargs = [TERecipeKwargs()]
 # Or to use torchao
 # kwargs = [AORecipeKwargs()]
 accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=kwargs)
````
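To make the handler-selection pattern in the snippet above concrete without requiring `accelerate` to be installed, here is a standalone sketch in which `Accelerator`, `TERecipeKwargs`, and `AORecipeKwargs` are toy stand-ins, not the real classes; it only illustrates how the list passed to `kwargs_handlers` determines which FP8 backend is picked.

```python
# Standalone sketch: toy stand-ins for the accelerate classes, used only
# to show how a list of recipe-kwargs handlers selects the FP8 backend.
from dataclasses import dataclass


@dataclass
class TERecipeKwargs:
    """Stand-in for the TransformersEngine recipe handler."""
    fp8_format: str = "HYBRID"


@dataclass
class AORecipeKwargs:
    """Stand-in for the torchao recipe handler."""


class Accelerator:
    """Toy Accelerator that records which FP8 backend its handlers imply."""

    def __init__(self, mixed_precision="no", kwargs_handlers=None):
        self.mixed_precision = mixed_precision
        handlers = kwargs_handlers or []
        if any(isinstance(h, TERecipeKwargs) for h in handlers):
            self.fp8_backend = "TE"
        elif any(isinstance(h, AORecipeKwargs) for h in handlers):
            self.fp8_backend = "AO"
        else:
            self.fp8_backend = None


accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[TERecipeKwargs()])
print(accelerator.fp8_backend)  # TE
```

In the real API the handler object also carries the backend-specific recipe options (such as the FP8 format for TE), which is why a list of handler objects is used rather than a plain backend string.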
```diff
@@ -69,11 +68,19 @@ fp8_config:
 
 <Tip warning={true}>
 
-MS-AMP is no longer actively maintained and has known compatibility issues with newer CUDA versions (12.x+) and PyTorch builds. We recommend using `TransformersEngine` or `torchao` instead for FP8 training.
+**⚠️ Deprecated / Unmaintained:** MS-AMP is no longer actively maintained by Microsoft. The [MS-AMP repository](https://github.com/Azure/MS-AMP) has not received updates since 2023 and has known compatibility issues:
+
+- Requires CUDA 11.x (does not support CUDA 12.x+)
+- Requires older NCCL versions incompatible with recent PyTorch releases
+- Does not support recent PyTorch versions (2.2+)
+
+**We strongly recommend using [`TransformersEngine`](#configuring-transformersengine) or [`torchao`](#configuring-torchao) instead for all new and existing FP8 training workflows.** Both are actively maintained and support modern CUDA/PyTorch versions. Native PyTorch FP8 support via `torchao` is particularly promising as a vendor-neutral solution.
+
+The MS-AMP backend is retained in Accelerate for legacy compatibility but may be removed in a future release.
 
 </Tip>
 
-Of the two, `MS-AMP` is traditionally the easier one to configure as there is only a single argument: the optimization level.
+`MS-AMP` has a single configuration argument: the optimization level.
 
 Currently two levels of optimization are supported in the Accelerate integration, `"O1"` and `"O2"` (using the letter 'o', not zero).
 
```
```diff
@@ -205,7 +212,7 @@ Find out more [here](https://github.com/huggingface/accelerate/tree/main/benchma
 
 To learn more about training in FP8 please check out the following resources:
 
-* [Our concept guide](../concept_guides/low_precision_training) detailing into more about both TransformersEngine and MS-AMP
+* [Our concept guide](../concept_guides/low_precision_training) with more detail on TransformersEngine, torchao, and MS-AMP
 * [The `transformers-engine` documentation](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/common.html)
-* [The `MS-AMP` documentation](https://azure.github.io/MS-AMP/docs/)
 * [The `torchao` documentation](https://github.com/pytorch/ao/tree/main/torchao/float8)
+* [The `MS-AMP` documentation](https://azure.github.io/MS-AMP/docs/) (⚠️ no longer maintained)
```
