@@ -5,9 +5,68 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.

-SageMaker Distributed Model Parallel 1.8.0 Release Notes
+SageMaker Distributed Model Parallel 1.8.1 Release Notes
 ========================================================

+*Date: April 23, 2022*
+
+**New Features**
+
+* Added support for more configurations of the Hugging Face Transformers GPT-2 and GPT-J models
+  with tensor parallelism: ``scale_attn_weights``, ``scale_attn_by_inverse_layer_idx``, and
+  ``reorder_and_upcast_attn``. To learn more about these options, refer to
+  the following model configuration classes
+  in the *Hugging Face Transformers documentation*:
+
+  * `transformers.GPT2Config <https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2Config>`_
+  * `transformers.GPTJConfig <https://huggingface.co/docs/transformers/model_doc/gptj#transformers.GPTJConfig>`_
+
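As an illustrative sketch (not part of the release notes), the three attention options are plain fields on the Hugging Face configuration classes; the SMP tensor-parallelism setup around the model is assumed and elided here:

```python
# Sketch only: a GPT-2 configuration using the three attention options
# now supported under tensor parallelism. Assumes the ``transformers``
# package (>= 4.12) is installed; smdistributed setup is omitted.
from transformers import GPT2Config

config = GPT2Config(
    scale_attn_weights=True,               # scale attention weights by 1/sqrt(head_dim)
    scale_attn_by_inverse_layer_idx=True,  # additionally scale by 1/(layer_idx + 1)
    reorder_and_upcast_attn=True,          # upcast the attention computation to fp32
)
```

The same three fields exist on ``GPTJConfig``; see the linked configuration-class documentation for defaults.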
+* Added support for activation checkpointing of modules that pass keyword
+  arguments and arbitrary structures in their forward methods. This helps
+  support activation checkpointing with Hugging Face Transformers models even
+  when tensor parallelism is not enabled.
+
+**Bug Fixes**
+
+* Fixed a correctness issue with tensor parallelism for the GPT-J model,
+  which was caused by improper scaling during gradient reduction
+  for some layer normalization modules.
+* Fixed the creation of unnecessary additional processes, which take up
+  GPU memory on GPU 0 when the :class:`smp.allgather` collective is called.
+
+**Improvements**
+
+* Improved activation offloading so that activations are preloaded on a
+  per-layer basis, instead of preloading all activations for a micro-batch
+  at once. This not only improves memory efficiency and performance, but
+  also makes activation offloading useful outside of pipeline parallelism.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers:
+
+* HuggingFace 4.17.0 DLC with PyTorch 1.10.2
+
+.. code::
+
+   763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
+
+
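For context, the image URI above is pulled with the standard AWS Deep Learning Containers workflow; this is a generic sketch (it assumes the AWS CLI v2 and Docker are installed and your credentials grant ECR access), not a step documented in these notes:

```shell
# Generic DLC pull workflow (assumption: AWS CLI v2 + Docker configured).
# Authenticate Docker against the shared DLC registry, then pull the image.
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
```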
+* The binary file of this version of the library for custom container users
+
+.. code::
+
+   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-04-14-03-58/smdistributed_modelparallel-1.8.1-cp38-cp38-linux_x86_64.whl
+
+
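In a custom container, the wheel above can be installed directly from its URL; a minimal sketch (it assumes a Python 3.8 base image with PyTorch 1.10 and CUDA 11.3 already present, matching the ``cp38``/``pytorch-1.10.0`` markers in the URL):

```shell
# Sketch: install the SMP 1.8.1 binary into a custom training image.
# Assumption: py38 base image with PyTorch 1.10 + CUDA 11.3 installed.
pip install https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-04-14-03-58/smdistributed_modelparallel-1.8.1-cp38-cp38-linux_x86_64.whl
```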
+----
+
+Release History
+===============
+
+SageMaker Distributed Model Parallel 1.8.0 Release Notes
+--------------------------------------------------------
+
 *Date: March 23, 2022*

 **New Features**
@@ -32,18 +91,13 @@ This version passed benchmark testing and is migrated to the following AWS Deep
    763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04


-The binary file of this version of the library for custom container users:
+* The binary file of this version of the library for custom container users

 .. code::

    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-03-12-00-33/smdistributed_modelparallel-1.8.0-cp38-cp38-linux_x86_64.whl


-----
-
-Release History
-===============
-
 SageMaker Distributed Model Parallel 1.7.0 Release Notes
 --------------------------------------------------------
