CHANGELOG.md
15 additions & 8 deletions
@@ -7,11 +7,17 @@ This file is used to list changes made in each version of the AWS ParallelCluste
 ------
 
 **ENHANCEMENTS**
-- Remove UnkillableStepTimeout from slurm.conf and let slurm set this value.
-- Add `build-image` support for kernel 6.12 of Amazon Linux 2023. The official ParallelCluster Amazon Linux 2023 AMIs use kernel 6.12.
+- Add support for P6e-GB200 instances. ParallelCluster sets up Slurm topology plugin to handle P6e-GB200 UltraServers. See limitations section for important additional setup requirements.
+- Add `build-image` support for Amazon Linux 2023 AMIs based on kernel 6.12 (in addition to 6.1).
+
+**LIMITATIONS**
+- P6e-GB200 instances are only tested on Amazon Linux 2023, Ubuntu 22.04 and Ubuntu 24.04.
+- Using IMEX on P6e-GB200 requires additional setup. Please refer to <PLACE_HOLDER for the tutorial link>.
 
 **CHANGES**
-- Ubuntu 20.04 is no longer supported.
+- Install nvidia-imex for all OSs except AL2.
+- Remove `berkshelf`. All cookbooks are local and do not need `berkshelf` dependency management.
+- Remove `UnkillableStepTimeout` from slurm.conf and let slurm set this value.
 - Upgrade Slurm to version 24.11.6 (from 24.05.8).
 - Upgrade EFA installer to 1.43.2 (from 1.41.0).
 - Efa-driver: efa-2.17.2-1
@@ -20,21 +26,22 @@ This file is used to list changes made in each version of the AWS ParallelCluste
 - Libfabric-aws: libfabric-aws-2.1.0-5
 - Rdma-core: rdma-core-58.0-1
 - Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.6-11
-- Upgrade Cinc Client to version to 18.4.12 from 18.2.7.
+- Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
 - Upgrade NVIDIA driver to version 570.172.08 (from 570.86.15) for all OSs except AL2.
 - Upgrade CUDA Toolkit to version 12.8.1 (from 12.8.0) for all OSs except AL2.
 - Upgrade DCGM to version 4.2.3 (from 3.3.6) for all OSs except AL2.
 - Upgrade Python to 3.12.11 (from 3.12.8) for all OSs except AL2.
 - Upgrade Python to 3.9.23 (from 3.9.20) for AL2.
 - Upgrade Intel MPI Library to 2021.16.0 (from 2021.13.1).
-- Addressed cluster id mismatch known issue by deleting the file `/var/spool/slurm.state/clustername` before configuring Slurm accounting.
 - Upgrade DCV to version 2024.0-19030.
-- Remove `berkshelf`. All cookbooks are local and do not need `berkshelf` dependency management.
-- Add support for GB200 instance types.
-- Install nvidia-imex for all OSs except AL2.
+- Upgrade the official ParallelCluster Amazon Linux 2023 AMIs to kernel 6.12 (from 6.1).
 
 **BUG FIXES**
 - Fix a race condition in CloudWatch Agent startup that could cause nodes bootstrap failures.
+- Fix cluster id mismatch issue by deleting the file `/var/spool/slurm.state/clustername` before configuring Slurm accounting.