Commit fdc0dfd

Update CHANGELOG.md (#4203)
* Update CHANGELOG.md
* Update CHANGELOG.md

  Added missing important software updates and features.
* Update CHANGELOG.md
* Update CHANGELOG.md
1 parent 3d6a235 commit fdc0dfd

1 file changed: +36 -24 lines changed

CHANGELOG.md

Lines changed: 36 additions & 24 deletions
@@ -5,43 +5,55 @@ CHANGELOG
 ------

 **ENHANCEMENTS**
-- Add new configuration parameter `Scheduling/SlurmSettings/QueueUpdateStrategy` to allow cluster update when
-  `SlurmQueues` configuration changes don't impact Slurm scheduler configuration.
-- Add support for multiple Elastic File Systems.
-- Add support for multiple FSx File Systems.
-- Add support for attaching existing FSx for Ontap and FSx for OpenZFS File Systems.
-- Add support for FSx Lustre Persistent_2 deployment type.
-- Add support for memory-based scheduling in Slurm.
-  - Configure `RealMemory` on compute nodes by default as 95% of the EC2 memory.
-  - Add new configuration parameter `Scheduling/SlurmSettings/EnableMemoryBasedScheduling` to configure memory-based scheduling in Slurm.
+- Add support for memory-based job scheduling in Slurm
+  - Configure compute nodes real memory in the Slurm cluster configuration.
+  - Add new configuration parameter `Scheduling/SlurmSettings/EnableMemoryBasedScheduling` to enable memory-based scheduling in Slurm.
   - Add new configuration parameter `Scheduling/SlurmQueues/ComputeResources/SchedulableMemory` to override default value of the memory seen by the scheduler on compute nodes.
+- Improve flexibility on cluster configuration updates to avoid the stop and start of the entire cluster whenever possible.
+  - Add new configuration parameter `Scheduling/SlurmSettings/QueueUpdateStrategy` to set the preferred strategy to adopt for compute nodes needing a configuration update and replacement.
+- Improve failover mechanism over available compute resources when hitting insufficient capacity issues with EC2 instances. Disable compute nodes by a configurable amount of time (default 10 min) when a node launch fails due to insufficient capacity.
+- Add support to mount existing FSx for ONTAP and FSx for OpenZFS file systems.
+- Add support to mount multiple instances of existing EFS, FSx for Lustre / for ONTAP / for OpenZFS file systems.
+- Add support for FSx for Lustre Persistent_2 deployment type when creating a new file system.
 - Prompt user to enable EFA for supported instance types when using `pcluster configure` wizard.
-- Change default EBS volume types from gp2 to gp3 in both the root and additional volumes.
 - Add support for rebooting compute nodes via Slurm.
+- Improved handling of Slurm power states to also account for manual powering down of nodes.
+- Add NVIDIA GDRCopy 2.3 into the product AMIs to enable low-latency GPU memory copy.

 **CHANGES**
-- Remove support for Python 3.6.
-- Upgrade Slurm to version 21.08.8-2.
-- Do not require `PlacementGroup/Enabled` to be set to `true` when passing an existing `PlacementGroup/Id`.
+- Upgrade EFA installer to version 1.17.2
+  - EFA driver: ``efa-1.16.0-1``
+  - EFA configuration: ``efa-config-1.10-1``
+  - EFA profile: ``efa-profile-1.5-1``
+  - Libfabric: ``libfabric-aws-1.16.0~amzn2.0-1``
+  - RDMA core: ``rdma-core-41.0-2``
+  - Open MPI: ``openmpi40-aws-4.1.4-2``
+- Upgrade NICE DCV to version 2022.0-12760.
+- Upgrade NVIDIA driver to version 470.129.06.
+- Upgrade NVIDIA Fabric Manager to version 470.129.06.
+- Change default EBS volume types from gp2 to gp3 for both the root and additional volumes.
 - Changes to FSx for Lustre file systems created by ParallelCluster:
   - Change the default deployment type to `Scratch_2`.
   - Change the Lustre server version to `2.12`.
-- Add `lambda:ListTags` and `lambda:UntagResource` to `ParallelClusterUserRole` used by ParallelCluster API stack for cluster update.
-- Add `parallelcluster:cluster-name` tag to all resources created by ParallelCluster.
+- Do not require `PlacementGroup/Enabled` to be set to `true` when passing an existing `PlacementGroup/Id`.
+- Add `parallelcluster:cluster-name` tag to all the resources created by ParallelCluster.
 - Do not allow setting `PlacementGroup/Id` when `PlacementGroup/Enabled` is explicitly set to `false`.
-- Restrict IPv6 access to IMDS to root and cluster admin users only, when configuration parameter `HeadNode/Imds/Secured` is enabled.
-- Change the default root volume size from 35 GiB to the size of AMIs. The default can be overwritten in cluster configuration file.
+- Add `lambda:ListTags` and `lambda:UntagResource` to `ParallelClusterUserRole` used by ParallelCluster API stack for cluster update.
+- Restrict IPv6 access to IMDS to root and cluster admin users only, when configuration parameter `HeadNode/Imds/Secured` is true as by default.
+- With a custom AMI, use the AMI root volume size instead of the ParallelCluster default of 35 GiB. The value can be changed in cluster configuration file.
 - Automatic disabling of the compute fleet when the configuration parameter `Scheduling/SlurmQueues/ComputeResources/SpotPrice`
   is lower than the minimum required Spot request fulfillment price.
-- Show `requested_value` and `current_value` values in the change set when adding or removing a section.
-- Do not replace dynamic node in POWER_DOWN as jobs may be still running.
+- Show `requested_value` and `current_value` values in the change set when adding or removing a section during an update.
+- Disable `aws-ubuntu-eni-helper` service in DLAMI to avoid conflicts with `configure_nw_interface.sh` when configuring instances with multiple network cards.
+- Remove support for Python 3.6.

 **BUG FIXES**
-- Fix default for disable validate and test components when building custom AMI. The default was to disable those components, but it wasn't effective.
-- Handle corner case in the scaling logic when instance is just launched and the describe instances API doesn't report yet all the EC2 info.
-- Dropped validation that would prevent ARM instance type to be used when `DisableSimultaneousMultithreading` was set to true.
-- Fix resource pattern used for the ListImagePipelineImages Action in the EcrImageDeletionLambdaRole. This is causing a stack update failure when upgrading ParallelCluster API from one version to another.
-- Add missing permissions needed to import/export from S3 when using FSx for Lustre via ParallelCluster API.
+- Fix the default behavior to skip the ParallelCluster validation and test steps when building a custom AMI.
+- Fix file handle leak in `computemgtd`.
+- Fix race condition that was sporadically causing launched instances to be immediately terminated because not available yet in EC2 DescribeInstances response.
+- Fix support for `DisableSimultaneousMultithreading` parameter on instance types with Arm processors.
+- Fix ParallelCluster API stack update failure when upgrading from a previous version. Add resource pattern used for the `ListImagePipelineImages` action in the `EcrImageDeletionLambdaRole`.
+- Fix ParallelCluster API adding missing permissions needed to import/export from S3 when creating an FSx for Lustre storage.

 3.1.4
 ------
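
The memory-based scheduling entries in this diff describe a default for the memory the scheduler sees on a compute node, plus a per-compute-resource override (`SchedulableMemory`). The sketch below illustrates that relationship only. The function names are hypothetical, the 95% default is taken from the earlier wording of the entry that this commit removes, and the rounding behavior is an assumption; it is not ParallelCluster's actual implementation.

```python
from typing import Optional


def default_schedulable_memory_mib(ec2_memory_mib: int, percent: int = 95) -> int:
    """Memory (MiB) the scheduler would see when no override is given.

    Assumption: the default is a fixed percentage of the EC2 instance memory
    (95%, per the earlier changelog wording), rounded down to a whole MiB.
    """
    if ec2_memory_mib <= 0:
        raise ValueError("ec2_memory_mib must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    return ec2_memory_mib * percent // 100


def effective_schedulable_memory_mib(
    ec2_memory_mib: int, schedulable_memory_mib: Optional[int] = None
) -> int:
    """Return the override if one is set, otherwise the computed default.

    `schedulable_memory_mib` stands in for the `SchedulableMemory` parameter;
    the positivity check is an assumption of this sketch.
    """
    if schedulable_memory_mib is None:
        return default_schedulable_memory_mib(ec2_memory_mib)
    if schedulable_memory_mib <= 0:
        raise ValueError("schedulable_memory_mib must be positive")
    return schedulable_memory_mib


if __name__ == "__main__":
    # A hypothetical instance with 16 GiB of memory (16384 MiB).
    print(effective_schedulable_memory_mib(16384))         # default path
    print(effective_schedulable_memory_mib(16384, 12000))  # override path
```

Keeping the default and the override in separate functions makes it easy to swap in a different default rule if the real behavior differs from the assumption above.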
