AWS ParallelCluster v3.9.0
We're excited to announce the release of AWS ParallelCluster 3.9.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Allow updating external shared storage of type Efs, FsxLustre, FsxOntap, FsxOpenZfs and FileCache without replacing the compute and login fleets.
- Allow updating the MinCount, MaxCount, Queue and ComputeResource configuration parameters without the need to stop the compute fleet. It's now possible to update them by setting Scheduling/SlurmSettings/QueueUpdateStrategy to TERMINATE. ParallelCluster will terminate only the nodes removed during a resize of the cluster capacity performed through a cluster update.
- Add support for RHEL9.
- Add support for Rocky Linux 9 as a CustomAmi created through the build-image process. No public official ParallelCluster Rocky Linux 9 AMI is made available at this time.
- Remove CommunicationParameters from the Custom Slurm Settings deny list.
- Add the configuration parameter DeploymentSettings/DefaultUserHome to allow users to move the default user's home directory to /local/home instead of /home (default).
- Add the configuration parameter DeploymentSettings/DisableSudoAccessForDefaultUser to disable sudo access of the default user in supported OSes.
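
As a rough illustration of where the parameters named above sit in a cluster configuration file, here is a minimal sketch. It is a fragment rather than a complete configuration. The value Local for DefaultUserHome is an assumption not stated in these notes, so check the configuration reference for the accepted values before using it.

```yaml
# Illustrative fragment only: the rest of the cluster configuration
# (Region, Image, HeadNode, SlurmQueues, ...) is omitted.
DeploymentSettings:
  # Assumed value: selects /local/home instead of the default /home.
  DefaultUserHome: Local
  # Removes sudo access for the default user on supported OSes.
  DisableSudoAccessForDefaultUser: true
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    # Lets MinCount, MaxCount, Queue and ComputeResource be updated
    # without stopping the compute fleet; only nodes removed by the
    # resize are terminated.
    QueueUpdateStrategy: TERMINATE
```

With QueueUpdateStrategy set this way, a capacity change is applied through a normal cluster update rather than a stop, update, start cycle.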
CHANGES
- Upgrade Slurm to 23.11.4 (from 23.02.7).
- Upgrade Pmix to 4.2.9 (from 4.2.6).
- Add support for Python 3.11, 3.12 in pcluster CLI and aws-parallelcluster-batch-cli.
- Build network interfaces using network card index from NetworkCardIndex list of EC2 DescribeInstances response, instead of looping over MaximumNetworkCards range.
- Fail cluster creation when using instance types P3, G3, P2 and G2 because their GPU architecture is not compatible with Open Source Nvidia Drivers (OpenRM) introduced as part of 3.8.0 release.
- Upgrade the default FSx Lustre server version managed by ParallelCluster to 2.15.
- Upgrade NVIDIA driver to version 535.154.05.
- Upgrade EFA installer to 1.30.0.
  - Efa-driver: efa-2.6.0-1
  - Efa-config: efa-config-1.15-1
  - Efa-profile: efa-profile-1.6-1
  - Libfabric-aws: libfabric-aws-1.19.0
  - Rdma-core: rdma-core-46.0-1
  - Open MPI: openmpi40-aws-4.1.6-2 and openmpi50-aws-5.0.0-11
- Upgrade NICE DCV to version 2023.1-16388.
  - server: 2023.1.16388-1
  - xdcv: 2023.1.565-1
  - gl: 2023.1.1047-1
  - web_viewer: 2023.1.16388-1
- Upgrade ARM PL to version 23.10.
- Upgrade third-party cookbook dependencies:
  - nfs-5.1.2 (from nfs-5.0.0)
BUG FIXES
- Refactor IAM policies defined in CloudFormation template parallelcluster-policies.yaml to prevent ParallelCluster API deployment failure caused by policies exceeding IAM limits.
- Fix issue making jobs fail when submitted as an Active Directory user from login nodes. The issue was caused by an incomplete configuration of the integration with the external Active Directory on the head node.
- Fix issue making login nodes fail to bootstrap when the head node takes more time than expected in writing keys.