2 | 2 | CHANGELOG |
3 | 3 | ========= |
4 | 4 |
| 5 | +2.6.1 |
| 6 | +===== |
| 7 | + |
| 8 | +**ENHANCEMENTS** |
| 9 | + |
| 10 | +* Improve management of the S3 bucket that gets created when the ``awsbatch`` scheduler is selected. |
| 11 | +* Add validation for supported OSes when using FSx Lustre. |
| 12 | +* Change ProctrackType from proctrack/pgid to proctrack/cgroup in Slurm in order to better handle termination of |
| 13 | + stray processes when running MPI applications. This also includes the creation of a cgroup Slurm configuration |
| 14 | + in order to enable the cgroup plugin. |
| 15 | +* Skip execution, at node bootstrap time, of all those install recipes that are already applied at AMI creation time. |
| 16 | +* Start CloudWatch agent earlier in the node bootstrapping phase so that cookbook execution failures are correctly |
| 17 | + uploaded and are available for troubleshooting. |
| 18 | +* Improve the management of SQS messages and retries to speed up recovery times when failures occur. |
| 19 | + |
| 20 | +**CHANGES** |
| 21 | + |
| 22 | +* FSx Lustre: remove ``x-systemd.requires=lnet.service`` from mount options in order to rely on default lnet setup |
| 23 | + provided by Lustre. |
| 24 | +* Enforce Packer version to be >= 1.4.0 when building an AMI. This is also required for customers using the |
| 25 | + ``pcluster createami`` command. |
| 26 | +* Do not launch a replacement for an unhealthy or unresponsive node until it is terminated. This makes the cluster |
| 27 | + slower at provisioning new nodes when failures occur but prevents any temporary over-scaling with respect to the |
| 28 | + expected capacity. |
| 29 | +* Increase parallelism from 10 to 30 when starting ``slurmd`` on compute nodes that join the cluster. |
| 30 | +* Reduce the verbosity of messages logged by the node daemons. |
| 31 | +* Do not dump logs to ``/home/logs`` when nodewatcher encounters a failure and terminates the node. CloudWatch can be |
| 32 | + used to debug such failures. |
| 33 | +* Reduce the number of retries for failed REMOVE events in sqswatcher. |
| 34 | + |
| 35 | +**BUG FIXES** |
| 36 | + |
| 37 | +* Configure proxy during cloud-init boothook in order for the proxy to be configured for all bootstrap actions. |
| 38 | +* Fix installation of Intel Parallel Studio XE Runtime that requires yum4 since version 2019.5. |
| 39 | +* Fix compilation of Torque scheduler on Ubuntu 18.04. |
| 40 | +* Fix a bug in the ordering and retrying of SQS messages that, under certain circumstances of heavy load, left |
| 41 | + the scheduler configuration in an inconsistent state. |
| 42 | +* Delete from the queue the REMOVE events that are discarded due to a hostname collision with another event fetched |
| 43 | + as part of the same ``sqswatcher`` iteration. |
| 44 | + |
| 45 | + |
5 | 46 | 2.6.0 |
6 | 47 | ===== |
7 | 48 |
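The 2.6.1 entry that enforces Packer >= 1.4.0 can be illustrated with a small version check. This is only a sketch and not the check that ParallelCluster itself performs: the helper names `parse_version` and `packer_version_is_supported` are hypothetical, and the input is assumed to be a string containing an `X.Y.Z` version (for example, the text printed by `packer version`).

```python
import re

# Minimum Packer version stated in the 2.6.1 changelog entry.
MIN_PACKER_VERSION = (1, 4, 0)


def parse_version(text):
    """Return the first X.Y.Z version found in text as a tuple of ints.

    Hypothetical helper: assumes the version appears as three
    dot-separated integers somewhere in the string.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def packer_version_is_supported(version_output, minimum=MIN_PACKER_VERSION):
    """Return True if the reported version is at least the minimum.

    Tuples of ints compare element by element, so (1, 10, 0) is
    correctly treated as newer than (1, 4, 0).
    """
    return parse_version(version_output) >= minimum
```

Comparing integer tuples rather than raw strings avoids the lexicographic pitfall where `"1.10.0"` would sort before `"1.4.0"`.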