Commit 1c53ad5

Merge pull request #1128 from aws/develop
Merge Release 2.4.0
2 parents 47b8751 + 4bf3cfb commit 1c53ad5

79 files changed: 2212 additions, 929 deletions

.github/no-response.yml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Configuration for probot-no-response - https://github.com/probot/no-response
 
 # Number of days of inactivity before an Issue is closed for lack of response
-daysUntilClose: 14
+daysUntilClose: 7
 # Label requiring a response
 responseRequiredLabel: closing-soon-if-no-response
 # Comment to post when closing an Issue for lack of response. Set to `false` to disable

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -13,3 +13,4 @@ build/
 .coverage
 assets/
 report.html
+tests_outputs/

CHANGELOG.rst

Lines changed: 66 additions & 0 deletions
@@ -2,6 +2,72 @@
 CHANGELOG
 =========
 
+2.4.0
+=====
+
+**ENHANCEMENTS**
+
+* Add support for EFA on Centos 7, Amazon Linux and Ubuntu 1604
+* Add support for Ubuntu in China region ``cn-northwest-1``
+
+* SGE:
+
+  * process nodes added to or removed from the cluster in batches in order to speed up cluster scaling.
+  * scale up only if required slots/nodes can be satisfied
+  * scale down if pending jobs have unsatisfiable CPU/nodes requirements
+  * add support for jobs in hold/suspended state (this includes job dependencies)
+  * automatically terminate and replace faulty or unresponsive compute nodes
+  * add retries in case of failures when adding or removing nodes
+  * configure scheduler to handle rescheduling and cancellation of jobs running on failing or terminated nodes
+
+* Slurm:
+
+  * scale up only if required slots/nodes can be satisfied
+  * scale down if pending jobs have unsatisfiable CPU/nodes requirements
+  * automatically terminate and replace faulty or unresponsive compute nodes
+  * decrease SlurmdTimeout to 120 seconds to speed up replacement of faulty nodes
+
+* Automatically replace compute instances that fail initialization and dump logs to shared home directory.
+* Dynamically fetch compute instance type and cluster size in order to support updates in scaling daemons
+* Always use full master FQDN when mounting NFS on compute nodes. This solves some issues occurring with some networking
+  setups and custom DNS configurations
+* List the version and status during ``pcluster list``
+* Remove double quoting of the post_install args
+* ``awsbsub``: use override option to set the number of nodes rather than creating multiple JobDefinitions
+* Add support for AWS_PCLUSTER_CONFIG_FILE env variable to specify pcluster config file
+
+**CHANGES**
+
+* Update openmpi library to version 3.1.4 on Centos 7, Amazon Linux and Ubuntu 1604. This also changes the default
+  openmpi path to ``/opt/amazon/efa/bin/`` and the openmpi module name to ``openmpi/3.1.4``
+* Set soft and hard ulimit on open files to 10000 for all supported OSs
+* For a better security posture, we're removing AWS credentials from the ``parallelcluster`` config file
+  Credentials can be now setup following the canonical procedure used for the aws cli
+* When using FSx or EFS do not enforce in sanity check that the compute security group is open to 0.0.0.0/0
+* When updating an existing cluster, the same template version is now used, no matter the pcluster cli version
+* SQS messages that fail to be processed in ``sqswatcher`` are now re-queued only 3 times and not forever
+* Reset ``nodewatcher`` idletime to 0 when the host becomes essential for the cluster (because of min size of ASG or
+  because there are pending jobs in the scheduler queue)
+* SGE: a node is considered as busy when in one of the following states "u", "C", "s", "d", "D", "E", "P", "o".
+  This allows a quick replacement of the node without waiting for the ``nodewatcher`` to terminate it.
+* Do not update DynamoDB table on cluster updates in order to avoid hitting strict API limits (1 update per day).
+
+**BUG FIXES**
+
+* Fix issue that was preventing Torque from being used on Centos 7
+* Start node daemons at the end of instance initialization. The time spent for post-install script and node
+  initialization is not counted as part of node idletime anymore.
+* Fix issue which was causing an additional and invalid EBS mount point to be added in case of multiple EBS
+* Install Slurm libpmpi/libpmpi2 that is distributed in a separate package since Slurm 17
+* ``pcluster ssh`` command now works for clusters with ``use_public_ips = false``
+* Slurm: add "BeginTime", "NodeDown", "Priority" and "ReqNodeNotAvail" to the pending reasons that trigger
+  a cluster scaling
+* Add a timeout on remote commands execution so that the daemons are not stuck if the compute node is unresponsive
+* Fix an edge case that was causing the ``nodewatcher`` to hang forever in case the node had become essential to the
+  cluster during a call to ``self_terminate``.
+* Fix ``pcluster start/stop`` commands when used with an ``awsbatch`` cluster
+
+
 2.3.1
 =====
 
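The 2.4.0 entry about ``sqswatcher`` says that messages which fail to be processed are now re-queued only 3 times rather than forever. The following is a minimal sketch of that bounded re-queue idea, not the actual ``sqswatcher`` code: it uses an in-memory queue instead of SQS, and the names `drain`, `handler` and `MAX_REQUEUES` are hypothetical. It also assumes that "re-queued 3 times" means one initial attempt plus up to three retries.

```python
from collections import deque

# Limit taken from the changelog entry; the constant name is hypothetical.
MAX_REQUEUES = 3


def drain(messages, handler, max_requeues=MAX_REQUEUES):
    """Process every message, re-queueing failures a bounded number of times.

    Returns a (processed, dropped) pair of lists. A message is dropped once
    it has failed on its initial attempt and on each of its re-queues.
    """
    # Each queue entry carries the message and how often it was re-queued.
    queue = deque((msg, 0) for msg in messages)
    processed, dropped = [], []
    while queue:
        msg, requeues = queue.popleft()
        try:
            handler(msg)
        except Exception:
            if requeues < max_requeues:
                # Put the message back at the end of the queue for a retry.
                queue.append((msg, requeues + 1))
            else:
                # Give up instead of looping forever on a poison message.
                dropped.append(msg)
        else:
            processed.append(msg)
    return processed, dropped


# Usage: "bad" always fails, so it is attempted 4 times and then dropped.
attempts = {}


def flaky_handler(msg):
    attempts[msg] = attempts.get(msg, 0) + 1
    if msg == "bad":
        raise RuntimeError("cannot process message")


processed, dropped = drain(["ok", "bad"], flaky_handler)
```

Tracking the retry count next to the message keeps the sketch self-contained; a real queue service would more likely carry that count in message metadata.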

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -56,6 +56,6 @@ If you discover a potential security issue in this project we ask that you notif
 
 ## Licensing
 
-See the [LICENSE](https://github.com/aws/aws-parallelcluster/blob/develop/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
+See the [LICENSE](https://github.com/aws/aws-parallelcluster/blob/develop/LICENSE.txt) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
 
 We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.

README.rst

Lines changed: 4 additions & 21 deletions
@@ -41,8 +41,6 @@ Then, run pcluster configure:
 
 $ pcluster configure
 Cluster Template [default]:
-AWS Access Key ID []:
-AWS Secret Access Key ID []:
 Acceptable Values for AWS Region ID:
 ap-south-1
 ...
@@ -105,23 +103,8 @@ HPC forum which may be helpful:https://forums.aws.amazon.com/forum.jspa?forumID=
 Changes
 -------
 
-CfnCluster 1.6 IAM Change
-=========================
-Between CfnCluster 1.5.4 and 1.6.0 we made a change to the CfnClusterInstancePolicy that adds “s3:GetObject” permissions
-on objects in <REGION>-cfncluster bucket, "autoscaling:SetDesiredCapacity", "autoscaling:DescribeTags" permissions and
-"cloudformation:DescribeStacks" permissions on <REGION>:<ACCOUNT_ID>:stack/cfncluster-*.
+CfnCluster to AWS ParallelCluster
+=================================
+In Version `2.0.0`, we changed the name of CfnCluster to AWS ParallelCluster. With that name change we released several new features, which you can read about in the `Change Log`_.
 
-If you’re using a custom policy (e.g. you specify "ec2_iam_role" in your config) be sure it includes this new permission. See https://aws-parallelcluster.readthedocs.io/en/latest/iam.html
-
-CfnCluster 1.5 IAM Change
-=========================
-Between CfnCluster 1.4.2 and 1.5.0 we made a change to the CfnClusterInstancePolicy that adds “ec2:DescribeVolumes” permissions. If you’re using a custom policy (e.g. you specify "ec2_iam_role" in your config) be sure it includes this new permission. See https://aws-parallelcluster.readthedocs.io/en/latest/iam.html
-
-CfnCluster 1.2 and Earlier
-==========================
-
-For various security (on our side) and maintenance reasons, CfnCluster
-1.2 and earlier have been deprecated. AWS-side resources necessary to
-create a cluster with CfnCluster 1.2 or earlier are no longer
-available. Existing clusters will continue to operate, but new
-clusters can not be created.
+.. _`Change Log`: https://github.com/aws/aws-parallelcluster/blob/develop/CHANGELOG.rst#200

amis.txt

Lines changed: 97 additions & 95 deletions
@@ -1,100 +1,102 @@
 # alinux
-ap-northeast-1: ami-0af8c1a29f58c3b91
-ap-northeast-2: ami-036c289fda8701f9d
-ap-northeast-3: ami-000902aa3082732ce
-ap-south-1: ami-00ff6216daa4b0a69
-ap-southeast-1: ami-03b015a13daa9ff8d
-ap-southeast-2: ami-0c2528255cc7c4cec
-ca-central-1: ami-05bad5df22b9502e5
-cn-north-1: ami-0227bedfc6798cba1
-cn-northwest-1: ami-08143603c5f390f20
-eu-central-1: ami-003262ea853b26050
-eu-north-1: ami-06cac8aed0729f14c
-eu-west-1: ami-0691d6d6d4d209e09
-eu-west-2: ami-0d241a5c57ee3421d
-eu-west-3: ami-0e59dd1d2794a857c
-sa-east-1: ami-07b044055a13cf93e
-us-east-1: ami-0f8b01b1377483305
-us-east-2: ami-049afa5b53a7880d8
-us-gov-east-1: ami-02ee5c66a10526bd1
-us-gov-west-1: ami-7da7d01c
-us-west-1: ami-02c87842ea944292e
-us-west-2: ami-09b457d5cba24514a
+ap-northeast-1: ami-0dcc18768374b4441
+ap-northeast-2: ami-022e7c66ccb807c9f
+ap-northeast-3: ami-04402be7b85999df8
+ap-south-1: ami-0a14b1f0e7427a4bb
+ap-southeast-1: ami-02079735c20c1ac4e
+ap-southeast-2: ami-0c65952cdec26ae39
+ca-central-1: ami-01f28f8381746746f
+cn-north-1: ami-0da67c26ce2e8d111
+cn-northwest-1: ami-03dc8f759de9de690
+eu-central-1: ami-0ff6d2a86b9199e82
+eu-north-1: ami-0cb08caa10d113ed7
+eu-west-1: ami-0b5c32b12b9c340d0
+eu-west-2: ami-0c218c2aaa7185f03
+eu-west-3: ami-011e0eee21d52f23e
+sa-east-1: ami-0d154ae55458941fd
+us-east-1: ami-0d130bdfab2037f8a
+us-east-2: ami-00d2a10466c577ac7
+us-gov-east-1: ami-0f5003922daf22962
+us-gov-west-1: ami-ba83fbdb
+us-west-1: ami-0b6f7961ee845966e
+us-west-2: ami-0d611d90619419e93
 # centos6
-ap-northeast-1: ami-0476984f547d1f4f2
-ap-northeast-2: ami-06ecb1e81881cd450
-ap-northeast-3: ami-04d195b55ddf56228
-ap-south-1: ami-0b1abd2bf8810487c
-ap-southeast-1: ami-0576b4b2db8272abf
-ap-southeast-2: ami-09a18baab0a142123
-ca-central-1: ami-0aa03a3f1b737c651
-eu-central-1: ami-092bd9c46746d940b
-eu-north-1: ami-07b83433077d8345b
-eu-west-1: ami-09880c7e25df69af8
-eu-west-2: ami-0eba961d9f30431b2
-eu-west-3: ami-0d0b243ac76765544
-sa-east-1: ami-0dfdc6ab8bf7935ea
-us-east-1: ami-00f71e3be938f3077
-us-east-2: ami-0b29637d31cf774aa
-us-west-1: ami-08dc392067bcf9807
-us-west-2: ami-0fa309858f6ce66ee
+ap-northeast-1: ami-086781b933db101a5
+ap-northeast-2: ami-07d646c87d889d816
+ap-northeast-3: ami-082ece6e5fe8f6fd1
+ap-south-1: ami-02389426198baf430
+ap-southeast-1: ami-02105387481bd0ad0
+ap-southeast-2: ami-0050fad9761b3957c
+ca-central-1: ami-0e70755a47200df23
+eu-central-1: ami-03979ebb9cfee2ccc
+eu-north-1: ami-085a9ecbf9f64f65b
+eu-west-1: ami-070ba56e38a744df5
+eu-west-2: ami-08553013e6e986028
+eu-west-3: ami-0afff5bc147c847e0
+sa-east-1: ami-0635a9bdc378fe67f
+us-east-1: ami-091f37e900368fe1a
+us-east-2: ami-055404b3df678da86
+us-west-1: ami-0e438402399c457d7
+us-west-2: ami-0651b7e7cfde4b3a0
 # centos7
-ap-northeast-1: ami-0f13f45e966236e46
-ap-northeast-2: ami-016c726d8902d133c
-ap-northeast-3: ami-037c3a13cd142c8f8
-ap-south-1: ami-06b7212503b9d9637
-ap-southeast-1: ami-0c39937e9ae643ecd
-ap-southeast-2: ami-0164dbfb6b7b938f5
-ca-central-1: ami-0ee7cb4d2673e78de
-eu-central-1: ami-0bcced571d9cc0142
-eu-north-1: ami-00255a59ce6bd8147
-eu-west-1: ami-00c07933e0ea22f7d
-eu-west-2: ami-09aa34259643c50eb
-eu-west-3: ami-04ce6f74e1070a795
-sa-east-1: ami-0a625e9dcf563db57
-us-east-1: ami-0658a809b3e89b0c9
-us-east-2: ami-07cef254f8886ea4e
-us-west-1: ami-0454b933360a077e4
-us-west-2: ami-03b7e311ae2f4aacb
+ap-northeast-1: ami-09bae677f8f58842d
+ap-northeast-2: ami-0eeb6c96d0e6c2d90
+ap-northeast-3: ami-084c0dbc04f722758
+ap-south-1: ami-031f8f67a53de53fe
+ap-southeast-1: ami-041ca5c2f5b748966
+ap-southeast-2: ami-06c7f5584ecfcac3a
+ca-central-1: ami-0afc2ea67b3963398
+eu-central-1: ami-0205eaef48a9fc97a
+eu-north-1: ami-0420576e18a5fcb7c
+eu-west-1: ami-0f67868de5be7b0b3
+eu-west-2: ami-057fa1a5314e3c414
+eu-west-3: ami-05b2808c2dc4fb82c
+sa-east-1: ami-0da1262e3c5d9af72
+us-east-1: ami-031eb9c5390c0f8f6
+us-east-2: ami-0050bd80a1cecfe37
+us-west-1: ami-09bd008b253048b80
+us-west-2: ami-003da28849bc413f5
 # ubuntu1404
-ap-northeast-1: ami-0ce1c5516c087ef8d
-ap-northeast-2: ami-0744c53e9582abcd4
-ap-northeast-3: ami-0d0faa548bcca5fac
-ap-south-1: ami-00721e9f7f8235dba
-ap-southeast-1: ami-03df9d0a89a448c63
-ap-southeast-2: ami-06116e2159f6ba6bf
-ca-central-1: ami-0d180013cf3d07fc9
-cn-north-1: ami-0ef85bbc4ba66c301
-eu-central-1: ami-04b116ae9a44c861f
-eu-north-1: ami-0de1c666987bbdb1f
-eu-west-1: ami-01b114f6a268d6a42
-eu-west-2: ami-0f9ad3c001b80325a
-eu-west-3: ami-0f921986737ab8306
-sa-east-1: ami-0d1d30ad051235185
-us-east-1: ami-0422aa8ec2e452870
-us-east-2: ami-02447e477105886bd
-us-gov-east-1: ami-03538e53996b83762
-us-gov-west-1: ami-90a2d5f1
-us-west-1: ami-0f4a99f972b9b4882
-us-west-2: ami-04caeb57df33aba89
+ap-northeast-1: ami-0939e3e1030d4f7d2
+ap-northeast-2: ami-0481c6b023e2328b4
+ap-northeast-3: ami-0a535e1d0bb7bc502
+ap-south-1: ami-000e99acc047832ae
+ap-southeast-1: ami-09ca9a6a8fee71ba5
+ap-southeast-2: ami-09646cc49a932a37e
+ca-central-1: ami-06ac5db73837bc364
+cn-north-1: ami-07e16a5709c99f963
+cn-northwest-1: ami-05348579489ba3673
+eu-central-1: ami-0032889c720d364dc
+eu-north-1: ami-0976908358f0bfa01
+eu-west-1: ami-0f5c65a609ad3afb4
+eu-west-2: ami-08c2d96c2805037e7
+eu-west-3: ami-0f6cd6ac9be8f2b32
+sa-east-1: ami-0d0da341da4802af9
+us-east-1: ami-017bfe181606779d8
+us-east-2: ami-043eb896e1bb2b948
+us-gov-east-1: ami-060ced48ab370aadf
+us-gov-west-1: ami-32f98153
+us-west-1: ami-0d48f8a9d5735efde
+us-west-2: ami-0169da6ccb6347f50
 # ubuntu1604
-ap-northeast-1: ami-041f6050eff86f024
-ap-northeast-2: ami-0df4c1dafbfee5031
-ap-northeast-3: ami-08d3ef362e1d06e56
-ap-south-1: ami-0ef148f6ae69767d7
-ap-southeast-1: ami-0b63a13236ce5b8d9
-ap-southeast-2: ami-0f5a3072f23556b07
-ca-central-1: ami-0c88262f6fd2738fc
-cn-north-1: ami-017ea2a40c48f9af4
-eu-central-1: ami-06a21b6e0815065a4
-eu-north-1: ami-0418320f06192d788
-eu-west-1: ami-0809bc00666e41cfa
-eu-west-2: ami-04d8578267aaa2ac4
-eu-west-3: ami-02de781189ccb9f92
-sa-east-1: ami-088d6a838e8dc6b11
-us-east-1: ami-0a8c4ea1bd1ff7651
-us-east-2: ami-04d5c390495e0509f
-us-gov-east-1: ami-0bfb76fbbbb68030d
-us-gov-west-1: ami-eeaed98f
-us-west-1: ami-0a33d79d5f920cc2c
-us-west-2: ami-00050b3048393bc12
+ap-northeast-1: ami-06b328a6ee03ccdf4
+ap-northeast-2: ami-0179e2707f709f813
+ap-northeast-3: ami-0c9b72bae5efc9f61
+ap-south-1: ami-0f21d1eb3339ebd6a
+ap-southeast-1: ami-01899e9a659eb2267
+ap-southeast-2: ami-049c81a79d55b2c8a
+ca-central-1: ami-0b8928a1f643684eb
+cn-north-1: ami-0ae967dc97d5eb57a
+cn-northwest-1: ami-0ba0b1ed49ce7b1b1
+eu-central-1: ami-002422c65a5bb1af8
+eu-north-1: ami-0d3c7ce730c73ab00
+eu-west-1: ami-00328873639859269
+eu-west-2: ami-0c1de72c6acf4b187
+eu-west-3: ami-090d577bb6d08e95b
+sa-east-1: ami-08df8912b098a3f42
+us-east-1: ami-08e1d33a6a64499de
+us-east-2: ami-0219fdb6f47395d88
+us-gov-east-1: ami-0af2c8e5bf3c334b0
+us-gov-west-1: ami-7b85fd1a
+us-west-1: ami-066818f6a6be06fb5
+us-west-2: ami-07122cb5a96b7fee9
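The amis.txt listing follows a simple layout: a `# <os>` line opens each section and is followed by `region: ami-id` lines. As a rough illustration of how such a file could be read into a lookup table, here is a minimal sketch; the function name `parse_amis` is hypothetical and this is not how the pcluster tooling itself necessarily consumes the file. The sample values are copied from the new side of the diff.

```python
def parse_amis(text):
    """Parse amis.txt-style text into a {os: {region: ami_id}} mapping."""
    amis = {}
    current_os = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A "# <os>" line starts a new section.
            current_os = line.lstrip("#").strip()
            amis[current_os] = {}
            continue
        if current_os is None:
            raise ValueError("AMI entry found before any '# <os>' header")
        # Split only on the first colon: "region: ami-id".
        region, sep, ami_id = line.partition(":")
        if not sep:
            raise ValueError("malformed line: %r" % raw_line)
        amis[current_os][region.strip()] = ami_id.strip()
    return amis


# Usage with a few entries taken from the new version of the file.
sample = """\
# alinux
us-east-1: ami-0d130bdfab2037f8a
us-gov-west-1: ami-ba83fbdb
# ubuntu1604
cn-northwest-1: ami-0ba0b1ed49ce7b1b1
"""
amis = parse_amis(sample)
```

A nested dictionary keeps the per-OS grouping of the file, so a check such as `"cn-northwest-1" in amis["ubuntu1604"]` reflects whether a region is listed for that OS.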
