Commit 6236614

d4l3k authored and facebook-github-bot committed
ci/slurm: use ec2 instance connect + mssh instead of using SSH keys (#265)
Summary:
This switches the integration tests to use EC2 Instance Connect with an assumed role instead of embedding the Slurm SSH key in GitHub secrets.

Pull Request resolved: #265

Test Plan:
```
$ env SLURM_INSTANCE_MASTER=ubuntu@i-01dd4b95724eb0b4b scripts/slurmint.sh
```
CI

Reviewed By: kiukchung, aivanou

Differential Revision: D31695261

Pulled By: d4l3k

fbshipit-source-id: 48a52e911e68bc9b18ed470a5f7e725ff58697b1
1 parent f00df91 commit 6236614
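
The test plan passes `SLURM_INSTANCE_MASTER` as `user@instance-id` (for example `ubuntu@i-01dd4b95724eb0b4b`) rather than `user@hostname`. As a minimal sketch, assuming it is useful to catch a stale hostname-style value before running the script, the shape could be checked like this. `parse_target` is a hypothetical helper and is not part of this commit; the check is deliberately loose and only looks for a non-empty user followed by `@i-`.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the commit): split "user@instance-id" into parts.
# Prints "<user> <instance-id>" on success and returns 1 on any other shape.
parse_target() {
    case "$1" in
        ?*@i-?*)
            # Everything before the first "@" is the user, the rest is the instance ID.
            printf '%s %s\n' "${1%%@*}" "${1#*@}"
            ;;
        *)
            printf 'expected user@instance-id, got: %s\n' "$1" >&2
            return 1
            ;;
    esac
}
```

A caller might run `parse_target "$SLURM_INSTANCE_MASTER" >/dev/null || exit 1` before invoking `scripts/slurmint.sh`.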

2 files changed, +24 -10 lines changed


.github/workflows/slurm-integration-tests.yaml

Lines changed: 20 additions & 6 deletions
@@ -9,6 +9,9 @@ on:
 jobs:
   slurm:
     runs-on: ubuntu-18.04
+    permissions:
+      id-token: write
+      contents: read
     steps:
       - name: Setup Python
         uses: actions/setup-python@v2
@@ -17,21 +20,32 @@ jobs:
           architecture: x64
       - name: Checkout TorchX
         uses: actions/checkout@v2
+      - name: Configure AWS
+        env:
+          AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
+        run: |
+          if [ -n "$AWS_ROLE_ARN" ]; then
+            export AWS_WEB_IDENTITY_TOKEN_FILE=/tmp/awscreds
+            export AWS_DEFAULT_REGION=us-west-2
+
+            echo AWS_WEB_IDENTITY_TOKEN_FILE=$AWS_WEB_IDENTITY_TOKEN_FILE >> $GITHUB_ENV
+            echo AWS_ROLE_ARN=$AWS_ROLE_ARN >> $GITHUB_ENV
+            echo AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION >> $GITHUB_ENV
+
+            curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" "$ACTIONS_ID_TOKEN_REQUEST_URL" | jq -r '.value' > $AWS_WEB_IDENTITY_TOKEN_FILE
+          fi
       - name: Install Dependencies
         run:
           set -ex
 
-          pip install wheel
+          pip install wheel ec2instanceconnectcli
       - name: Run Slurm Integration Tests
         env:
-          SLURM_SSH: ${{ secrets.SLURM_SSH }}
-          SLURM_MASTER: ${{ secrets.SLURM_MASTER }}
+          SLURM_INSTANCE_MASTER: ${{ secrets.SLURM_INSTANCE_MASTER }}
           SLURM_KNOWN_HOST: ${{ secrets.SLURM_KNOWN_HOST }}
-          SLURM_IDENT: id_rsa
         run: |
           set -e
-          echo "$SLURM_SSH" > "$SLURM_IDENT"
-          chmod 600 "$SLURM_IDENT"
+
           mkdir -p ~/.ssh
           echo "$SLURM_KNOWN_HOST" >> ~/.ssh/known_hosts
 
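
The "Configure AWS" step above requests a GitHub OIDC token with `curl`, extracts `.value` with `jq`, and writes it to the file named by `AWS_WEB_IDENTITY_TOKEN_FILE`. The following is a minimal sketch of the same request with explicit failure handling, assuming one wants the step to stop instead of writing an empty or `null` token file. `fetch_oidc_token` is a hypothetical helper and is not part of this commit; the two `ACTIONS_ID_TOKEN_REQUEST_*` variables are the ones the workflow already relies on.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the commit): write the GitHub OIDC token to file $1.
fetch_oidc_token() {
    local token_file="$1"
    local token

    # The workflow reads these two variables; they are expected to be present
    # only when the job has been granted `id-token: write`.
    if [ -z "${ACTIONS_ID_TOKEN_REQUEST_TOKEN:-}" ] || [ -z "${ACTIONS_ID_TOKEN_REQUEST_URL:-}" ]; then
        echo "OIDC request variables are not set; is id-token: write granted?" >&2
        return 1
    fi

    # curl -f fails on HTTP errors; jq -e fails when .value is null or missing.
    # The pipeline status is jq's, which is also non-zero on empty input.
    token="$(curl -sSf -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
        "$ACTIONS_ID_TOKEN_REQUEST_URL" | jq -er '.value')" || return 1

    # Only write the file once a token was actually obtained.
    printf '%s' "$token" > "$token_file"
}
```

Inside the workflow step this could be called as `fetch_oidc_token "$AWS_WEB_IDENTITY_TOKEN_FILE"`. Capturing the token in a variable first means a failed request leaves no partial file behind.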

scripts/slurmint.sh

Lines changed: 4 additions & 4 deletions
@@ -14,8 +14,8 @@ python setup.py bdist_wheel
 
 WHEEL="$DIST/$(ls $DIST)"
 
-if [[ -z "${SLURM_MASTER}" ]]; then
-    echo "slurm master is not set, skipping test..."
+if [[ -z "${SLURM_INSTANCE_MASTER}" ]]; then
+    echo "SLURM_INSTANCE_MASTER is not set, skipping test..."
     exit 0
 fi
 
@@ -25,11 +25,11 @@ VENV="$DIR/venv"
 
 function run_cmd {
     # shellcheck disable=SC2048,SC2086
-    ssh -o ServerAliveInterval=60 "$SLURM_MASTER" -i "$SLURM_IDENT" $*
+    mssh -o ServerAliveInterval=60 "$SLURM_INSTANCE_MASTER" -- $*
 }
 
 function run_scp {
-    scp -i "$SLURM_IDENT" "$1" "$SLURM_MASTER:$2"
+    rsync -rav -e mssh "$1" "$SLURM_INSTANCE_MASTER:$2"
 }
 
 function cleanup {
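
After this change the script depends on two external commands: `mssh` (installed by the `ec2instanceconnectcli` package added to the workflow) and `rsync`, which replaces `scp` and uses `mssh` as its remote shell via `-e`. A minimal sketch of a preflight check follows, assuming it is preferable to fail with a clear message than partway through a remote copy. `require_tools` is a hypothetical helper and is not part of this commit.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the commit): report every named tool that is
# missing from PATH and return non-zero if any of them is absent.
require_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In `scripts/slurmint.sh` this might be called as `require_tools mssh rsync || exit 1` right after the `SLURM_INSTANCE_MASTER` check.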
