Closed
78 commits
e422820
add storage VLAN interface on all slurm nodes
May 15, 2024
0572a57
configure storage network components based on `cluster_storage_networ…
May 28, 2024
40bf3e6
remove login sec group from storage port
May 28, 2024
284d23b
remove comments, storage network renamed
May 28, 2024
87286e2
consistent comments
May 28, 2024
08efaab
cluster_vnic_profile removed as no longer used
May 28, 2024
4309484
Image update - OpenHPC v3.1 for RL9 (#394)
sjpb Jun 6, 2024
453b1e6
Support ceph quincy for RL9 (#397)
sjpb Jun 6, 2024
24c6b05
Disable grafana repos by default (#399)
sjpb Jun 19, 2024
9d1ae1a
Add squid role (#401)
sjpb Jun 27, 2024
18faae4
Upgrade ssh from SIG/security to fix CVE-2024-6387 (#404)
sjpb Jul 2, 2024
fcf4648
fix squid port default (#405)
sjpb Jul 4, 2024
8623b15
allow extending fat images with site-specific groups (#403)
sjpb Jul 5, 2024
45e5173
remove squid nodes from podman group - is not containerised (#407)
sjpb Jul 5, 2024
c410634
fix README for RL9 (#408)
sjpb Jul 5, 2024
7e8dab6
add groups support to basic_users (#406)
sjpb Jul 5, 2024
c1cab49
Revert to base ssh repos (#410)
sjpb Jul 16, 2024
e44d704
Add TuneD (#409)
bertiethorpe Jul 18, 2024
5504fa3
Use shorter names for CI clusters (#415)
sjpb Jul 23, 2024
0a5f62c
install ood apps in fatimage
bertiethorpe Jul 19, 2024
5da7f4f
add ood jupyter install to fatimage
bertiethorpe Jul 22, 2024
2c87644
jupyter_compute ood into fatimage
bertiethorpe Jul 22, 2024
49182d7
bump fatimage
bertiethorpe Jul 23, 2024
99c52ed
allow items in compute mapping to have different keys e.g. only speci…
sjpb Jul 23, 2024
df8dd0c
Support ansible-init for remote collections (#411)
sjpb Aug 7, 2024
b4a47ec
avoid python-openstackclient v7 due to rebuild bug (#420)
sjpb Aug 7, 2024
9c6efa1
Update hpctests to obey UCX_NET_DEVICES when RoCE devices present (#421)
bertiethorpe Aug 7, 2024
1aff0c3
Update OSes available for deployment (#424)
bertiethorpe Aug 14, 2024
c2d796c
Correct the -only options in the Packer README (#423)
MoteHue Aug 14, 2024
09bcb71
Add trivy image scanning (#413)
sjpb Aug 14, 2024
ccdf036
enable 'openstack baremetal ...' commands (#425)
sjpb Aug 15, 2024
25533b6
check for upstream changes
bertiethorpe Aug 20, 2024
d765077
update README.md
bertiethorpe Aug 20, 2024
e843119
update README.md
bertiethorpe Aug 20, 2024
30bcb4d
update README.md
bertiethorpe Aug 20, 2024
eb9b6e7
update documentation
bertiethorpe Aug 21, 2024
ad84245
test upload images commit
bertiethorpe Aug 21, 2024
8790e12
just use workflow_dispatch
bertiethorpe Aug 22, 2024
c2b87e4
fix cloud config parse
bertiethorpe Aug 22, 2024
377a607
fix image create name
bertiethorpe Aug 22, 2024
1866fc8
pick bucket and handle cancellation
bertiethorpe Aug 22, 2024
3d0dde7
documentation
bertiethorpe Aug 22, 2024
b23b0cd
suggested changes
bertiethorpe Aug 23, 2024
e260818
filter out active images
bertiethorpe Aug 23, 2024
37e7240
fix image exists
bertiethorpe Aug 23, 2024
7f0036e
fix image test logic
bertiethorpe Aug 23, 2024
57e1e49
add quotes to var
bertiethorpe Aug 23, 2024
0a8f9ed
finalise for upstream
bertiethorpe Aug 23, 2024
a45a615
markdown test
bertiethorpe Aug 23, 2024
86fb48d
markdown block text
bertiethorpe Aug 23, 2024
663e6cb
finish
bertiethorpe Aug 23, 2024
9e53ce6
Add RL9 cuda build variant (#428)
sjpb Sep 6, 2024
80c4ceb
Build RL8+OFED image in CI (#427)
MoteHue Sep 6, 2024
554f16f
Create extract_logs.py
bertiethorpe Sep 9, 2024
756a1fa
Update extract_logs.py
bertiethorpe Sep 9, 2024
2932a9d
Ignore irrelevant paths in workflow trigger
sd109 Sep 9, 2024
cfee7b6
Update extract_logs.py
bertiethorpe Sep 16, 2024
9bdc696
Update extract_logs.py
bertiethorpe Sep 16, 2024
9728489
Update stackhpc.yml
bertiethorpe Sep 16, 2024
dd7bec3
Update stackhpc.yml
bertiethorpe Sep 16, 2024
1c78e5b
Update stackhpc.yml
bertiethorpe Sep 16, 2024
cabdd99
Enable SMS Labs for CI (#426)
bertiethorpe Sep 17, 2024
db84ea8
Caas updated to use openstack_networking_floatingip_associate_v2 (#445)
JohnGarbutt Oct 1, 2024
db2ce09
Fix up the outputs, after the fip fix (#446)
JohnGarbutt Oct 1, 2024
9c31164
Add description of image to build (#444)
sjpb Oct 4, 2024
760ab20
Nightly Slurm CI Rocky update workflow (#440)
bertiethorpe Oct 10, 2024
368436e
test s3 image sync
bertiethorpe Oct 10, 2024
0c396e8
fix s3cfg creds
bertiethorpe Oct 10, 2024
95f043b
fix ~/.s3cfg
bertiethorpe Oct 10, 2024
e2f30d4
revert to using secret
bertiethorpe Oct 10, 2024
bd4dcc8
multipart chunk image upload
bertiethorpe Oct 10, 2024
dec3968
cleanup s3 at beginning
bertiethorpe Oct 11, 2024
e4d90b7
move s3 sync to new workflow
bertiethorpe Oct 11, 2024
4f99313
update packer readme
bertiethorpe Oct 14, 2024
62a5906
Apply suggestions from code review
bertiethorpe Oct 14, 2024
bf57939
set matrix exclusion dynamically
bertiethorpe Oct 14, 2024
52367cc
Update docs to include operations (#422)
sjpb Oct 15, 2024
3f85f77
Ansible playbook to configure sshd for Conch CA certs.
Oct 15, 2024
81 changes: 81 additions & 0 deletions .github/bin/create-merge-branch.sh
@@ -0,0 +1,81 @@
#!/usr/bin/env bash

#####
# This script creates a branch that merges the latest release
#####

set -ex

# Only allow running on main
CURRENT_BRANCH="$(git branch --show-current)"
if [ "$CURRENT_BRANCH" != "main" ]; then
    echo "[ERROR] This script can only be run on the main branch" >&2
    exit 1
fi

if [ -n "$(git status --short)" ]; then
    echo "[ERROR] This script cannot run with uncommitted changes" >&2
    exit 1
fi

UPSTREAM_REPO="${UPSTREAM_REPO:-"stackhpc/ansible-slurm-appliance"}"
echo "[INFO] Using upstream repo - $UPSTREAM_REPO"

# Fetch the tag for the latest release from the upstream repository
RELEASE_TAG="$(curl -fsSL "https://api.github.com/repos/${UPSTREAM_REPO}/releases/latest" | jq -r '.tag_name')"
echo "[INFO] Found latest release tag - $RELEASE_TAG"

# Add the repository as an upstream
echo "[INFO] Adding upstream remote..."
git remote add upstream "https://github.com/${UPSTREAM_REPO}.git"
git remote show upstream

echo "[INFO] Fetching remote tags..."
git remote update

# Use a branch that is named for the release
BRANCH_NAME="upgrade/$RELEASE_TAG"

# Check if the branch already exists on the origin
# If it does, there is nothing more to do as the branch can be rebased from the MR
if git show-branch "remotes/origin/$BRANCH_NAME" >/dev/null 2>&1; then
    echo "[INFO] Merge branch already created for $RELEASE_TAG"
    exit
fi

echo "[INFO] Merging release tag - $RELEASE_TAG"
git merge --strategy recursive -X theirs --no-commit "$RELEASE_TAG"

# Check if the merge resulted in any changes being staged
if [ -n "$(git status --short)" ]; then
    echo "[INFO] Merge resulted in the following changes"
    git status

    # NOTE(scott): The GitHub create-pull-request action does
    # the committing for us, so we only need to make branches
    # and commits if running outside of GitHub actions.
    if [ -z "${GITHUB_ACTIONS:-}" ]; then
        echo "[INFO] Checking out temporary branch '$BRANCH_NAME'..."
        git checkout -b "$BRANCH_NAME"

        echo "[INFO] Committing changes"
        git commit -m "Upgrade ansible-slurm-appliance to $RELEASE_TAG"

        echo "[INFO] Pushing changes to origin"
        git push --set-upstream origin "$BRANCH_NAME"

        # Go back to the main branch at the end
        echo "[INFO] Reverting back to main"
        git checkout main

        echo "[INFO] Removing temporary branch"
        git branch -d "$BRANCH_NAME"
    fi

    # Write a file containing the branch name and tag
    # for automatic PR or MR creation that follows
    echo "BRANCH_NAME=\"$BRANCH_NAME\"" > .mergeenv
    echo "RELEASE_TAG=\"$RELEASE_TAG\"" >> .mergeenv
else
    echo "[INFO] Merge resulted in no changes"
fi
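The script finishes by writing `BRANCH_NAME` and `RELEASE_TAG` to `.mergeenv` as quoted `KEY="value"` lines for whatever PR or MR step follows. As a rough sketch of how such a step might read the file back, assuming it is simply sourced as shell: the tag `v1.2.3` and the scratch directory are hypothetical stand-ins, not values from this PR.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in values; the real script derives the tag from the GitHub API.
RELEASE_TAG="v1.2.3"
BRANCH_NAME="upgrade/$RELEASE_TAG"

# Write the file the same way create-merge-branch.sh does, but in a scratch directory.
workdir="$(mktemp -d)"
echo "BRANCH_NAME=\"$BRANCH_NAME\"" > "$workdir/.mergeenv"
echo "RELEASE_TAG=\"$RELEASE_TAG\"" >> "$workdir/.mergeenv"

# A later step can recover both values by sourcing the file.
unset BRANCH_NAME RELEASE_TAG
. "$workdir/.mergeenv"

echo "branch=$BRANCH_NAME tag=$RELEASE_TAG"
```

Sourcing runs the file as shell, so this approach probably only suits tag names that are trusted and free of quotes or `$` characters.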
26 changes: 26 additions & 0 deletions .github/bin/get-s3-image.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#####
# This script looks for an image in OpenStack and, if not found, downloads it
# from an S3 bucket and then uploads it to OpenStack
#####

set -ex

image_name=$1
bucket_name=$2
echo "Checking if image $image_name exists in OpenStack"
image_exists=$(openstack image list --name "$image_name" -f value -c Name)

if [ -n "$image_exists" ]; then
    echo "Image $image_name already exists in OpenStack."
else
    echo "Image $image_name not found in OpenStack. Getting it from S3."

    # Quote the URL and file names so unusual characters cannot split the arguments
    wget "https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_3a06571936a0424bb40bc5c672c4ccb1/$bucket_name/$image_name" --progress=dot:giga

    echo "Uploading image $image_name to OpenStack..."
    openstack image create --file "$image_name" --disk-format qcow2 "$image_name" --progress

    echo "Image $image_name has been uploaded to OpenStack."
fi
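The script takes the image name and bucket name as positional arguments and splices them straight into the download URL, so an empty argument would produce a malformed URL. One possible guard is sketched below, under some assumptions: the `image_url` helper and the `S3_BASE_URL` override are hypothetical names that do not appear in this PR, and the default base URL is copied from the script.

```shell
#!/usr/bin/env bash
set -e

# Base URL copied from get-s3-image.sh; S3_BASE_URL is a hypothetical override.
S3_BASE_URL="${S3_BASE_URL:-https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_3a06571936a0424bb40bc5c672c4ccb1}"

# Print the download URL for an image, failing if either argument is missing.
# Argument order matches the script: image name first, then bucket name.
image_url() {
    local image_name="${1:-}" bucket_name="${2:-}"
    if [ -z "$image_name" ] || [ -z "$bucket_name" ]; then
        echo "usage: image_url <image_name> <bucket_name>" >&2
        return 1
    fi
    printf '%s/%s/%s\n' "$S3_BASE_URL" "$bucket_name" "$image_name"
}

# Example call with placeholder names.
image_url "example-image.qcow2" "example-bucket"
```

The resulting string could then be passed to `wget` in place of the inline URL.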
99 changes: 70 additions & 29 deletions .github/workflows/fatimage.yml
@@ -1,79 +1,120 @@

name: Build fat image
'on':
on:
  workflow_dispatch:
    inputs:
      use_RL8:
        required: true
        description: Include RL8 image build
        type: boolean
        default: false
    inputs:
      ci_cloud:
        description: 'Select the CI_CLOUD'
        required: true
        type: choice
        options:
          - LEAFCLOUD
          - SMS
          - ARCUS

jobs:
  openstack:
    name: openstack-imagebuild
    runs-on: ubuntu-20.04
    concurrency: ${{ github.ref }}-{{ matrix.os_version }} # to branch/PR + OS
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.os_version }}-${{ matrix.build }} # to branch/PR + OS + build
      cancel-in-progress: true
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        os_version: [RL8, RL9]
        rl8_selected:
          - ${{ inputs.use_RL8 == true }} # only potentially true for workflow_dispatch
      fail-fast: false # allow other matrix jobs to continue even if one fails
      matrix: # build RL8+OFED, RL9+OFED, RL9+OFED+CUDA versions
        os_version:
          - RL8
          - RL9
        build:
          - openstack.openhpc
          - openstack.openhpc-cuda
        exclude:
          - os_version: RL8
            rl8_selected: false
            build: openstack.openhpc-cuda
    env:
      ANSIBLE_FORCE_COLOR: True
      OS_CLOUD: openstack
      CI_CLOUD: ${{ vars.CI_CLOUD }}
      CI_CLOUD: ${{ github.event.inputs.ci_cloud }}
      SOURCE_IMAGES_MAP: |
        {
          "RL8": {
            "openstack.openhpc": "rocky-latest-RL8",
            "openstack.openhpc-cuda": "rocky-latest-cuda-RL8"
          },
          "RL9": {
            "openstack.openhpc": "rocky-latest-RL9",
            "openstack.openhpc-cuda": "rocky-latest-cuda-RL9"
          }
        }

    steps:
      - uses: actions/checkout@v2

      - name: Record settings for CI cloud
        run: |
          echo CI_CLOUD: ${{ env.CI_CLOUD }}

      - name: Setup ssh
        run: |
          set -x
          mkdir ~/.ssh
          echo "${{ secrets[format('{0}_SSH_KEY', vars.CI_CLOUD)] }}" > ~/.ssh/id_rsa
          echo "${{ secrets[format('{0}_SSH_KEY', env.CI_CLOUD)] }}" > ~/.ssh/id_rsa
          chmod 0600 ~/.ssh/id_rsa
        shell: bash

      - name: Add bastion's ssh key to known_hosts
        run: cat environments/.stackhpc/bastion_fingerprints >> ~/.ssh/known_hosts
        shell: bash

      - name: Install ansible etc
        run: dev/setup-env.sh

      - name: Write clouds.yaml
        run: |
          mkdir -p ~/.config/openstack/
          echo "${{ secrets[format('{0}_CLOUDS_YAML', vars.CI_CLOUD)] }}" > ~/.config/openstack/clouds.yaml
          echo "${{ secrets[format('{0}_CLOUDS_YAML', env.CI_CLOUD)] }}" > ~/.config/openstack/clouds.yaml
        shell: bash

      - name: Setup environment
        run: |
          . venv/bin/activate
          . environments/.stackhpc/activate

      - name: Build fat image with packer
        id: packer_build
        run: |
          set -x
          . venv/bin/activate
          . environments/.stackhpc/activate
          cd packer/
          packer init .
          PACKER_LOG=1 packer build -on-error=${{ vars.PACKER_ON_ERROR }} -var-file=$PKR_VAR_environment_root/${{ vars.CI_CLOUD }}.pkrvars.hcl openstack.pkr.hcl

          PACKER_LOG=1 packer build \
            -on-error=${{ vars.PACKER_ON_ERROR }} \
            -only=${{ matrix.build }} \
            -var-file=$PKR_VAR_environment_root/${{ env.CI_CLOUD }}.pkrvars.hcl \
            -var "source_image_name=${{ env.SOURCE_IMAGE }}" \
            openstack.pkr.hcl
        env:
          PKR_VAR_os_version: ${{ matrix.os_version }}
          SOURCE_IMAGE: ${{ fromJSON(env.SOURCE_IMAGES_MAP)[matrix.os_version][matrix.build] }}

      - name: Get created image names from manifest
        id: manifest
        run: |
          . venv/bin/activate
          for IMAGE_ID in $(jq --raw-output '.builds[].artifact_id' packer/packer-manifest.json)
          do
            while ! openstack image show -f value -c name $IMAGE_ID; do
              sleep 5
            done
            IMAGE_NAME=$(openstack image show -f value -c name $IMAGE_ID)
            echo $IMAGE_NAME
          IMAGE_ID=$(jq --raw-output '.builds[-1].artifact_id' packer/packer-manifest.json)
          while ! openstack image show -f value -c name $IMAGE_ID; do
            sleep 5
          done
          IMAGE_NAME=$(openstack image show -f value -c name $IMAGE_ID)
          echo $IMAGE_ID > image-id.txt
          echo $IMAGE_NAME > image-name.txt

      - name: Upload manifest artifact
        uses: actions/upload-artifact@v4
        with:
          name: image-details-${{ matrix.build }}-${{ matrix.os_version }}
          path: |
            ./image-id.txt
            ./image-name.txt
          overwrite: true
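The matrix comment in this workflow names three target builds (RL8+OFED, RL9+OFED, RL9+OFED+CUDA), which come from crossing two `os_version` values with two `build` values and then excluding the RL8 CUDA pair. A small sketch that enumerates the same combinations in plain bash may help when reasoning about which jobs run; the `expand_matrix` function is a hypothetical helper, not part of the workflow, and it assumes the `exclude` entry pairs `os_version: RL8` with `build: openstack.openhpc-cuda`.

```shell
#!/usr/bin/env bash
set -e

# Values copied from the workflow's strategy.matrix block.
os_versions=(RL8 RL9)
builds=(openstack.openhpc openstack.openhpc-cuda)

# Print one "os_version build" pair per line, skipping the excluded combination.
expand_matrix() {
    local os build
    for os in "${os_versions[@]}"; do
        for build in "${builds[@]}"; do
            # Mirrors the exclude entry: no CUDA build for RL8.
            if [ "$os" = "RL8" ] && [ "$build" = "openstack.openhpc-cuda" ]; then
                continue
            fi
            echo "$os $build"
        done
    done
}

expand_matrix
```

Each printed pair corresponds to one job, and the concurrency group and artifact name both embed `matrix.os_version` and `matrix.build`, so the jobs should not collide with one another.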