Commit b1c5f57

tutorial: flux on aws
This set of configs allows for deploying Flux Framework to bare metal VMs on AWS EC2 using packer to build and terraform to deploy. The video for the tutorial is undergoing review and will be posted when that is finished. Signed-off-by: vsoch <[email protected]>
1 parent 7644316 commit b1c5f57

28 files changed: 3532 additions, 3 deletions

README.md

Lines changed: 2 additions & 1 deletion
@@ -1,10 +1,11 @@
 # Flux Tutorials
 
-> A Dinosaur Tutorial Series!
+> A Dinosaur Tutorial Series!
 
 ## Tutorials
 
 - [flux-in-slurm](tutorial/flux-in-slurm): Bring up a Flux instance (in user-space) in a Slurm Allocation - both in Kubernetes ([video](https://youtu.be/8ZkSLV0m7To?si=WqWKCe2jvRuTXvlJ))
+- [Flux on AWS](tutorial/aws): Deploy an entire Flux Framework cluster to "bare metal" instances on AWS with (essentially) two `make` commands - one to build with packer, and one to deploy with Terraform.
 - [HPCIC Tutorial 2024](https://youtu.be/Dt4CSZWSEJE?si=b2O7lQrJixcKh-EJ)
 
 ## What is this?

tutorial/aws/.gitignore

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
.idea/
*.iml
**/target/
**/bin
*.zip

# Terraform
**/.terraform
**/.terraform*
**/terraform.tfplan
**/terraform.tfstate
**/terraform.tfstate.backup
**/terraform.tfstate.d
**/*.auto.tfvars
**/.terraform.lock.hcl

# Temporary SSH keys
**/id_rsa
**/id_rsa.pub
*.pem

tutorial/aws/README.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Flux on AWS

These Terraform recipes make it easy to deploy an entire cluster with Flux Framework on AWS! We provide recipes with [packer](https://developer.hashicorp.com/packer/install) to build base images for Flux, and the Terraform configuration files to deploy.

## Usage

First, choose a subdirectory that corresponds to the instance type you are interested in.

### 1. Build Images

Within the subdirectory, you likely want to build your images first. This uses [packer](https://developer.hashicorp.com/packer/install), so you should install it first. You can export your AWS credentials to the environment, but I prefer to use long term credentials, as [described here](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html). Then, say we want to build for the hpc6a:

```bash
cd tf-hpc6a/build
make
```

You can also look in the Makefile to see the respective commands:

```bash
packer init .
packer fmt .
packer validate .
packer build flux-build.pkr.hcl
```

The build logic is in the corresponding `build.sh` script, so if you want to add additional stuff (adding an application or other library install) write to the end of that file! Note that during the build you will see blocks of red and green. Red does *not* necessarily indicate an error. But if you do run into one that stops the build, please [open an issue](https://github.com/converged-computing/flux-tutorials/issues) to ask for help. When the build is complete it will generate what is called an AMI, an "Amazon
Machine Image" that you can use in the next step.

### 2. Terraform Recipe

We next want to update our Terraform recipe, which is the `main.tf` file in each respective subdirectory.
The build step should provide an AMI ID, and you will want to put that into the `locals.ami` field:

```hcl
locals {
  name = "flux"
  pwd = basename(path.cwd)
  region = "us-east-2"
  # Here!
  ami = "ami-0ce1a562c586219e6"
  placement = "eks-efa-testing"
  ...
}
```
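
If you would rather not edit `main.tf` by hand, the substitution can be scripted. This is only a minimal sketch: it assumes the file contains a single line of the form `ami = "..."` as in the snippet above, and the `set_ami` helper name is hypothetical (it is not part of this repository).

```bash
#!/bin/bash
# Hypothetical helper (not part of this repository): replace the value of
# the `ami` field in a Terraform file with a new AMI ID.
# Assumes one line of the form:  ami = "ami-..."
set_ami() {
    local tf_file="$1" new_ami="$2"
    local tmp status=0

    # Reject anything that does not look like an AMI ID
    case "$new_ami" in
        ami-*) ;;
        *) echo "expected an ID starting with ami-, got: $new_ami" >&2; return 1 ;;
    esac

    # Write to a temporary file first so a failed sed leaves the file intact
    tmp="$(mktemp)"
    sed -E "s|^([[:space:]]*ami[[:space:]]*=[[:space:]]*)\"[^\"]*\"|\1\"${new_ami}\"|" \
        "$tf_file" > "$tmp" && cat "$tmp" > "$tf_file" || status=1
    rm -f "$tmp"
    return "$status"
}
```

For example, `set_ami main.tf ami-0123456789abcdef0` (a placeholder ID) would rewrite only the `ami` line and leave the other `locals` fields alone.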

For better networking you'll want to make a placement group (the last field shown above), which you can do in the web interface or just:

```bash
aws ec2 create-placement-group --group-name eks-efa-testing --strategy cluster
```
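
Creating a placement group that already exists returns an error, so when re-running the tutorial it can help to check first. The following is a sketch under that assumption; `ensure_placement_group` is a hypothetical helper name, and it treats any failure of `describe-placement-groups` (including a credentials problem) as "group not found".

```bash
#!/bin/bash
# Sketch: create the placement group only if it does not already exist.
# ensure_placement_group is a hypothetical helper, not part of this repository.
ensure_placement_group() {
    local group="$1"
    # describe-placement-groups exits non-zero when the group is unknown
    if aws ec2 describe-placement-groups --group-names "$group" >/dev/null 2>&1; then
        echo "placement group ${group} already exists"
    else
        aws ec2 create-placement-group --group-name "$group" --strategy cluster
    fi
}
```

Calling `ensure_placement_group eks-efa-testing` before `make` would then be safe to repeat.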

### 3. Deploy with Terraform

Then you can just cd to where the `main.tf` is (e.g., for tf-hpc6a) and:

```bash
cd tf-hpc6a
make
```

You can then shell into any node, and check the status of Flux. I usually grab the instance
name via "Connect" in the portal, but you could likely use the AWS client for this too.

```bash
$ ssh -o 'IdentitiesOnly yes' -i "mykey.pem" [email protected]
```

### 4. Check Flux

Check the cluster status, the overlay status, and try running a job:

```bash
$ flux resource list
     STATE NNODES   NCORES    NGPUS NODELIST
      free      2      192        0 i-0c13eb61596ffd5c6,i-0f4fe028d6c3036c0
 allocated      0        0        0
      down      0        0        0
```
```bash
$ flux run -N 2 hostname
i-0c13eb61596ffd5c6
i-0f4fe028d6c3036c0
```

You can look at the startup script logs like this if you need to debug.

```bash
$ cat /var/log/cloud-init-output.log
```
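
To check the node count from a script instead of by eye, the table printed by `flux resource list` can be filtered with `awk`. This is a small sketch that assumes the column layout shown above (state in the first column, `NNODES` in the second); `count_nodes` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: read `flux resource list` output on stdin and print the NNODES
# value for one state (free, allocated, or down).
# count_nodes is a hypothetical helper, not part of Flux.
count_nodes() {
    local state="$1"
    awk -v state="$state" '$1 == state { print $2 }'
}
```

On a node of the cluster, `flux resource list | count_nodes free` prints the number of free nodes, which you can compare against the size you asked Terraform for.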

tutorial/aws/tf-hpc6a/Makefile

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
.PHONY: all
2+
all: init fmt validate build
3+
4+
.PHONY: init
5+
init:
6+
terraform init
7+
8+
.PHONY: fmt
9+
fmt:
10+
terraform fmt
11+
12+
.PHONY: validate
13+
validate:
14+
terraform validate
15+
16+
.PHONY: build
17+
build:
18+
terraform apply
19+
20+
.PHONY: destroy
21+
destroy:
22+
terraform destroy
Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
.PHONY: all
2+
all: init fmt validate build
3+
4+
.PHONY: init
5+
init:
6+
packer init .
7+
8+
.PHONY: fmt
9+
fmt:
10+
packer fmt .
11+
12+
.PHONY: validate
13+
validate:
14+
packer validate .
15+
16+
.PHONY: build
17+
build:
18+
packer build flux-build.pkr.hcl
Lines changed: 222 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,222 @@
1+
#!/bin/bash
2+
3+
set -euo pipefail
4+
5+
################################################################
6+
#
7+
# Flux, Singularity, and EFA
8+
#
9+
10+
/usr/bin/cloud-init status --wait
11+
12+
export DEBIAN_FRONTEND=noninteractive
13+
sudo apt-get update && \
14+
sudo apt-get install -y apt-transport-https ca-certificates curl jq apt-utils wget \
15+
libelf-dev libpcap-dev libbfd-dev binutils-dev build-essential make \
16+
linux-tools-common linux-tools-$(uname -r) \
17+
python3-pip git net-tools
18+
19+
# cmake is needed for flux-sched, and make sure to choose arm or x86
20+
export CMAKE=3.23.1
21+
export ARCH=x86_64
22+
export ORAS_ARCH=amd64
23+
24+
curl -s -L https://github.com/Kitware/CMake/releases/download/v$CMAKE/cmake-$CMAKE-linux-$ARCH.sh > cmake.sh && \
25+
sudo sh cmake.sh --prefix=/usr/local --skip-license && \
26+
sudo apt-get install -y man flex ssh sudo vim luarocks munge lcov ccache lua5.4 \
27+
valgrind build-essential pkg-config autotools-dev libtool \
28+
libffi-dev autoconf automake make clang clang-tidy \
29+
gcc g++ libpam-dev apt-utils lua-posix \
30+
libsodium-dev libzmq3-dev libczmq-dev libjansson-dev libmunge-dev \
31+
libncursesw5-dev liblua5.4-dev liblz4-dev libsqlite3-dev uuid-dev \
32+
libhwloc-dev libs3-dev libevent-dev libarchive-dev \
33+
libboost-graph-dev libboost-system-dev libboost-filesystem-dev \
34+
libboost-regex-dev libyaml-cpp-dev libedit-dev uidmap dbus-user-session python3-cffi
35+
36+
# Prepare lua rocks (does it really rock?)
37+
sudo locale-gen en_US.UTF-8
38+
39+
# This is needed if you intend to use EFA (HPC instance type)
40+
# Install EFA alone without AWS OPEN_MPI
41+
# At the time of running this, latest was 1.32.0
42+
export EFA_VERSION=latest
43+
mkdir /tmp/efa
44+
cd /tmp/efa
45+
curl -O https://s3-us-west-2.amazonaws.com/aws-efa-installer/aws-efa-installer-${EFA_VERSION}.tar.gz
46+
tar -xf aws-efa-installer-${EFA_VERSION}.tar.gz
47+
cd aws-efa-installer
48+
sudo ./efa_installer.sh -y
49+
50+
# - /var/lib/dkms/efa/2.10.0/6.5.0-1022-aws/aarch64/module/efa.ko
51+
# Processing triggers for man-db (2.10.2-1) ...
52+
# Processing triggers for libc-bin (2.35-0ubuntu3.8) ...
53+
# NEEDRESTART-VER: 3.5
54+
# NEEDRESTART-KCUR: 6.5.0-1022-aws
55+
# NEEDRESTART-KEXP: 6.5.0-1022-aws
56+
# NEEDRESTART-KSTA: 1
57+
# NEEDRESTART-SVC: dbus.service
58+
# NEEDRESTART-SVC: networkd-dispatcher.service
59+
# NEEDRESTART-SVC: systemd-logind.service
60+
# NEEDRESTART-SVC: unattended-upgrades.service
61+
# NEEDRESTART-SVC: [email protected]
62+
# Updating boot ramdisk
63+
# update-initramfs: Generating /boot/initrd.img-6.5.0-1022-aws
64+
# System running in EFI mode, skipping.
65+
# libfabric1-aws is verified to install /opt/amazon/efa/lib/libfabric.so
66+
# openmpi40-aws is verified to install /opt/amazon/openmpi/lib/libmpi.so
67+
# openmpi50-aws is verified to install /opt/amazon/openmpi5/lib/libmpi.so
68+
# efa-profile is verified to install /etc/ld.so.conf.d/000_efa.conf
69+
# efa-profile is verified to install /etc/profile.d/zippy_efa.sh
70+
# Reloading EFA kernel module
71+
# EFA device not detected, skipping test.
72+
# ===================================================
73+
# EFA installation complete.
74+
# - Please logout/login to complete the installation.
75+
# - Libfabric was installed in /opt/amazon/efa
76+
# - Open MPI 4 was installed in /opt/amazon/openmpi
77+
# - Open MPI 5 was installed in /opt/amazon/openmpi5
78+
# ===================================================
79+
# fi_info -p efa -t FI_EP_RDM
80+
# Disable ptrace
81+
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html
82+
sudo sysctl -w kernel.yama.ptrace_scope=0
83+
################################################################
## Install Flux and dependencies
#
sudo chown -R $USER /opt && \
    mkdir -p /opt/prrte && \
    cd /opt/prrte && \
    git clone https://github.com/openpmix/openpmix.git && \
    git clone https://github.com/openpmix/prrte.git && \
    cd openpmix && \
    git checkout fefaed568f33bf86f28afb6e45237f1ec5e4de93 && \
    ./autogen.pl && \
    ./configure --prefix=/usr --disable-static && sudo make install && \
    sudo ldconfig


# prrte you are sure looking perrrty today
cd /opt/prrte/prrte && \
    git checkout 477894f4720d822b15cab56eee7665107832921c && \
    ./autogen.pl && \
    ./configure --prefix=/usr && sudo make -j install

# flux security
wget https://github.com/flux-framework/flux-security/releases/download/v0.11.0/flux-security-0.11.0.tar.gz && \
    tar -xzvf flux-security-0.11.0.tar.gz && \
    mv flux-security-0.11.0 /opt/flux-security && \
    cd /opt/flux-security && \
    ./configure --prefix=/usr --sysconfdir=/etc && \
    make -j && sudo make install

# The VMs will share the same munge key
sudo mkdir -p /var/run/munge && \
    dd if=/dev/urandom bs=1 count=1024 > munge.key && \
    sudo mv munge.key /etc/munge/munge.key && \
    sudo chown -R munge /etc/munge/munge.key /var/run/munge && \
    sudo chmod 600 /etc/munge/munge.key

# Make the flux run directory
mkdir -p /home/ubuntu/run/flux

# Flux core
wget https://github.com/flux-framework/flux-core/releases/download/v0.66.0/flux-core-0.66.0.tar.gz && \
    tar -xzvf flux-core-0.66.0.tar.gz && \
    mv flux-core-0.66.0 /opt/flux-core && \
    cd /opt/flux-core && \
    ./configure --prefix=/usr --sysconfdir=/etc --runstatedir=/home/flux/run --with-flux-security && \
    make clean && \
    make -j && sudo make install

# Flux pmix (must be installed after flux core)
wget https://github.com/flux-framework/flux-pmix/releases/download/v0.5.0/flux-pmix-0.5.0.tar.gz && \
    tar -xzvf flux-pmix-0.5.0.tar.gz && \
    mv flux-pmix-0.5.0 /opt/flux-pmix && \
    cd /opt/flux-pmix && \
    ./configure --prefix=/usr && \
    make -j && \
    sudo make install

# Flux sched
wget https://github.com/flux-framework/flux-sched/releases/download/v0.37.0/flux-sched-0.37.0.tar.gz && \
    tar -xzvf flux-sched-0.37.0.tar.gz && \
    mv flux-sched-0.37.0 /opt/flux-sched && \
    cd /opt/flux-sched && \
    mkdir build && \
    cd build && \
    cmake ../ && make -j && sudo make install && sudo ldconfig && \
    echo "DONE flux build"

# Flux curve.cert
# Ensure we have a shared curve certificate
flux keygen /tmp/curve.cert && \
    sudo mkdir -p /etc/flux/system && \
    sudo cp /tmp/curve.cert /etc/flux/system/curve.cert && \
    sudo chown ubuntu /etc/flux/system/curve.cert && \
    sudo chmod o-r /etc/flux/system/curve.cert && \
    sudo chmod g-r /etc/flux/system/curve.cert && \
    # Permissions for imp
    sudo chmod u+s /usr/libexec/flux/flux-imp && \
    sudo chmod 4755 /usr/libexec/flux/flux-imp && \
    # /var/lib/flux needs to be owned by the instance owner
    sudo mkdir -p /var/lib/flux && \
    sudo chown $USER -R /var/lib/flux && \
    # clean up (and make space)
    cd /opt
sudo rm -rf /opt/flux-core /opt/flux-sched /opt/prrte /opt/flux-security /opt/flux-pmix

# Install oras and singularity
export VERSION="1.1.0" && \
    curl -LO "https://github.com/oras-project/oras/releases/download/v${VERSION}/oras_${VERSION}_linux_${ORAS_ARCH}.tar.gz" && \
    mkdir -p oras-install/ && \
    tar -zxf oras_${VERSION}_*.tar.gz -C oras-install/ && \
    sudo mv oras-install/oras /usr/local/bin/ && \
    rm -rf oras_${VERSION}_*.tar.gz oras-install/

cd /opt

# flux start mpirun -n 6 singularity exec singularity-mpi_mpich.sif /opt/mpitest
sudo apt-get update && sudo apt-get install -y libseccomp-dev libglib2.0-dev cryptsetup \
    libfuse-dev \
    squashfs-tools \
    squashfs-tools-ng \
    uidmap \
    zlib1g-dev \
    iperf3

sudo apt-get install -y \
    autoconf \
    automake \
    cryptsetup \
    git \
    libfuse-dev \
    libglib2.0-dev \
    libseccomp-dev \
    libtool \
    pkg-config \
    runc \
    squashfs-tools \
    squashfs-tools-ng \
    uidmap \
    wget \
    zlib1g-dev

# install go
wget https://go.dev/dl/go1.21.0.linux-${ORAS_ARCH}.tar.gz
tar -xvf go1.21.0.linux-${ORAS_ARCH}.tar.gz
sudo mv go /usr/local && rm go1.21.0.linux-${ORAS_ARCH}.tar.gz
export PATH=/usr/local/go/bin:$PATH

# Install singularity
export VERSION=4.0.1 && \
    wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-ce-${VERSION}.tar.gz && \
    tar -xzf singularity-ce-${VERSION}.tar.gz && \
    cd singularity-ce-${VERSION}

./mconfig && \
    make -C builddir && \
    sudo make -C builddir install

#
# At this point we have what we need!
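
The comment near the top of the build script asks you to choose arm or x86 by hand when setting `ARCH` and `ORAS_ARCH`. As a sketch of one alternative, both values could be derived from `uname -m`. The `detect_arch` function below is hypothetical and not part of the committed script; it assumes only the two architectures the comment mentions and fails for anything else.

```bash
#!/bin/bash
# Sketch: derive the two architecture names the build script exports from
# the machine type reported by `uname -m`.
# detect_arch is a hypothetical helper; it is not part of the original script.
detect_arch() {
    # Accept an explicit machine type, or fall back to the current host
    local machine="${1:-$(uname -m)}"
    case "$machine" in
        x86_64)
            ARCH=x86_64
            ORAS_ARCH=amd64
            ;;
        aarch64|arm64)
            ARCH=aarch64
            ORAS_ARCH=arm64
            ;;
        *)
            echo "unsupported machine type: $machine" >&2
            return 1
            ;;
    esac
    export ARCH ORAS_ARCH
}
```

Calling `detect_arch` with no argument in place of the two hard-coded exports would set `ARCH` for the CMake installer name and `ORAS_ARCH` for the ORAS and Go archive names.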
