Commit d3daba4

Merge image rebuild
2 parents 4c7f875 + 9cde995 commit d3daba4

27 files changed: +292 -106 lines changed

.github/workflows/publish-helm-chart.yml

Lines changed: 0 additions & 1 deletion
@@ -24,4 +24,3 @@ jobs:
 token: ${{ secrets.GITHUB_TOKEN }}
 version: ${{ steps.semver.outputs.version }}
 app-version: ${{ steps.semver.outputs.short-sha }}
-

README.md

Lines changed: 20 additions & 14 deletions
@@ -1,8 +1,7 @@
 # Slurm Docker Cluster
 
-This is a multi-container Slurm cluster using Kubernetes. The Helm chart
-creates a named volume for persistent storage of MySQL data files as well as
-an NFS volume for shared storage.
+This is a multi-container Slurm cluster using Kubernetes. The Slurm cluster Helm chart creates a named volume for persistent storage of MySQL data files. By default, it also installs the
+RookNFS Helm chart (also in this repo) to provide shared storage across the Slurm cluster nodes.
 
 ## Dependencies
 
@@ -27,47 +26,51 @@ The Helm chart will create the following named volumes:
 
 * var_lib_mysql ( -> /var/lib/mysql )
 
-A named ReadWriteMany (RWX) volume mounted to `/home` is also expected, this can be external or can be deployed using the scripts in the `/nfs` directory (See "Deploying the Cluster")
+A named ReadWriteMany (RWX) volume mounted to `/home` is also expected, this can be external or can be deployed using the provided `rooknfs` chart directory (See "Deploying the Cluster").
 
 ## Configuring the Cluster
 
-All config files in `slurm-cluster-chart/files` will be mounted into the container to configure their respective services on startup. Note that changes to these files will not all be propagated to existing deployments (see "Reconfiguring the Cluster").
-Additional parameters can be found in the `values.yaml` file, which will be applied on a Helm chart deployment. Note that some of these values will also not propagate until the cluster is restarted (see "Reconfiguring the Cluster").
+All config files in `slurm-cluster-chart/files` will be mounted into the container to configure their respective services on startup. Note that changes to these files will not all be propagated to existing deployments (see "Reconfiguring the Cluster"). Additional parameters can be found in the `values.yaml` file for the Helm chart. Note that some of these values will also not propagate until the cluster is restarted (see "Reconfiguring the Cluster").
 
 ## Deploying the Cluster
 
 ### Generating Cluster Secrets
 
 On initial deployment ONLY, run
 ```console
-./generate-secrets.sh
+./generate-secrets.sh [<target-namespace>]
 ```
-This generates a set of secrets. If these need to be regenerated, see "Reconfiguring the Cluster"
+This generates a set of secrets in the target namespace to be used by the Slurm cluster. If these need to be regenerated, see "Reconfiguring the Cluster"
 
 Be sure to take note of the Open Ondemand credentials, you will need them to access the cluster through a browser
 
 ### Connecting RWX Volume
 
-A ReadWriteMany (RWX) volume is required, if a named volume exists, set `nfs.claimName` in the `values.yaml` file to its name. If not, manifests to deploy a Rook NFS volume are provided in the `/nfs` directory. You can deploy this by running
-```console
-./nfs/deploy-nfs.sh
-```
-and leaving `nfs.claimName` as the provided value.
+A ReadWriteMany (RWX) volume is required for shared storage across cluster nodes. By default, the Rook NFS Helm chart is installed as a dependency of the Slurm cluster chart in order to provide a RWX capable Storage Class for the required shared volume. If the target Kubernetes cluster has an existing storage class which should be used instead, then `storageClass` in `values.yaml` should be set to the name of this existing class and the RookNFS dependency should be disabled by setting `rooknfs.enabled = false`. In either case, the storage capacity of the provisioned RWX volume can be configured by setting the value of `storage.capacity`.
+
+See the separate RookNFS chart [values.yaml](./rooknfs/values.yaml) for further configuration options when using the RookNFS to provide the shared storage volume.
 
 ### Supplying Public Keys
 
 To access the cluster via `ssh`, you will need to make your public keys available. All your public keys from localhost can be added by running
 
 ```console
-./publish-keys.sh
+./publish-keys.sh [<target-namespace>]
 ```
+where `<target-namespace>` is the namespace in which the Slurm cluster chart will be deployed (i.e. using `helm install -n <target-namespace> ...`). This will create a Kubernetes Secret in the appropriate namespace for the Slurm cluster to use. Omitting the namespace arg will install the secrets in the default namespace.
 
 ### Deploying with Helm
 
 After configuring `kubectl` with the appropriate `kubeconfig` file, deploy the cluster using the Helm chart:
 ```console
 helm install <deployment-name> slurm-cluster-chart
 ```
+
+NOTE: If using the RookNFS dependency, then the following must be run before installing the Slurm cluster chart
+```console
+helm dependency update slurm-cluster-chart
+```
+
 Subsequent releases can be deployed using:
 
 ```console
@@ -130,6 +133,7 @@ srun singularity exec docker://ghcr.io/stackhpc/mpitests-container:${MPI_CONTAIN
 ```
 
 Note: The mpirun script assumes you are running as user 'rocky'. If you are running as root, you will need to include the --allow-run-as-root argument
+
 ## Reconfiguring the Cluster
 
 ### Changes to config files
@@ -173,3 +177,5 @@ and then restart the other dependent deployments to propagate changes:
 ```console
 kubectl rollout restart deployment slurmd slurmctld login slurmdbd
 ```
+
+# Known Issues
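
The updated README describes two storage paths: the default one, where the RookNFS dependency is pulled in with `helm dependency update`, and one that reuses an existing RWX-capable storage class with `rooknfs.enabled` set to false. As a rough sketch of how the resulting Helm invocations might differ, the helper below only prints the commands rather than running them. The function name `helm_install_cmd`, the release name `demo`, the namespace `slurm`, and the storage class `my-rwx` are hypothetical placeholders, and passing the values via `--set` is an assumption; editing `values.yaml` as the README describes would work equally well.

```shell
#!/bin/sh
# Sketch only: prints the Helm commands for the two storage options
# described in the README. Nothing is executed against a cluster.
# Arguments: <release-name> <namespace> [<existing-storage-class>]
helm_install_cmd() {
    release="$1"
    namespace="$2"
    storage_class="${3:-}"

    cmd="helm install $release slurm-cluster-chart -n $namespace"
    if [ -n "$storage_class" ]; then
        # Reuse an existing RWX-capable storage class and disable RookNFS.
        cmd="$cmd --set storageClass=$storage_class --set rooknfs.enabled=false"
    else
        # Default path: the RookNFS dependency must be fetched first.
        printf '%s\n' "helm dependency update slurm-cluster-chart"
    fi
    printf '%s\n' "$cmd"
}
```

Calling `helm_install_cmd demo slurm` prints the dependency update followed by the install command, while `helm_install_cmd demo slurm my-rwx` prints a single install command with the two overrides.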

image/docker-entrypoint.sh

Lines changed: 10 additions & 8 deletions
@@ -91,12 +91,6 @@ then
 mkdir -p /home/rocky/.ssh
 cp /tmp/authorized_keys /home/rocky/.ssh/authorized_keys
 
-if [ -f /home/rocky/.ssh/id_rsa.pub ]; then
-echo "ssh keys already found"
-else
-ssh-keygen -t rsa -f /home/rocky/.ssh/id_rsa -N ""
-fi
-
 echo "---> Setting permissions for user home directories"
 pushd /home > /dev/null
 for DIR in *
@@ -119,14 +113,22 @@ then
 start_munge
 
 echo "---> Setting up self ssh capabilities for OOD"
+
+if [ -f /home/rocky/.ssh/id_rsa.pub ]; then
+echo "ssh keys already found"
+else
+ssh-keygen -t rsa -f /home/rocky/.ssh/id_rsa -N ""
+chown rocky:rocky /home/rocky/.ssh/id_rsa /home/rocky/.ssh/id_rsa.pub
+fi
+
 ssh-keyscan localhost > /etc/ssh/ssh_known_hosts
 echo "" >> /home/rocky/.ssh/authorized_keys #Adding newline to avoid breaking authorized_keys file
 cat /home/rocky/.ssh/id_rsa.pub >> /home/rocky/.ssh/authorized_keys
 
 echo "---> Starting Apache Server"
 
-mkdir --parents /etc/ood/config/apps/shell
-env > /etc/ood/config/apps/shell/env
+# mkdir --parents /etc/ood/config/apps/shell
+# env > /etc/ood/config/apps/shell/env
 
 /usr/libexec/httpd-ssl-gencerts
 /opt/ood/ood-portal-generator/sbin/update_ood_portal
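
The hunk above moves the key generation into the OOD branch and adds a `chown` so that the new key pair belongs to `rocky` rather than to the user running the entrypoint. A minimal sketch of the same "generate only if missing, then fix ownership" pattern, pulled out into a function, might look like this. The function name `ensure_keypair` and its parameters are hypothetical and are not part of the commit.

```shell
#!/bin/sh
# Sketch only: generate an RSA key pair if none exists yet.
# Arguments: <private-key-path> [<owner>], e.g. /home/rocky/.ssh/id_rsa rocky:rocky
ensure_keypair() {
    key_path="$1"
    owner="${2:-}"

    if [ -f "$key_path.pub" ]; then
        # Leave an existing key pair untouched so restarts are idempotent.
        echo "ssh keys already found"
    else
        ssh-keygen -t rsa -f "$key_path" -N ""
        if [ -n "$owner" ]; then
            # Changing ownership normally requires running as root.
            chown "$owner" "$key_path" "$key_path.pub"
        fi
    fi
}
```

In the entrypoint this would correspond to something like `ensure_keypair /home/rocky/.ssh/id_rsa rocky:rocky`. Skipping generation when the public key already exists matters here because `/home` is a shared volume that outlives the container.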

nfs/deploy-nfs.sh

Lines changed: 0 additions & 11 deletions
This file was deleted.

nfs/pvc.yaml

Lines changed: 0 additions & 11 deletions
This file was deleted.

nfs/teardown-nfs.sh

Lines changed: 0 additions & 16 deletions
This file was deleted.

publish-keys.sh

Lines changed: 7 additions & 2 deletions
@@ -1,3 +1,8 @@
-kubectl create configmap authorized-keys-configmap \
+NAMESPACE="$1"
+if [[ -z $1 ]]; then
+NAMESPACE=default
+fi
+echo Installing in namespace $NAMESPACE
+kubectl -n $NAMESPACE create configmap authorized-keys-configmap \
 "--from-literal=authorized_keys=$(cat ~/.ssh/*.pub)" --dry-run=client -o yaml | \
-kubectl apply -f -
+kubectl -n $NAMESPACE apply -f -
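
The committed script falls back to the `default` namespace when no argument is given, but it leaves `$NAMESPACE` unquoted and uses the bash-only `[[ ... ]]` test. Note also that it creates a ConfigMap named `authorized-keys-configmap`, whereas the README text refers to a Secret. As a hedged alternative, the same behaviour could be sketched with a parameter-expansion default and quoted variables. The function names `resolve_namespace` and `publish_keys` are hypothetical, while the `kubectl` invocation itself is copied from the diff.

```shell
#!/bin/sh
# Sketch only: a quoted, POSIX-sh variant of the namespace handling
# in publish-keys.sh.

# Print the namespace to use, falling back to "default" when the
# argument is missing or empty.
resolve_namespace() {
    printf '%s\n' "${1:-default}"
}

# Create or update the authorized-keys ConfigMap in the chosen namespace.
publish_keys() {
    namespace="$(resolve_namespace "${1:-}")"
    echo "Installing in namespace $namespace"
    kubectl -n "$namespace" create configmap authorized-keys-configmap \
        "--from-literal=authorized_keys=$(cat ~/.ssh/*.pub)" --dry-run=client -o yaml |
        kubectl -n "$namespace" apply -f -
}

# Usage (requires kubectl and at least one ~/.ssh/*.pub file):
# publish_keys "$@"
```

Quoting the namespace keeps an argument containing whitespace or glob characters from being split into several words, so it reaches `kubectl` as a single value.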

rooknfs/Chart.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+apiVersion: v2
+name: rooknfs
+version: 0.0.1
+description: A packaged installation of Rook NFS for Kubernetes.

rooknfs/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# RookNFS Helm Chart
+
+See `values.yaml` for available config options.
File renamed without changes.
