| 1 | +--- |
| 2 | +title: Cloud Bursting Setup Instructions |
| 3 | +description: Learn how to set up cloud bursting using Azure CycleCloud and Slurm. |
| 4 | +author: vinil-v |
| 5 | +ms.date: 12/23/2024 |
| 6 | +ms.author: padmalathas |
| 7 | +--- |
| 8 | + |
| 9 | +## Setup Instructions |
| 10 | + |
| 11 | +Once the prerequisites are in place, follow these steps to integrate the external Slurm scheduler node with the CycleCloud cluster: |
| 12 | + |
| 13 | +### Importing a Cluster Using the Slurm Headless Template in CycleCloud |
| 14 | + |
| 15 | +- This step must be executed on the **CycleCloud VM**. |
| 16 | +- Make sure that the **CycleCloud 8.6.4 VM** is running and accessible via the `cyclecloud` CLI. |
| 17 | +- Execute the `cyclecloud-project-build.sh` script and provide the desired cluster name (e.g., `hpc1`). This will set up a custom project based on the `cyclecloud-slurm-3.0.9` version and import the cluster using the Slurm headless template. |
| 18 | +- In the example provided, `hpc1` is used as the cluster name. You can choose any cluster name, but be consistent and use the same name throughout the entire setup. |
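
Before running the build script, it can help to confirm that the tools it relies on are on the `PATH`. The following is a minimal sketch, not part of the repository: `require_cmd` is a hypothetical helper, and the list of commands (`git`, `cyclecloud`) is an assumption based on the steps in this section.

```bash
# Pre-flight check (illustrative only; require_cmd is a hypothetical helper).
# Prints "found: <cmd>" when the command is on PATH, otherwise reports it on
# stderr and returns a non-zero status.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Assumed tool list for this step: git to clone, cyclecloud to import.
for cmd in git cyclecloud; do
    require_cmd "$cmd" || echo "Install '$cmd' or add it to PATH before continuing." >&2
done
```

If `cyclecloud` is reported as missing, the CLI is probably not installed or not initialized for the current user on the CycleCloud VM.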
| 19 | + |
| 20 | + |
| 21 | +```bash |
| 22 | +git clone https://github.com/Azure/cyclecloud-slurm.git |
| 23 | +cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/cyclecloud |
| 24 | +sh cyclecloud-project-build.sh |
| 25 | +``` |
| 26 | + |
| 27 | +Output: |
| 28 | + |
| 29 | +```bash |
| 30 | +[user1@cc86vm ~]$ cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/cyclecloud |
| 31 | +[user1@cc86vm cyclecloud]$ sh cyclecloud-project-build.sh |
| 32 | +Enter Cluster Name: hpc1 |
| 33 | +Cluster Name: hpc1 |
| 34 | +Use the same cluster name: hpc1 in building the scheduler |
| 35 | +Importing Cluster |
| 36 | +Importing cluster Slurm_HL and creating cluster hpc1.... |
| 37 | +---------- |
| 38 | +hpc1 : off |
| 39 | +---------- |
| 40 | +Resource group: |
| 41 | +Cluster nodes: |
| 42 | +Total nodes: 0 |
| 43 | +Locker Name: cyclecloud_storage |
| 44 | +Fetching CycleCloud project |
| 45 | +Uploading CycleCloud project to the locker |
| 46 | +``` |
| 47 | + |
| 48 | +### Slurm Scheduler Installation and Configuration |
| 49 | + |
| 50 | +- Deploy a VM using the **AlmaLinux HPC 8.7** or **Ubuntu HPC 22.04** image. |
| 51 | +- If you already have a Slurm Scheduler installed, you may skip this step. However, it is recommended to review the script to ensure compatibility with your existing setup. |
| 52 | +- Run the Slurm scheduler installation script (`slurm-scheduler-builder.sh`) and provide the cluster name (`hpc1`) when prompted. |
| 53 | +- This script sets up an NFS server, then installs and configures the Slurm scheduler. |
| 54 | +- If you are using an external NFS server, you can remove the NFS setup entries from the script. |
| 55 | + |
| 56 | + |
| 57 | +```bash |
| 58 | +git clone https://github.com/Azure/cyclecloud-slurm.git |
| 59 | +cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/scheduler |
| 60 | +sh slurm-scheduler-builder.sh |
| 61 | +``` |
| 62 | +Output: |
| 63 | + |
| 64 | +```bash |
| 65 | +------------------------------------------------------------------------------------------------------------------------------ |
| 66 | +Building Slurm scheduler for cloud bursting with Azure CycleCloud |
| 67 | +------------------------------------------------------------------------------------------------------------------------------ |
| 68 | + |
| 69 | +Enter Cluster Name: hpc1 |
| 70 | +------------------------------------------------------------------------------------------------------------------------------ |
| 71 | + |
| 72 | +Summary of entered details: |
| 73 | +Cluster Name: hpc1 |
| 74 | +Scheduler Hostname: masternode2 |
| 75 | +NFSServer IP Address: 10.222.xxx.xxx |
| 76 | +``` |
| 77 | + |
| 78 | +### CycleCloud UI Configuration |
| 79 | + |
| 80 | +- Access the **CycleCloud UI** and navigate to the settings for the `hpc1` cluster. |
| 81 | +- Edit the cluster settings to configure the VM SKUs and networking options as needed. |
| 82 | +- In the **Network Attached Storage** section, enter the NFS server IP address for the `/sched` and `/shared` mounts. |
| 83 | +- On the **Advanced Settings** tab, select the OS from the drop-down list (**Ubuntu 22.04** or **AlmaLinux 8**) to match the scheduler VM. |
| 84 | +- Once all settings are configured, click **Save** and then **Start** the `hpc1` cluster. |
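
Before starting the cluster, it may be worth checking that the NFS server actually exports the directories that the nodes will mount. The sketch below assumes that the export paths are literally `/sched` and `/shared` and that the server answers `showmount -e` queries (a server restricted to NFSv4 may not). `has_exports` is a hypothetical helper, not part of the repository.

```bash
# has_exports reads `showmount -e` style output on stdin and succeeds only if
# every path passed as an argument appears in the export list.
has_exports() {
    # The first line is the "Export list for <host>:" header, so skip it and
    # keep only the exported directory from each remaining line.
    exports=$(awk 'NR > 1 { print $1 }')
    for path in "$@"; do
        printf '%s\n' "$exports" | grep -qxF -- "$path" || {
            echo "not exported: $path" >&2
            return 1
        }
    done
}

# Usage from a machine that can reach the NFS server (the IP is a placeholder):
# showmount -e 10.222.xxx.xxx | has_exports /sched /shared && echo "exports look fine"
```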
| 85 | + |
| 86 | + |
| 87 | + |
| 88 | +### CycleCloud Autoscaler Integration on Slurm Scheduler |
| 89 | + |
| 90 | +- Integrate Slurm with CycleCloud using the `cyclecloud-integrator.sh` script. |
| 91 | +- Provide the CycleCloud details (username, password, and IP address) when prompted. |
| 92 | + |
| 93 | +```bash |
| 94 | +cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/scheduler |
| 95 | +sh cyclecloud-integrator.sh |
| 96 | +``` |
| 97 | +Output: |
| 98 | + |
| 99 | +```bash |
| 100 | +[root@masternode2 scripts]# sh cyclecloud-integrator.sh |
| 101 | +Please enter the CycleCloud details to integrate with the Slurm scheduler |
| 102 | + |
| 103 | +Enter Cluster Name: hpc1 |
| 104 | +Enter CycleCloud Username: user1 |
| 105 | +Enter CycleCloud Password: |
| 106 | +Enter CycleCloud IP (e.g., 10.220.x.xx): 10.220.x.xx |
| 107 | +------------------------------------------------------------------------------------------------------------------------------ |
| 108 | + |
| 109 | +Summary of entered details: |
| 110 | +Cluster Name: hpc1 |
| 111 | +CycleCloud Username: user1 |
| 112 | +CycleCloud URL: https://10.220.x.xx |
| 113 | + |
| 114 | +------------------------------------------------------------------------------------------------------------------------------ |
| 115 | +``` |
| 116 | + |
| 117 | +### User and Group Setup (Optional) |
| 118 | + |
| 119 | +- Ensure consistent user and group IDs across all nodes. |
| 120 | +- Use a centralized user management system such as LDAP where possible, so that UIDs and GIDs stay consistent across all nodes. |
| 121 | +- This example uses the `useradd_example.sh` script to create a test user `user1` and a group for job submission. (User `user1` already exists in CycleCloud.) |
| 122 | + |
| 123 | +```bash |
| 124 | +cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/scheduler |
| 125 | +sh useradd_example.sh |
| 126 | +``` |
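
One way to confirm that IDs match is to compare the `passwd` entries for the same user on the scheduler and on a compute node. This is a minimal sketch: `uid_gid` and `same_ids` are hypothetical helpers, and the commented usage assumes SSH access to a running node named `hpc1-hpc-1`.

```bash
# uid_gid prints "UID:GID" from a passwd-format line read on stdin
# (fields 3 and 4 of name:passwd:UID:GID:gecos:home:shell).
uid_gid() {
    cut -d: -f3,4
}

# same_ids succeeds when two passwd-format lines carry the same UID and GID.
same_ids() {
    [ "$(printf '%s\n' "$1" | uid_gid)" = "$(printf '%s\n' "$2" | uid_gid)" ]
}

# Usage (node name and SSH access are assumptions):
# local_entry=$(getent passwd user1)
# remote_entry=$(ssh hpc1-hpc-1 getent passwd user1)
# same_ids "$local_entry" "$remote_entry" || echo "UID/GID mismatch for user1" >&2
```

Comparing only the UID and GID fields means that differences in home directory or login shell do not trigger a false mismatch.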
| 127 | + |
| 128 | +### Testing the Setup |
| 129 | + |
| 130 | +- Log in as a test user (e.g., `user1`) on the Scheduler node. |
| 131 | +- Submit a test job to verify that the setup is functioning correctly. |
| 132 | + |
| 133 | +```bash |
| 134 | +su - user1 |
| 135 | +srun hostname & |
| 136 | +``` |
| 137 | +Output: |
| 138 | +```bash |
| 139 | +[root@masternode2 scripts]# su - user1 |
| 140 | +Last login: Tue May 14 04:54:51 UTC 2024 on pts/0 |
| 141 | +[user1@masternode2 ~]$ srun hostname & |
| 142 | +[1] 43448 |
| 143 | +[user1@masternode2 ~]$ squeue |
| 144 | + JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) |
| 145 | + 1 hpc hostname user1 CF 0:04 1 hpc1-hpc-1 |
| 146 | +[user1@masternode2 ~]$ hpc1-hpc-1 |
| 147 | +``` |
| 148 | + |
| 149 | + |
| 150 | +Once the job runs, `srun` prints the name of the compute node (`hpc1-hpc-1` in this example), which indicates that the integration with CycleCloud is working. |
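
In the sample output, the job first appears in the `CF` state. In Slurm this is the compact code for CONFIGURING, meaning the job has been allocated a node that is not yet ready, which is expected while a burst node is being provisioned. The helper below is a small illustrative sketch (`describe_state` is a hypothetical name) that expands a few of the standard compact state codes.

```bash
# describe_state expands a Slurm compact job state code (the ST column of
# squeue) into a short description. Only a few common codes are covered.
describe_state() {
    case "$1" in
        PD) echo "PENDING: waiting for resources" ;;
        CF) echo "CONFIGURING: nodes allocated, waiting for them to become ready" ;;
        R)  echo "RUNNING" ;;
        CG) echo "COMPLETING" ;;
        CD) echo "COMPLETED" ;;
        F)  echo "FAILED" ;;
        *)  echo "other state: $1" ;;
    esac
}

# Usage on the scheduler node: print the state of each of your jobs.
# squeue -h -u "$USER" -o '%i %t' | while read -r id st; do
#     echo "job $id: $(describe_state "$st")"
# done
```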
| 151 | + |
| 152 | +For further details and advanced configurations, refer to the scripts and documentation within this repository. |