| 1 | +--- |
| 2 | +title: Cloud Bursting Setup Instructions |
| 3 | +description: Learn how to set up cloud bursting using Azure CycleCloud and Slurm. |
| 4 | +author: vinil-v |
| 5 | +ms.date: 04/17/2025 |
| 6 | +ms.author: padmalathas |
| 7 | +--- |
| 8 | + |
| 9 | +# Setup Instructions |
| 10 | + |
| 11 | +Once the prerequisites are in place, follow these steps to integrate the external Slurm scheduler node with the CycleCloud cluster: |
| 12 | + |
| 13 | +## Importing a Cluster Using the Slurm Headless Template in CycleCloud |
| 14 | + |
| 15 | +- This step must be executed on the **CycleCloud VM**. |
| 16 | +- Make sure that the **CycleCloud 8.6.4 VM** is running and accessible via the `cyclecloud` CLI. |
| 17 | +- Execute the `cyclecloud-project-build.sh` script and provide the desired cluster name (for example, `hpc1`). The script sets up a custom project based on `cyclecloud-slurm-3.0.9` and imports the cluster using the Slurm headless template. |
| 18 | +- The examples use `<clustername>` as the cluster name. Choose any name you like, but use the same name consistently throughout the entire setup. |
| 19 | + |
| 20 | + |
| 21 | +```bash |
| 22 | +git clone https://github.com/Azure/cyclecloud-slurm.git |
| 23 | +cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/cyclecloud |
| 24 | +sh cyclecloud-project-build.sh |
| 25 | +``` |
| 26 | + |
| 27 | +Output: |
| 28 | + |
| 29 | +```bash |
| 30 | +[user1@cc86vm ~]$ cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/cyclecloud |
| 31 | +[user1@cc86vm cyclecloud]$ sh cyclecloud-project-build.sh |
| 32 | +Enter Cluster Name: <clustername> |
| 33 | +Cluster Name: <clustername> |
| 34 | +Use the same cluster name: <clustername> in building the scheduler |
| 35 | +Importing Cluster |
| 36 | +Importing cluster Slurm_HL and creating cluster <clustername>.... |
| 37 | +---------- |
| 38 | +<clustername> : off |
| 39 | +---------- |
| 40 | +Resource group: |
| 41 | +Cluster nodes: |
| 42 | +Total nodes: 0 |
| 43 | +Locker Name: cyclecloud_storage |
| 44 | +Fetching CycleCloud project |
| 45 | +Uploading CycleCloud project to the locker |
| 46 | +``` |
| 47 | + |
| 48 | +## Slurm Scheduler Installation and Configuration |
| 49 | + |
| 50 | +- Deploy a VM using the **AlmaLinux HPC 8.7** or **Ubuntu HPC 22.04** image. |
| 51 | +- If you already have a Slurm Scheduler installed, you can skip this step. However, it's advisable to review the script to make sure it's compatible with your current setup. |
| 52 | +- Run the Slurm scheduler installation script (`slurm-scheduler-builder.sh`) and provide the cluster name (`<clustername>`) when prompted. |
| 53 | +- This script sets up the NFS server and installs and configures the Slurm Scheduler. |
| 54 | +- If you're using an external NFS server, you can delete the NFS setup entries from the script. |
| 55 | + |
| 56 | +```bash |
| 57 | +git clone https://github.com/Azure/cyclecloud-slurm.git |
| 58 | +cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/scheduler |
| 59 | +sh slurm-scheduler-builder.sh |
| 60 | +``` |
| 61 | +Output: |
| 62 | + |
| 63 | +```bash |
| 64 | +------------------------------------------------------------------------------------------------------------------------------ |
| 65 | +Building Slurm scheduler for cloud bursting with Azure CycleCloud |
| 66 | +------------------------------------------------------------------------------------------------------------------------------ |
| 67 | + |
| 68 | +Enter Cluster Name: <clustername> |
| 69 | +------------------------------------------------------------------------------------------------------------------------------ |
| 70 | + |
| 71 | +Summary of entered details: |
| 72 | +Cluster Name: <clustername> |
| 73 | +Scheduler Hostname: <scheduler hostname> |
| 74 | +NFSServer IP Address: 10.222.xxx.xxx |
| 75 | +``` |
| 76 | + |
| 77 | +## CycleCloud UI Configuration |
| 78 | + |
| 79 | +- Access the **CycleCloud UI** and navigate to the settings for the `<clustername>` cluster. |
| 80 | +- Edit the cluster settings to configure the VM SKUs and networking options as needed. |
| 81 | +- In the **Network Attached Storage** section, enter the NFS server IP address for the `/sched` and `/shared` mounts. |
| 82 | +- On the **Advanced Settings** tab, choose the OS from the dropdown menu: either **Ubuntu 22.04** or **AlmaLinux 8**, matching the OS of the scheduler VM. |
| 83 | +- Once all settings are configured, click **Save** and then **Start** the `<clustername>` cluster. |
| 84 | + |
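After the cluster starts, each node is expected to mount `/sched` and `/shared` from the NFS server entered in the cluster settings. The following is a minimal sketch for spot-checking this on a node. It is not part of the repository's scripts: `is_mounted` is a hypothetical helper, and it assumes a Linux host that exposes its mount table at `/proc/mounts`.

```shell
#!/bin/sh
# Hypothetical helper (not part of cyclecloud-slurm): succeed if the given
# path appears as a mount point in a mounts table.
# $1 = mount point, $2 = mounts file (assumed default: /proc/mounts on Linux).
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Report the state of the two mounts named in the cluster settings.
for dir in /sched /shared; do
    if is_mounted "$dir"; then
        echo "$dir is mounted"
    else
        echo "$dir is NOT mounted; check the NFS server IP in the cluster settings"
    fi
done
```

The second argument exists only so the helper can be pointed at a saved copy of a mounts table; on a live node it can be omitted.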
| 85 | + |
| 86 | + |
| 87 | +## CycleCloud Autoscaler Integration on Slurm Scheduler |
| 88 | + |
| 89 | +- Integrate Slurm with CycleCloud using the `cyclecloud-integrator.sh` script. |
| 90 | +- Provide the CycleCloud details (username, password, and IP address) when prompted. |
| 91 | + |
| 92 | +```bash |
| 93 | +cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/scheduler |
| 94 | +sh cyclecloud-integrator.sh |
| 95 | +``` |
| 96 | +Output: |
| 97 | + |
| 98 | +```bash |
| 99 | +[root@masternode2 scripts]# sh cyclecloud-integrator.sh |
| 100 | +Please enter the CycleCloud details to integrate with the Slurm scheduler |
| 101 | + |
| 102 | +Enter Cluster Name: <clustername> |
| 103 | +Enter CycleCloud Username: <username> |
| 104 | +Enter CycleCloud Password: <password> |
| 105 | +Enter CycleCloud IP (e.g., 10.220.x.xx): <ip address> |
| 106 | +------------------------------------------------------------------------------------------------------------------------------ |
| 107 | + |
| 108 | +Summary of entered details: |
| 109 | +Cluster Name: <clustername> |
| 110 | +CycleCloud Username: <username> |
| 111 | +CycleCloud URL: https://<ip address> |
| 112 | + |
| 113 | +------------------------------------------------------------------------------------------------------------------------------ |
| 114 | +``` |
| 115 | + |
| 116 | +## User and Group Setup (Optional) |
| 117 | + |
| 118 | +- Ensure consistent user and group IDs across all nodes. |
| 119 | +- It's advisable to use a centralized user management system such as LDAP to keep UIDs and GIDs consistent across all nodes. |
| 120 | +- This example uses the `useradd_example.sh` script to create a test user `<username>` and a group for job submission. (The user `<username>` already exists in CycleCloud.) |
| 121 | + |
| 122 | +```bash |
| 123 | +cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/scheduler |
| 124 | +sh useradd_example.sh |
| 125 | +``` |
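One way to spot-check ID consistency is to compare the passwd entry for the same user on the scheduler and on a compute node. The sketch below is illustrative only: `ids_match` is a hypothetical helper, and the two entries are made-up values standing in for the output of `getent passwd <username>` on each machine.

```shell
#!/bin/sh
# Hypothetical helper: compare the UID and GID (fields 3 and 4) of two
# passwd-format lines, such as those printed by `getent passwd <username>`.
ids_match() {
    ids_a=$(printf '%s\n' "$1" | cut -d: -f3,4)
    ids_b=$(printf '%s\n' "$2" | cut -d: -f3,4)
    [ -n "$ids_a" ] && [ "$ids_a" = "$ids_b" ]
}

# Made-up example entries (assumptions, not values from the repository).
scheduler_entry='user1:x:20001:20001::/home/user1:/bin/bash'
compute_entry='user1:x:20001:20001::/home/user1:/bin/bash'

if ids_match "$scheduler_entry" "$compute_entry"; then
    echo "UID/GID match"
else
    echo "UID/GID mismatch"
fi
```

A mismatch here usually shows up later as permission errors on the shared file systems, so it is worth checking before submitting jobs.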
| 126 | + |
| 127 | +## Testing the Setup |
| 128 | + |
| 129 | +- Log in as a test user (for example, `<username>`) on the scheduler node. |
| 130 | +- Submit a test job to verify that the setup is functioning correctly. |
| 131 | + |
| 132 | +```bash |
| 133 | +su - <username> |
| 134 | +srun hostname & |
| 135 | +``` |
| 136 | +Output: |
| 137 | +```bash |
| 138 | +[root@masternode2 scripts]# su - <username> |
| 139 | +Last login: Tue May 14 04:54:51 UTC 2024 on pts/0 |
| 140 | +[<username>@masternode2 ~]$ srun hostname & |
| 141 | +[1] 43448 |
| 142 | +[<username>@masternode2 ~]$ squeue |
| 143 | + JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) |
| 144 | + 1 hpc hostname <username> CF 0:04 1 <clustername>-hpc-1 |
| 145 | +[<username>@masternode2 ~]$ <clustername>-hpc-1 |
| 146 | +``` |
| 147 | + |
| 148 | + |
| 149 | +The job should run to completion and print the hostname of the compute node, which indicates that the integration with CycleCloud is working. |
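The `ST` column in the `squeue` output above uses Slurm's compact job state codes. In a bursting setup, `CF` (configuring) is expected for a while, because the allocated cloud node still has to start. The sketch below maps a few common codes to short descriptions; `describe_state` is a hypothetical helper and covers only a subset of the codes.

```shell
#!/bin/sh
# Hypothetical helper: translate a few common squeue ST codes into a short
# description. Only a subset of Slurm's job state codes is covered here.
describe_state() {
    case "$1" in
        PD) echo "pending: waiting for resources" ;;
        CF) echo "configuring: allocated nodes are still starting up" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        F)  echo "failed" ;;
        *)  echo "unknown or unlisted state: $1" ;;
    esac
}

# The state shown in the sample output above.
describe_state CF
```

On a live scheduler, the compact state of a single job can be obtained with `squeue -h -o %t -j <jobid>` and passed to the helper.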
| 150 | + |
| 151 | +For more information and advanced configurations, see the scripts and documentation within this repository. |