Commit 927b30f

Merge pull request #298417 from Padmalathas/FixingCloudBursting
Fixing the Cloud Bursting articles
2 parents 97bd2ef + ff19d83 commit 927b30f

File tree

7 files changed

+232
-215
lines changed

.openpublishing.redirection.json

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6802,7 +6802,12 @@
68026802
{
68036803
"source_path": "articles/defender-for-iot/organizations/extra-deploy-enterprise-iot.md",
68046804
"redirect_url": "/azure/defender-for-iot/organizations/eiot-defender-for-endpoint",
6805-
"redirect_document_id": false
6805+
"redirect_document_id": false
6806+
},
6807+
{
6808+
"source_path": "articles/cyclecloud/how-to/slurm-cloud-bursting-setup.md",
6809+
"redirect_url": "/azure/cyclecloud/how-to/bursting/slurm-cloud-bursting-setup",
6810+
"redirect_document_id": false
68066811
},
68076812
{
68086813
"source_path": "articles/sentinel/work-with-styx-objects-indicators.md",
Lines changed: 151 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,151 @@
1+
---
2+
title: Cloud Bursting Setup Instructions
3+
description: Learn how to set up Cloud bursting using Azure CycleCloud and Slurm.
4+
author: vinil-v
5+
ms.date: 04/17/2025
6+
ms.author: padmalathas
7+
---
8+
9+
# Setup Instructions
10+
11+
After the prerequisites are in place, follow these steps to integrate the external Slurm Scheduler node with the CycleCloud cluster:
12+
13+
## Importing a Cluster Using the Slurm Headless Template in CycleCloud
14+
15+
- This step must be executed on the **CycleCloud VM**.
16+
- Make sure that the **CycleCloud 8.6.4 VM** is running and accessible via the `cyclecloud` CLI.
17+
- Execute the `cyclecloud-project-build.sh` script and provide the desired cluster name (for example, `hpc1`). This sets up a custom project based on the `cyclecloud-slurm-3.0.9` version and imports the cluster using the Slurm headless template.
18+
- In the example provided, `<clustername>` is used as the cluster name. Choose any cluster name you like, but use the same name consistently throughout the entire setup.
19+
20+
21+
```bash
22+
git clone https://github.com/Azure/cyclecloud-slurm.git
23+
cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/cyclecloud
24+
sh cyclecloud-project-build.sh
25+
```
26+
27+
Output:
28+
29+
```bash
30+
[user1@cc86vm ~]$ cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/cyclecloud
31+
[user1@cc86vm cyclecloud]$ sh cyclecloud-project-build.sh
32+
Enter Cluster Name: <clustername>
33+
Cluster Name: <clustername>
34+
Use the same cluster name: <clustername> in building the scheduler
35+
Importing Cluster
36+
Importing cluster Slurm_HL and creating cluster hpc1....
37+
----------
38+
<clustername> : off
39+
----------
40+
Resource group:
41+
Cluster nodes:
42+
Total nodes: 0
43+
Locker Name: cyclecloud_storage
44+
Fetching CycleCloud project
45+
Uploading CycleCloud project to the locker
46+
```
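To confirm the import from the command line, a minimal sketch along the following lines could be used. It assumes that `cyclecloud show_cluster` prints a `<clustername> : <state>` line like the one in the sample output above; the helper name `cluster_imported` is hypothetical and not part of the CycleCloud CLI.

```bash
#!/bin/sh
# Sketch: check that a cluster is known to CycleCloud after the import.
# Assumption: `cyclecloud show_cluster NAME` prints a line of the form
# "NAME : <state>", as in the sample output above.
# cluster_imported is a hypothetical helper name, not part of the CLI.
cluster_imported() {
    name=$1
    # The pipeline succeeds only if a "NAME : ..." line is printed.
    cyclecloud show_cluster "$name" 2>/dev/null | grep -q "^$name : "
}

# Example: cluster_imported hpc1 && echo "hpc1 is imported"
```

The function returns a non-zero status both when the CLI call fails and when no matching line is found, so it can be used directly in an `if` statement.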
47+
48+
## Slurm Scheduler Installation and Configuration
49+
50+
- Deploy a VM using the **AlmaLinux HPC 8.7** or **Ubuntu HPC 22.04** image.
51+
- If you already have a Slurm Scheduler installed, you can skip this step. However, it's advisable to review the script to make sure it's compatible with your current setup.
52+
- Run the Slurm scheduler installation script (`slurm-scheduler-builder.sh`) and provide the cluster name (`<clustername>`) when prompted.
53+
- This script sets up the NFS server and installs and configures the Slurm Scheduler.
54+
- If you're using an external NFS server, you can delete the NFS setup entries from the script.
55+
56+
```bash
57+
git clone https://github.com/Azure/cyclecloud-slurm.git
58+
cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/scheduler
59+
sh slurm-scheduler-builder.sh
60+
```
61+
Output:
62+
63+
```bash
64+
------------------------------------------------------------------------------------------------------------------------------
65+
Building Slurm scheduler for cloud bursting with Azure CycleCloud
66+
------------------------------------------------------------------------------------------------------------------------------
67+
68+
Enter Cluster Name: <clustername>
69+
------------------------------------------------------------------------------------------------------------------------------
70+
71+
Summary of entered details:
72+
Cluster Name: <clustername>
73+
Scheduler Hostname: <scheduler hostname>
74+
NFSServer IP Address: 10.222.xxx.xxx
75+
```
76+
77+
## CycleCloud UI Configuration
78+
79+
- Access the **CycleCloud UI** and navigate to the settings for the `<clustername>` cluster.
80+
- Edit the cluster settings to configure the VM SKUs and networking options as needed.
81+
- In the **Network Attached Storage** section, enter the NFS server IP address for the `/sched` and `/shared` mounts.
82+
- On the **Advanced Settings** tab, choose the OS from the dropdown menu: either **Ubuntu 22.04** or **AlmaLinux 8**, matching the scheduler VM.
83+
- Once all settings are configured, click **Save** and then **Start** the `<clustername>` cluster.
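After the cluster starts, one way to confirm that a node mounted `/sched` and `/shared` from the intended NFS server is to inspect its mount table. The sketch below assumes a Linux node that exposes `/proc/mounts`; the function name `nfs_mounted` and the `<nfs server ip>` placeholder are illustrative, not part of CycleCloud.

```bash
#!/bin/sh
# Sketch: verify that a mount point is served over NFS by a given server.
# Assumption: the mount table has the /proc/mounts layout
# "<device> <mount point> <fstype> <options> ...".
# nfs_mounted is a hypothetical helper name.
nfs_mounted() {
    server=$1
    mount_point=$2
    table=${3:-/proc/mounts}
    awk -v srv="$server" -v mp="$mount_point" '
        # Match the mount point, an nfs/nfs4 file system type,
        # and a device that starts with "<server>:".
        $2 == mp && $3 ~ /^nfs/ && index($1, srv ":") == 1 { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$table"
}

# Example (run on a cluster node):
#   for mp in /sched /shared; do
#       nfs_mounted "<nfs server ip>" "$mp" || echo "$mp is not mounted from the NFS server"
#   done
```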
84+
85+
![NFS settings](../../images/slurm-cloud-burst/cyclecloud-ui-config.png)
86+
87+
## CycleCloud Autoscaler Integration on Slurm Scheduler
88+
89+
- Integrate Slurm with CycleCloud using the `cyclecloud-integrator.sh` script.
90+
- Provide the CycleCloud details (username, password, and IP address) when prompted.
91+
92+
```bash
93+
cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/scheduler
94+
sh cyclecloud-integrator.sh
95+
```
96+
Output:
97+
98+
```bash
99+
[root@masternode2 scripts]# sh cyclecloud-integrator.sh
100+
Please enter the CycleCloud details to integrate with the Slurm scheduler
101+
102+
Enter Cluster Name: <clustername>
103+
Enter CycleCloud Username: <username>
104+
Enter CycleCloud Password: <password>
105+
Enter CycleCloud IP (e.g., 10.220.x.xx): <ip address>
106+
------------------------------------------------------------------------------------------------------------------------------
107+
108+
Summary of entered details:
109+
Cluster Name: <clustername>
110+
CycleCloud Username: <username>
111+
CycleCloud URL: https://<ip address>
112+
113+
------------------------------------------------------------------------------------------------------------------------------
114+
```
115+
116+
## User and Group Setup (Optional)
117+
118+
- Ensure consistent user and group IDs across all nodes.
119+
- It's advisable to use a centralized user management system, such as LDAP, to maintain consistent UIDs and GIDs across all nodes.
120+
- In this example, we're using the `useradd_example.sh` script to create a test user `<username>` and a group for job submission. The user `<username>` already exists in CycleCloud.
121+
122+
```bash
123+
cd cyclecloud-slurm/cloud_bursting/slurm-23.11.9-1/scheduler
124+
sh useradd_example.sh
125+
```
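Because mismatched IDs are easy to miss, a small check like the following sketch can flag them. It reads one `<node> <uid> <gid>` line per node from standard input and treats the first line as the reference. How those lines are collected (for example, by running `id -u` and `id -g` over SSH) is an assumption left to the caller, and `check_ids` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch: flag inconsistent UID/GID values for one user across nodes.
# Input (stdin): one line per node in the form "<node> <uid> <gid>".
# The first line is used as the reference value.
# check_ids is a hypothetical helper name.
check_ids() {
    awk '
        NR == 1 { uid = $2; gid = $3 }
        $2 != uid || $3 != gid {
            printf "MISMATCH on %s: uid=%s gid=%s (expected %s/%s)\n", $1, $2, $3, uid, gid
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example (node names are placeholders):
#   for n in <scheduler hostname> <clustername>-hpc-1; do
#       echo "$n $(ssh "$n" id -u <username>) $(ssh "$n" id -g <username>)"
#   done | check_ids
```

The function exits with a non-zero status when any node differs from the first one, which makes it usable as a pre-flight check before submitting jobs.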
126+
127+
## Testing the Setup
128+
129+
- Log in as a test user (for example, `<username>`) on the Scheduler node.
130+
- Submit a test job to verify that the setup is functioning correctly.
131+
132+
```bash
133+
su - <username>
134+
srun hostname &
135+
```
136+
Output:
137+
```bash
138+
[root@masternode2 scripts]# su - <username>
139+
Last login: Tue May 14 04:54:51 UTC 2024 on pts/0
140+
[<username>@masternode2 ~]$ srun hostname &
141+
[1] 43448
142+
[<username>@masternode2 ~]$ squeue
143+
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
144+
1 hpc hostname <username> CF 0:04 1 <clustername>-hpc-1
145+
[user1@masternode2 ~]$ <clustername>-hpc-1
146+
```
147+
![Node Creation](../../images/slurm-cloud-burst/cyclecloud-ui-new-node.png)
148+
149+
You should see the job running successfully, indicating a successful integration with CycleCloud.
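While the burst node is being provisioned, the job stays in the `CF` (configuring) state shown in the `squeue` output above. The following sketch polls `squeue` until the job is running or has left the queue. The function name `wait_for_running` and the default retry count and delay are assumptions chosen for illustration.

```bash
#!/bin/sh
# Sketch: poll squeue until a job is running or has left the queue.
# Uses `squeue -h -j <jobid> -o %t`, which prints only the compact
# job state (for example CF, PD, or R) with no header line.
# wait_for_running and its defaults are hypothetical.
wait_for_running() {
    job_id=$1
    tries=${2:-60}      # number of polls before giving up
    delay=${3:-10}      # seconds between polls
    while [ "$tries" -gt 0 ]; do
        state=$(squeue -h -j "$job_id" -o %t 2>/dev/null)
        case $state in
            R)  echo "job $job_id is running"; return 0 ;;
            "") echo "job $job_id is no longer in the queue"; return 0 ;;
            *)  echo "job $job_id state: $state" ;;
        esac
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Example: wait_for_running 1
```

An empty result is treated as "finished" here, which also covers a job ID that never existed, so check the job's output before concluding that the run succeeded.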
150+
151+
For more information and advanced configurations, see the scripts and documentation in the [cyclecloud-slurm repository](https://github.com/Azure/cyclecloud-slurm).
Lines changed: 72 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,72 @@
1+
---
2+
title: Cloud Bursting Using Azure CycleCloud and Slurm
3+
description: Learn how to configure Cloud bursting using Azure CycleCloud and Slurm.
4+
author: vinil-v
5+
ms.date: 04/17/2025
6+
ms.author: padmalathas
7+
---
8+
9+
# What is Cloud Bursting?
10+
11+
Cloud bursting is a configuration in cloud computing that allows an organization to handle peaks in IT demand by using a combination of private and public clouds. When the resources in a private cloud reach their maximum capacity, the overflow traffic is directed to a public cloud to ensure there's no interruption in services. This setup provides flexibility and cost savings, as you only pay for the supplemental resources when there's a demand for them.
12+
13+
For example, an application can run on a private cloud and "burst" to a public cloud only when necessary to meet peak demands. This approach helps avoid the costs associated with maintaining extra capacity that isn't always in use.
14+
15+
Cloud bursting can be used in various scenarios, such as enabling on-premises workloads to be sent to the cloud for processing, known as hybrid HPC (High-Performance Computing). It allows users to optimize their resource utilization and cost efficiency while accessing the scalability and flexibility of the cloud.
16+
17+
## Overview
18+
19+
This document offers a step-by-step guide on installing and configuring a Slurm scheduler to burst computing resources into the cloud using Azure CycleCloud. It explains how to create a hybrid HPC environment by extending on-premises Slurm clusters into Azure, allowing for seamless access to scalable and flexible cloud computing resources. The guide provides a practical example of optimizing compute capacity by integrating local infrastructure with cloud-based solutions.
20+
21+
22+
## Requirements to Set Up Slurm Cloud Bursting Using CycleCloud on Azure
23+
24+
## Azure subscription account
25+
You must have an Azure subscription or be assigned the Owner role on an existing subscription.
26+
27+
* To create an Azure subscription, go to the [Create a Subscription](/azure/cost-management-billing/manage/create-subscription#create-a-subscription) documentation.
28+
* To access an existing subscription, go to the [Azure portal](https://portal.azure.com/).
29+
30+
## Network infrastructure
31+
If you intend to create a Slurm cluster entirely within Azure, you must deploy both the head nodes and the CycleCloud compute nodes within a single Azure Virtual Network (VNET).
32+
33+
![Slurm cluster](../../images/slurm-cloud-burst/slurm-cloud-burst-architecture.png)
34+
35+
To create a hybrid HPC cluster with head nodes on your on-premises corporate network and compute nodes in Azure, set up a [Site-to-Site](/azure/vpn-gateway/tutorial-site-to-site-portal) VPN or an [ExpressRoute](/azure/expressroute/) connection to link your network to the Azure VNET. The head nodes must also be able to reach Azure services over the internet. You might need to work with your network administrator to set this up.
36+
37+
## Network Ports and Security
38+
Configure the following network security group (NSG) rules to allow communication between the Slurm Master node, the CycleCloud server, and the compute nodes.
39+
40+
41+
| **Service** | **Port** | **Protocol** | **Direction** | **Purpose** | **Requirement** |
42+
|------------------------------------|-----------------|--------------|------------------|------------------------------------------------------------------------|---------------------------------------------------------------------------------|
43+
| **SSH (Secure Shell)** | 22 | TCP | Inbound/Outbound | Secure command-line access to the Slurm Master node | Open on both on-premises firewall and Azure NSGs |
44+
| **Slurm Control (slurmctld, slurmd)** | 6817, 6818 | TCP | Inbound/Outbound | Communication between Slurm Master and compute nodes | Open in on-premises firewall and Azure NSGs |
45+
| **Munge Authentication Service** | 4065 | TCP | Inbound/Outbound | Authentication between Slurm Master and compute nodes | Open on both on-premises network and Azure NSGs |
46+
| **CycleCloud Service** | 443 | TCP | Outbound | Communication between Slurm Master node and Azure CycleCloud | Allow outbound connections to Azure CycleCloud services from the Slurm Master node |
47+
| **NFS ports** | 2049 | TCP | Inbound/Outbound | Shared filesystem access between Master node and Azure CycleCloud | Open on both on-premises network and Azure NSGs |
48+
| **LDAP port** (Optional)           | 389             | TCP          | Inbound/Outbound | Centralized authentication mechanism for user management               | Open on both on-premises network and Azure NSGs                                 |
49+
50+
For more information, see the [Slurm Network Configuration Guide](https://slurm.schedmd.com/network.html).
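Once the firewall and NSG rules are in place, a quick reachability probe can catch a blocked port before the integration steps fail. The sketch below assumes that `bash` and the coreutils `timeout` command are installed. The function name `check_port` and the `TARGET_HOST` variable are hypothetical, and the port list is taken from the table above.

```bash
#!/bin/sh
# Sketch: probe TCP ports listed in the table above.
# Assumptions: bash and coreutils `timeout` are available; check_port and
# TARGET_HOST are hypothetical names used for illustration.
check_port() {
    host=$1
    port=$2
    wait_s=${3:-3}      # seconds to wait before giving up
    # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>.
    timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Probe the SSH, Slurm, and NFS ports only when a target host is given,
# for example: TARGET_HOST=<scheduler hostname> sh check-ports.sh
if [ -n "${TARGET_HOST:-}" ]; then
    for port in 22 6817 6818 2049; do
        if check_port "$TARGET_HOST" "$port"; then
            echo "port $port reachable on $TARGET_HOST"
        else
            echo "port $port NOT reachable on $TARGET_HOST"
        fi
    done
fi
```

A successful probe only shows that something accepts TCP connections on that port. It doesn't confirm that the expected service is the one listening.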
51+
52+
## Software Requirement
53+
54+
- **OS Version**: AlmaLinux release 8.x or Ubuntu 22.04
55+
- **CycleCloud Version**: 8.x or later
56+
- **CycleCloud-Slurm Project Version**: 3.0.x
57+
58+
## NFS File server
59+
A shared file system is required between the external Slurm Scheduler node and the CycleCloud cluster. You can use Azure NetApp Files, Azure Files, NFS, or other methods to mount the same file system on both sides. In this example, we're using a Scheduler VM as an NFS server.
60+
61+
## Centralized User management system (LDAP or AD)
62+
In HPC environments, maintaining consistent user IDs (UIDs) and group IDs (GIDs) across the cluster is critical for seamless user access and resource management. A centralized user management system, such as LDAP or Active Directory (AD), ensures that UIDs and GIDs are synchronized across all compute nodes and storage systems.
63+
64+
> [!Important]
65+
>
66+
> For more information and setup instructions, see the blog post about [Slurm Cloud Bursting Using CycleCloud on Azure](https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/setting-up-slurm-cloud-bursting-using-cyclecloud-on-azure/4140922).
67+
68+
### Next Steps
69+
70+
* [GitHub repo - cyclecloud-slurm](https://github.com/Azure/cyclecloud-slurm/tree/master)
71+
* [Azure CycleCloud Documentation](../../overview.md)
72+
* [Slurm documentation](https://slurm.schedmd.com/documentation.html)
