|
2 | 2 | title: Cloud Bursting Using Azure CycleCloud and Slurm
|
3 | 3 | description: Learn how to configure Cloud bursting using Azure CycleCloud and Slurm.
|
4 | 4 | author: vinil-v
|
5 |
| -ms.date: 04/17/2025 |
| 5 | +ms.date: 07/01/2025 |
6 | 6 | ms.author: padmalathas
|
7 | 7 | ---
|
8 | 8 |
|
9 |
| -# What is Cloud Bursting? |
| 9 | +# What is cloud bursting? |
10 | 10 |
|
11 |
| -Cloud bursting is a configuration in cloud computing that allows an organization to handle peaks in IT demand by using a combination of private and public clouds. When the resources in a private cloud reach their maximum capacity, the overflow traffic is directed to a public cloud to ensure there's no interruption in services. This setup provides flexibility and cost savings, as you only pay for the supplemental resources when there's a demand for them. |
| 11 | +Cloud bursting is a configuration in cloud computing that helps your organization handle peaks in IT demand by using a combination of private and public clouds. When the resources in a private cloud reach their maximum capacity, the configuration directs the overflow traffic to a public cloud. This setup ensures there's no interruption in services. Cloud bursting provides flexibility and cost savings, as you only pay for the supplemental resources when there's a demand for them. |
12 | 12 |
|
13 |
| -For example, an application can run on a private cloud and "burst" to a public cloud only when necessary to meet peak demands. This approach helps avoid the costs associated with maintaining extra capacity that isn't always in use. |
| 13 | +For example, an application can run on a private cloud and "burst" to a public cloud only when necessary to meet peak demands. This approach helps you avoid the costs associated with maintaining extra capacity that isn't always in use. |
14 | 14 |
|
15 |
| -Cloud bursting can be used in various scenarios, such as enabling on-premises workloads to be sent to the cloud for processing, known as hybrid HPC (High-Performance Computing). It allows users to optimize their resource utilization and cost efficiency while accessing the scalability and flexibility of the cloud. |
| 15 | +You can use cloud bursting in various scenarios, such as enabling on-premises workloads to be sent to the cloud for processing, known as hybrid HPC (High-Performance Computing). It allows you to optimize your resource utilization and cost efficiency while accessing the scalability and flexibility of the cloud. |
16 | 16 |
|
17 | 17 | ## Overview
|
18 | 18 |
|
19 | 19 | This document offers a step-by-step guide on installing and configuring a Slurm scheduler to burst computing resources into the cloud using Azure CycleCloud. It explains how to create a hybrid HPC environment by extending on-premises Slurm clusters into Azure, allowing for seamless access to scalable and flexible cloud computing resources. The guide provides a practical example of optimizing compute capacity by integrating local infrastructure with cloud-based solutions.
|
20 | 20 |
|
21 | 21 |
|
22 |
| -## Requirements to Setup Slurm Cloud Bursting Using CycleCloud on Azure |
| 22 | +## Requirements to Set Up Slurm Cloud Bursting Using CycleCloud on Azure |
23 | 23 |
|
24 | 24 | ## Azure subscription account
|
25 |
| -You must obtain an Azure subscription or be assigned as an Owner role of the subscription. |
| 25 | +You must have an Azure subscription or be assigned the Owner role for a subscription. |
26 | 26 |
|
27 |
| -* To create an Azure subscription, go to the [Create a Subscription](/azure/cost-management-billing/manage/create-subscription#create-a-subscription) documentation. |
| 27 | +* To create an Azure subscription, see [Create a Subscription](/azure/cost-management-billing/manage/create-subscription#create-a-subscription). |
28 | 28 | * To access an existing subscription, go to the [Azure portal](https://portal.azure.com/).
|
29 | 29 |
|
30 | 30 | ## Network infrastructure
|
31 |
| -If you intend to create a Slurm cluster entirely within Azure, you must deploy both the head nodes and the CycleCloud compute nodes within a single Azure Virtual Network (VNET). |
| 31 | +To create a Slurm cluster entirely within Azure, deploy both the head nodes and the CycleCloud compute nodes within a single Azure Virtual Network (VNET). |
32 | 32 |
|
33 | 33 | 
|
34 | 34 |
|
35 |
| -To create a hybrid HPC cluster with head nodes on your on-premises corporate network and compute nodes in Azure, set up a [Site-to-Site](/azure/vpn-gateway/tutorial-site-to-site-portal) VPN or an [ExpressRoute](/azure/expressroute/) connection. This links your network to the Azure VNET. The head nodes must be able to connect to Azure services online. You might need to work with your network administrator to set this up. |
36 |
| - |
37 |
| -## Network Ports and Security |
38 |
| -The following NSG rules must be configured for successful communication between Master node, CycleCloud server, and compute nodes. |
| 35 | +To create a hybrid HPC cluster with head nodes on your on-premises corporate network and compute nodes in Azure, set up a [Site-to-Site](/azure/vpn-gateway/tutorial-site-to-site-portal) VPN or an [ExpressRoute](/azure/expressroute/) connection. This setup links your network to the Azure VNET. The head nodes must be able to connect to Azure services online. You might need to work with your network administrator to set up this connection. |
39 | 36 |
|
| 37 | +## Network ports and security |
| 38 | +To enable communication between the primary node, CycleCloud server, and compute nodes, configure the following NSG rules. |
40 | 39 |
|
41 | 40 | | **Service** | **Port** | **Protocol** | **Direction** | **Purpose** | **Requirement** |
|
42 | 41 | |------------------------------------|-----------------|--------------|------------------|------------------------------------------------------------------------|---------------------------------------------------------------------------------|
|
43 |
| -| **SSH (Secure Shell)** | 22 | TCP | Inbound/Outbound | Secure command-line access to the Slurm Master node | Open on both on-premises firewall and Azure NSGs | |
44 |
| -| **Slurm Control (slurmctld, slurmd)** | 6817, 6818 | TCP | Inbound/Outbound | Communication between Slurm Master and compute nodes | Open in on-premises firewall and Azure NSGs | |
45 |
| -| **Munge Authentication Service** | 4065 | TCP | Inbound/Outbound | Authentication between Slurm Master and compute nodes | Open on both on-premises network and Azure NSGs | |
46 |
| -| **CycleCloud Service** | 443 | TCP | Outbound | Communication between Slurm Master node and Azure CycleCloud | Allow outbound connections to Azure CycleCloud services from the Slurm Master node | |
47 |
| -| **NFS ports** | 2049 | TCP | Inbound/Outbound | Shared filesystem access between Master node and Azure CycleCloud | Open on both on-premises network and Azure NSGs | |
48 |
| -| **LDAP port** (Optional) | 389 | TCP | Inbound/Outbound | Centralized authentication mechanism for user management | Open on both on-premises network and Azure NSGs |
| 42 | +| **SSH (Secure Shell)** | 22 | TCP | Inbound/Outbound | Secure command-line access to the Slurm primary node | Open on both on-premises firewall and Azure NSGs | |
| 43 | +| **Slurm Control (slurmctld, slurmd)** | 6817, 6818 | TCP | Inbound/Outbound | Communication between Slurm primary and compute nodes | Open in on-premises firewall and Azure NSGs | |
| 44 | +| **Munge Authentication Service** | 4065 | TCP | Inbound/Outbound | Authentication between Slurm primary and compute nodes | Open on both on-premises network and Azure NSGs | |
| 45 | +| **CycleCloud Service** | 443 | TCP | Outbound | Communication between Slurm primary node and Azure CycleCloud | Allow outbound connections to Azure CycleCloud services from the Slurm primary node | |
| 46 | +| **NFS ports** | 2049 | TCP | Inbound/Outbound | Shared filesystem access between primary node and Azure CycleCloud | Open on both on-premises network and Azure NSGs | |
| 47 | +| **LDAP port** (Optional) | 389 | TCP | Inbound/Outbound | Centralized authentication mechanism for user management | Open on both on-premises network and Azure NSGs |
49 | 48 |
|
50 |
| -Refer [Slurm Network Configuration Guide](https://slurm.schedmd.com/network.html) |
| 49 | +See [Slurm Network Configuration Guide](https://slurm.schedmd.com/network.html). |
51 | 50 |
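The port requirements in the table above can be spot-checked from either side of the VPN or ExpressRoute link. The following is a minimal Python sketch, not part of CycleCloud or Slurm: the names `REQUIRED_TCP_PORTS`, `is_port_open`, and `check_ports` are hypothetical, and the port numbers are copied from the table. A port reported as closed can mean either that a firewall or NSG rule blocks it or that no service is listening on that host (for example, `slurmd` normally listens only on compute nodes).

```python
import socket

# TCP ports copied from the table above. The dictionary keys are
# illustrative labels, not official service identifiers.
REQUIRED_TCP_PORTS = {
    "ssh": 22,
    "slurmctld": 6817,
    "slurmd": 6818,
    "munge": 4065,
    "cyclecloud_https": 443,
    "nfs": 2049,
    "ldap": 389,  # optional, only if LDAP is used
}


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports=REQUIRED_TCP_PORTS, timeout=3.0):
    """Probe every port in `ports` on `host` and return {label: reachable}."""
    return {name: is_port_open(host, port, timeout) for name, port in ports.items()}
```

For example, calling `check_ports("10.0.0.4")` from the on-premises scheduler, where `10.0.0.4` stands in for the private IP of a node in the Azure VNET, returns a dictionary that shows which of the listed ports accepted a connection.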
|
52 |
| -## Software Requirement |
| 51 | +## Software requirements |
53 | 52 |
|
54 |
| -- **OS Version**: AlmaLinux release 8.x or Ubuntu 22.04 |
55 |
| -- **CycleCloud Version**: 8.x or later |
56 |
| -- **CycleCloud-Slurm Project Version**: 3.0.x |
| 53 | +- **OS version**: AlmaLinux release 8.x or Ubuntu 22.04 |
| 54 | +- **CycleCloud version**: 8.x or later |
| 55 | +- **CycleCloud-Slurm project version**: 3.0.x |
57 | 56 |
|
58 |
| -## NFS File server |
59 |
| -A shared file system between the external Slurm Scheduler node and the CycleCloud cluster. You can use Azure NetApp Files, Azure Files, NFS, or other methods to mount the same file system on both sides. In this example, we're using a Scheduler VM as an NFS server. |
| 57 | +## NFS file server |
| 58 | +You need a shared file system between the external Slurm scheduler node and the CycleCloud cluster. You can use Azure NetApp Files, Azure Files, NFS, or other methods to mount the same file system on both sides. This example uses a scheduler VM as the NFS server. |
60 | 59 |
|
61 |
| -## Centralized User management system (LDAP or AD) |
| 60 | +## Centralized user management system (LDAP or AD) |
62 | 61 | In HPC environments, maintaining consistent user IDs (UIDs) and group IDs (GIDs) across the cluster is critical for seamless user access and resource management. A centralized user management system, such as LDAP or Active Directory (AD), ensures that UIDs and GIDs are synchronized across all compute nodes and storage systems.
|
63 | 62 |
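One way to confirm that UIDs and GIDs are consistent is to compare the output of `getent passwd` collected on two nodes. The following is a minimal Python sketch under that assumption. The function names `parse_passwd` and `find_id_mismatches` are hypothetical, and how you collect the text from each node (for example, over SSH) is left out.

```python
def parse_passwd(text):
    """Map user name -> (uid, gid) from passwd-format text (name:x:uid:gid:...)."""
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        ids[fields[0]] = (int(fields[2]), int(fields[3]))
    return ids


def find_id_mismatches(passwd_a, passwd_b):
    """Return {user: ((uid_a, gid_a), (uid_b, gid_b))} for users present on
    both nodes whose UID or GID differs."""
    a = parse_passwd(passwd_a)
    b = parse_passwd(passwd_b)
    return {
        user: (a[user], b[user])
        for user in sorted(a.keys() & b.keys())
        if a[user] != b[user]
    }
```

An empty result means that every user present on both nodes has the same UID and GID. Users that exist on only one node are not reported by this sketch.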
|
64 | 63 | > [!Important]
|
65 | 64 | >
|
66 |
| -> For more information on how to setup and instructions, see the blog post about [Slurm Cloud Bursting Using CycleCloud on Azure](https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/setting-up-slurm-cloud-bursting-using-cyclecloud-on-azure/4140922). |
| 65 | +> For more information on how to set up a centralized user management system, see the blog post about [Slurm Cloud Bursting Using CycleCloud on Azure](https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/setting-up-slurm-cloud-bursting-using-cyclecloud-on-azure/4140922). |
67 | 66 |
|
68 |
| -### Next Steps |
| 67 | +### Next steps |
69 | 68 |
|
70 | 69 | * [GitHub repo - cyclecloud-slurm](https://github.com/Azure/cyclecloud-slurm/tree/master)
|
71 | 70 | * [Azure CycleCloud Documentation](../../overview.md)
|