articles/high-performance-computing/lift-and-shift-overview.md
Lines changed: 13 additions & 13 deletions
```diff
@@ -5,11 +5,11 @@ author: tomvcassidy
 ms.author: tomcassidy
 ms.date: 08/30/2024
 ms.topic: how-to
-ms.service:
-services:
+ms.service: azure-virtual-machines
+ms.subservice: hpc
 ---
 
-# End-to-end HPC lift and shift architecture
+# End-to-end HPC lift and shift architecture overview
 
 "Lift and shift" in the context of High-Performance Computing (HPC) mostly refers to the process of migrating an on-premises environment and workload to the cloud. Ideally, modifications are kept to a minimum (for example, applications, job schedulers, and their configurations should remain mostly the same). Adjustments on storage and hardware are natural to happen because resources are different from on-premises to cloud platforms. With the lift and shift approach, organizations can start benefiting from the cloud more quickly.
 
```
```diff
@@ -28,44 +28,44 @@ This document therefore:
 Before jumping into the architecture description, it's relevant to understand
 the different personas in this context, their needs, and expectations.
 
-## Personas and User Experience
+## Personas and user experience
 
 There are different people who need to access the HPC environment. Their activities and how they interact with the environment vary quite a bit.
 
 ### End-user (engineer / scientist / researcher)
 
 This persona represents the subject matter expert (for example, biologist, physicist, engineer, etc.) who wants to run experiments (that is, submit jobs) and analyze results. End-users interact with system administrators to fine-tune the computing environment whenever needed. They may have some experience using CLI-based tools, but some of them may rely only on web portals or graphical user interfaces via VDI to submit their jobs and interact with the generated results.
 
-**New Responsibilities in Cloud HPC Environment:**
+**New responsibilities in cloud HPC environment:**
 
 - End-user shouldn't have any new responsibilities based on the work from both the HPC Administrator and Cloud Administrator. Depending on the on-premises environment, end-users have access to a larger capacity and variety of computing resources to become more productive.
 
-### HPC Administrator
+### HPC administrator
 
 This persona represents the one who has HPC expertise and is responsible for deploying the initial computing infrastructure and adapting it according to business and end-user needs. This persona is also responsible for verifying the health of the system and performing troubleshooting. HPC administrators are comfortable accessing the architecture and its components via CLI, SDKs, and web portals. They're also the first point of contact when end-users face any challenge with the computing environment.
 
-**New Responsibilities in Cloud HPC Environment:**
+**New responsibilities in cloud HPC environment:**
 
 - Managing cloud resources and services (for example, virtual machines, storage, networking) via cloud management platforms.
 - Implementing and managing clusters and resources via new resource orchestration tools (for example, CycleCloud).
 - Optimizing application deployment by understanding infrastructure details (that is, VM types, storage, and network options).
 - Optimizing resource utilization and costs by using cloud-specific features such as autoscaling and spot instances.
 
-### Cloud Administrator
+### Cloud administrator
 
 This persona works with the HPC administrator to help deploy and maintain the computing infrastructure. This persona isn't (necessarily) an HPC expert, but a Cloud expert with deep knowledge of the overall company IT infrastructure, including network configurations/policies, user access rights, and user devices. Depending on the case, the HPC administrator and Cloud administrator may be the same person.
 
-**New Responsibilities in Cloud HPC Environment:**
+**New responsibilities in cloud HPC environment:**
 
 - Collaborating with HPC administrators to ensure seamless integration of HPC workloads with cloud infrastructure.
 - Monitoring and managing cloud infrastructure performance, security, and compliance.
 - Helping with the configuration of cloud-based networking and storage solutions to support HPC workloads.
 
-### Business Manager / Owner
+### Business manager / owner
 
 This persona represents the one who is responsible for the business, which includes taking care of budget and projects to meet organizational goals. For this persona, the accounting component of the architecture is relevant to understand costs for each project. This persona works with HPC admins and end-users to understand platform needs, including storage, network, computing resources. They also plan for future workloads.
 
-**New Responsibilities in Cloud HPC Environment:**
+**New responsibilities in cloud HPC environment:**
 
 - Analyzing detailed cost reports and usage metrics provided by cloud service providers to manage budgets and forecast expenses.
 - Making strategic decisions based on cloud resource usage and cost optimization opportunities.
```
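The HPC administrator responsibilities above mention autoscaling. As a minimal sketch of the sizing decision an autoscaler makes, the Python below computes how many extra nodes to request for queued work. The `Job` type, the `nodes_to_request` function, and the packing policy (treating all requested cores as one pool that fills whole nodes) are hypothetical simplifications; CycleCloud and the schedulers it integrates with apply their own, more detailed placement rules.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    """A queued job and the number of cores it requests (hypothetical model)."""
    job_id: str
    cores: int


def nodes_to_request(queue: list[Job], cores_per_node: int,
                     running_nodes: int, max_nodes: int) -> int:
    """Return how many additional nodes to start for the queued work.

    Simplified policy: pool all requested cores onto whole nodes, cap the
    target at max_nodes, and subtract nodes that are already running.
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    total_cores = sum(job.cores for job in queue)
    needed = math.ceil(total_cores / cores_per_node)
    # Never grow the cluster past the configured maximum size.
    target = min(needed, max_nodes)
    return max(target - running_nodes, 0)
```

For example, with 96-core nodes, one node already running, and a cap of 10 nodes, jobs requesting 120 and 200 cores need 4 nodes in total, so the sketch requests 3 more. Real jobs often cannot share nodes, in which case per-job rounding would give a larger answer.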
```diff
@@ -81,8 +81,8 @@ There are also extensions that could be in place, such as sign-in nodes, data mo
 
 This production-level environment may have various components to be set up. Therefore, environment deployers and managers become key to automate its initial deployment and upgrade it along the way, respectively. More advanced installations can also have environment templates (or specifications) with software versions and configurations that are more optimal and tested properly. Once the environment is in production with all the required components in place, over time, adjustments may be required to meet user demands, including changes in VM types or storage options/capabilities.
 
-## Instantiating the lift and shift HPC Cloud architecture
+## Instantiating the lift and shift HPC cloud architecture
 
 Here we provide more details for each architecture component, including pointers to official Azure products, tech blogs with some best practices, git repositories, and links to non-product solutions.
 
-**Quick start.** For a quick start solution to create an HPC environment in the cloud with basic building blocks, we recommend using [Azure CycleCloud Slurm workspace](https://techcommunity.microsoft.com/t5/azure-high-performance-computing/introducing-azure-cyclecloud-slurm-workspace-preview/ba-p/4158433).
+**Quick start.** For a quick start solution to create an HPC environment in the cloud with basic building blocks, we recommend using [Azure CycleCloud Slurm workspace](https://techcommunity.microsoft.com/t5/azure-high-performance-computing/introducing-azure-cyclecloud-slurm-workspace-preview/ba-p/4158433).
```
```diff
@@ -20,7 +20,7 @@ In this article and the following articles, we guide you through a product-level
 
 You need an Azure subscription to provision cloud resources.
 
-## Migrating from on-premises to the cloud: Production level
+## Migrating from on-premises to the cloud: production level
 
 After the proof-of-concept phase, planning is required to get ready for creating a production-level HPC environment. This new environment can represent part of the on-premises infrastructure (for example, an HPC cluster from a group of clusters or queue/partition from an existing cluster), or the entire computing capability.
 
```
```diff
@@ -34,7 +34,6 @@ Due to component dependencies, the deployment of this HPC cloud environment is b
 1. Compute nodes' specifications;
 1. End user entry point.
 
-In the following articles, we cover each deployment step and the components involved. In the descriptions of the components, we highlight their relevant dependencies in more detail. It is also worth noting that the component deployment steps can be executed in several ways. We provide a few tips to help get started with the deployment components via the Azure portal. But at a production level, we recommend the creation of an environment deployer that leverages Infrastructure-as-code (e.g. via bicep, Terraform, or Azure CLI). By doing so, one can create an environment in an automated and replicable fashion.
+In the following articles, we cover each deployment step and the components involved. In the descriptions of the components, we highlight their relevant dependencies in more detail. It's also worth noting that the component deployment steps can be executed in several ways. We provide a few tips to help get started with the deployment components via the Azure portal. But at a production level, we recommend the creation of an environment deployer that leverages infrastructure-as-code (for example, via bicep, Terraform, or Azure CLI). By doing so, one can create an environment in an automated and replicable fashion.
 
 For each step, certain topics need to be assessed before starting the migration process.
```
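The paragraph above recommends an infrastructure-as-code environment deployer, and the deployment is split into steps because of component dependencies. A small sketch of how such a deployer could derive a valid step order from declared dependencies, using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The component names and the dependency map are hypothetical placeholders, not the article's actual step list.

```python
from graphlib import TopologicalSorter

# Hypothetical component dependencies: each key lists the components that
# must already exist before it can be deployed. Names are illustrative only.
DEPENDENCIES: dict[str, set[str]] = {
    "resource_group": set(),
    "network": {"resource_group"},
    "storage": {"resource_group", "network"},
    "scheduler": {"network", "storage"},
    "compute_nodes": {"scheduler"},
    "user_entry_point": {"network", "scheduler"},
}


def deployment_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid deployment order (raises graphlib.CycleError on cycles)."""
    return list(TopologicalSorter(deps).static_order())


print(deployment_order(DEPENDENCIES))
```

Any order `static_order()` returns respects the declared dependencies; independent components may appear in either relative order, and a circular dependency raises `graphlib.CycleError` instead of producing a plan.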
articles/high-performance-computing/lift-and-shift-proof-of-concept.md
Lines changed: 4 additions & 4 deletions
```diff
@@ -5,8 +5,8 @@ author: tomvcassidy
 ms.author: tomcassidy
 ms.date: 08/30/2024
 ms.topic: how-to
-ms.service:
-services:
+ms.service: azure-virtual-machines
+ms.subservice: hpc
 ---
 
 # Proof-of-concept migration overview
```
```diff
@@ -20,8 +20,8 @@ In this article, we guide you through a proof-of-concept migration.
 
 You need an Azure subscription to provision cloud resources.
 
-## Migrating from on-premises to the cloud: Proof-of-concept (PoC)
+## Migrating from on-premises to the cloud: proof-of-concept (PoC)
 
 We recommend starting with a proof-of-concept (PoC) by provisioning a simple cluster in Azure, using Azure CycleCloud as a resource orchestrator, with one well-known scheduler, such as Slurm, PBS, or LSF. This approach allows one to start understanding Azure technology, assess the functionality of user applications, and investigate performance/costs trade-offs in comparison to the on-premises environment.
 
-If one is flexible with the job scheduler, or already uses Slurm scheduler, we recommend using Azure CycleCloud Slurm workspace, which is an offering that helps create a CycleCloud based cluster, with Slurm scheduler, and the basic setup for networking and storage options available. Some details on this process are available in the Resource Orchestrator section from this document.
+If one is flexible with the job scheduler, or already uses Slurm scheduler, we recommend using Azure CycleCloud Slurm workspace, which is an offering that helps create a CycleCloud based cluster, with Slurm scheduler, and the basic setup for networking and storage options available. Some details on this process are available in the Resource Orchestrator section from this document.
```
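The PoC paragraph above suggests investigating performance/cost trade-offs against the on-premises environment. One simple way to compare runs of the same job on different VM types is cost per job = nodes x wall-clock hours x hourly price per node. The sketch below assumes that formula; the VM labels, timings, and prices are made-up placeholders, not Azure pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PocRun:
    """One measured PoC run of the same job (all values are hypothetical)."""
    vm_type: str
    nodes: int
    wall_hours: float
    price_per_node_hour: float  # placeholder price, not a real Azure quote

    @property
    def cost(self) -> float:
        # Cost per job = nodes x wall-clock hours x hourly price per node.
        return self.nodes * self.wall_hours * self.price_per_node_hour


def cheapest(runs: list[PocRun]) -> PocRun:
    """Return the run with the lowest cost per job."""
    if not runs:
        raise ValueError("no runs to compare")
    return min(runs, key=lambda run: run.cost)


runs = [
    PocRun("vm-type-a", nodes=4, wall_hours=2.0, price_per_node_hour=3.5),
    PocRun("vm-type-b", nodes=2, wall_hours=3.0, price_per_node_hour=4.0),
]
best = cheapest(runs)
print(best.vm_type, best.cost)  # vm-type-b 24.0
```

In this made-up example the faster run (2 hours on 4 nodes) costs 28.0 per job, while the slower run on fewer nodes costs 24.0, which is the kind of trade-off a PoC is meant to surface.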
```diff
 Mechanism to allow users access cloud environment in a secure way. It's a common practice in production environments to have resources with private IP addresses, and with rules to define how resources should be accessed.
 
@@ -18,31 +18,31 @@ This component should:
 - Allow users to access private network hosting the high performance computing (HPC) environment;
 - Refine network security rules such as source and target ports and IP addresses that can access resources.
 
-## Define Network Needs
+## Define network needs
 
-1.**Estimate cluster size for proper network setup:**
+***Estimate cluster size for proper network setup:**
 - Different subnets have different ranges of IP addresses.
 
-2.**Security rules:**
+***Security rules:**
 - Understand how users access the HPC environment and security rules to be in places (for example, ports and IPs open/closed).
 
-###Tools and Services
+## Tools and Services
 
-1.**Private network access:**
+***Private network access:**
 - In Azure, the two major components to help access private network are Azure Bastion and Azure VPN Gateway.
 
-2.**Network rules:**
+***Network rules:**
 - Another key component for network setup is Azure Network security groups, which is used to filter network traffic between Azure resources in an Azure virtual network.
 
-3.**DNS:**
+***DNS:**
 - Azure DNS Private Resolver allows query Azure DNS private zones from an on-premises environment and vice versa without deploying VM based DNS servers.
 
-###Best Practices for Network in HPC Lift and Shift Architecture
+## Best practices for network in HPC lift and shift architecture
 
-1.**Have good understanding on cluster sizes and services to be used:**
+***Have good understanding on cluster sizes and services to be used:**
 - Different cluster sizes require different IP ranges, and proper planning helps avoid major changes in parts of the infrastructure. Also, some services may need exclusive subnets, and having clarity on those subnets is essential.
 
-###Example Steps for Setup and Deployment
+## Example steps for setup and deployment
 
 Networking is a vast topic itself. In a production level environment, it's good practice to not use public IP addresses. So one could start by testing such functionality by provisioning a VM and using Bastion.
 
```
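The network bullets above stress estimating cluster size before choosing IP ranges and note that some services need exclusive subnets. A sketch using Python's standard `ipaddress` module, assuming Azure's rule of reserving 5 addresses in every subnet and /29 as the smallest allowed IPv4 subnet. The subnet names and host counts are hypothetical, except `AzureBastionSubnet`, the dedicated subnet name Azure Bastion expects; its /26 size here follows the Bastion documentation as we understand it, so confirm the current minimum before relying on it.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 IP addresses in every subnet
SMALLEST_PREFIX = 29           # smallest IPv4 subnet Azure allows


def prefix_for_hosts(hosts: int) -> int:
    """Return the longest prefix (smallest subnet) with at least `hosts` usable IPs."""
    for prefix in range(SMALLEST_PREFIX, 7, -1):
        if 2 ** (32 - prefix) - AZURE_RESERVED_PER_SUBNET >= hosts:
            return prefix
    raise ValueError(f"{hosts} hosts do not fit in a /8")


def plan_subnets(vnet_cidr: str, needs: dict[str, int]) -> dict[str, ipaddress.IPv4Network]:
    """Carve consecutive, non-overlapping subnets out of `vnet_cidr`.

    `needs` maps a subnet name to its prefix length. Placing the largest
    subnets first keeps every block aligned on its own boundary.
    """
    vnet = ipaddress.IPv4Network(vnet_cidr)
    plan: dict[str, ipaddress.IPv4Network] = {}
    next_addr = vnet.network_address
    for name, prefix in sorted(needs.items(), key=lambda item: item[1]):
        if prefix < vnet.prefixlen:
            raise ValueError(f"subnet {name!r} (/{prefix}) is larger than {vnet}")
        subnet = ipaddress.IPv4Network(f"{next_addr}/{prefix}")
        if not subnet.subnet_of(vnet):
            raise ValueError(f"{vnet} has no room left for subnet {name!r}")
        plan[name] = subnet
        next_addr = subnet.broadcast_address + 1
    return plan


needs = {
    "compute": prefix_for_hosts(500),   # hypothetical 500-node cluster
    "AzureBastionSubnet": 26,
    "storage": prefix_for_hosts(20),    # hypothetical storage endpoints
}
for name, subnet in plan_subnets("10.0.0.0/22", needs).items():
    print(name, subnet)
```

With these inputs the compute subnet becomes a /23 (507 usable addresses), which leaves room for the Bastion /26 and a /27 for storage inside a /22. A VNet that is too small raises `ValueError` instead of silently producing overlapping ranges.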
```diff
@@ -59,7 +59,7 @@ For instance
 - Select option "Deploy Bastion"
 - Once the bastion is provisioned, the VM can be access through it.
```
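The network rules bullets above mention network security groups. NSG rules are evaluated in priority order, lowest number first, and processing stops at the first rule that matches. The sketch below mimics that ordering for inbound traffic with a simplified, hypothetical `Rule` type (source CIDR plus destination port range); it is not the Azure NSG schema, and it replaces Azure's default rules (which, for example, allow traffic within the virtual network) with a single catch-all deny for brevity.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Simplified inbound rule; a hypothetical model, not the Azure NSG schema."""
    name: str
    priority: int            # lower number is evaluated first
    source_cidr: str         # for example "10.0.2.0/26"
    ports: tuple[int, int]   # inclusive destination port range
    allow: bool


def evaluate(rules: list[Rule], source_ip: str, port: int) -> tuple[str, bool]:
    """Return (matching rule name, allowed?) for one inbound connection."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        low, high = rule.ports
        if addr in ipaddress.ip_network(rule.source_cidr) and low <= port <= high:
            return rule.name, rule.allow  # first match wins; stop processing
    return "catch-all-deny", False  # stand-in for Azure's default rules


# Listed out of priority order on purpose: evaluation sorts by priority.
rules = [
    Rule("deny-ssh-other", 200, "0.0.0.0/0", (22, 22), allow=False),
    Rule("allow-ssh-from-bastion", 100, "10.0.2.0/26", (22, 22), allow=True),
]
print(evaluate(rules, "10.0.2.10", 22))
print(evaluate(rules, "203.0.113.7", 22))
```

Here SSH from the hypothetical Bastion range matches the priority-100 allow rule before the broader priority-200 deny, which is why narrow allow rules are usually given lower priority numbers than broad deny rules.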
```diff
 The critical foundational components required to establish a landing zone in the cloud for an HPC environment are outlined here. The focus is on setting up resource groups, networking, and basic storage, which serve as the backbone of a successful HPC lift-and-shift deployment.
 
@@ -25,7 +25,7 @@ When provisioning resources in the cloud, it's important to have an understandin
 
 ## Storage
 
-In any Azure subscription, setting up basic storage is essential for managing data, applications, and resources effectively. While more advanced and HPC-specific storage configurations will be addressed separately, a solid foundation of basic storage is crucial for general resource management and initial deployment needs.
+In any Azure subscription, setting up basic storage is essential for managing data, applications, and resources effectively. While more advanced and HPC-specific storage configurations are addressed separately, a solid foundation of basic storage is crucial for general resource management and initial deployment needs.
 
 For details check the description of the following component:
 
```
```diff
@@ -41,4 +41,4 @@ Here we describe each component. Each section includes:
 - Best practices for the component in the context of HPC lift & shift
 - An example of a quick start setup
 
-The goal of the quick start is to have a sense on how to start using the component. As the HPC cloud deployment matures, one is expected to automate the usage of the component, by using, for instance, Infrastructure as Software tools such as Terraform or Bicep.
+The goal of the quick start is to have a sense on how to start using the component. As the HPC cloud deployment matures, one is expected to automate the usage of the component, by using, for instance, Infrastructure as Software tools such as Terraform or Bicep.
```
```diff
-title: "Deployment step 1: Basic infrastructure - Resource group component"
+title: "Deployment step 1: basic infrastructure - resource group component"
 description: Learn about the configuration of resource groups during migration deployment step one.
 author: tomvcassidy
 ms.author: tomcassidy
 ms.date: 08/30/2024
 ms.topic: how-to
-ms.service:
-services:
+ms.service: azure-virtual-machines
+ms.subservice: hpc
 ---
 
-# Deployment step 1: Basic infrastructure - Resource group component
+# Deployment step 1: basic infrastructure - resource group component
 
 Resource groups in Azure serve as containers that hold related resources for an Azure solution. In an HPC environment, organizing resources into appropriate resource groups is essential for effective management, access control, and cost tracking.
 
-## Define Resource Group needs
+## Define resource group needs
 
-1.**Project-Based Grouping:**
+***Project-based grouping:**
 - Organize resources by project or workload to simplify management and cost tracking.
 
-2.**Environment-Based Grouping:**
+***Environment-based grouping:**
 - Separate resources into different groups based on environments (for example, development, testing, production) to apply different policies and controls.
 
 ### This component should
 
--**Organize Resources:**
+***Organize resources:**
 - Group related HPC resources (for example, VMs, storage accounts, and network components) into resource groups based on project, department, or environment (for example, development, testing, production).
 
--**Simplify Management:**
+***Simplify management:**
 - Use resource groups to apply access controls, manage resource lifecycles, and monitor costs efficiently.
 
-###Best Practices for Resource Groups in HPC Lift and Shift Architecture
+## Best practices for resource groups in HPC lift and shift architecture
 
-1.**Consistency in Naming Conventions:**
+***Consistency in naming conventions:**
 - Establish and follow consistent naming conventions for resource groups to facilitate easy identification and management.
 
-2.**Resource Group Policies:**
+***Resource group policies:**
 - Apply Azure Policy to resource groups to enforce organizational standards and compliance requirements.
 
-###Example Steps for Resource Group Setup
+## Example steps for resource group setup
 
-1.**Create a Resource Group:**
+1.**Create a resource group:**
 
 - Navigate to the Azure portal.
 - Select "Resource groups" and select "Create."
 - Provide a name for the resource group and select a subscription and region.
 - Select "Review + create" and then "Create."
 
-2.**Add Resources to the Resource Group:**
+2.**Add resources to the resource group:**
 
 - When creating resources (for example, VMs, storage accounts), assign them to the appropriate resource group.
 - Use tags to further organize resources within the group for better cost management and reporting.
 
-###Resources
+## Resources
 
 - Resource Groups Documentation: [product website](/azure/azure-resource-manager/management/manage-resource-groups-portal)
```
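The resource group best practices above call for consistent naming conventions and for grouping by project or environment. The sketch below builds names from a hypothetical `rg-<project>-<environment>-<region>` convention and checks them against an ASCII subset of Azure's resource group naming rules as we understand them (1 to 90 characters; letters, digits, underscores, parentheses, hyphens, and periods; no trailing period). The convention and the environment list are illustrative assumptions; confirm the limits against the Azure resource naming rules documentation.

```python
import re

# ASCII subset of the resource group naming rules assumed above:
# 1-90 characters from letters, digits, '_', '(', ')', '-', '.'.
_RG_NAME = re.compile(r"[A-Za-z0-9_().-]{1,90}")

ENVIRONMENTS = {"dev", "test", "prod"}  # hypothetical environment list


def is_valid_rg_name(name: str) -> bool:
    """Check allowed characters and length, and reject a trailing period."""
    return bool(_RG_NAME.fullmatch(name)) and not name.endswith(".")


def rg_name(project: str, environment: str, region: str) -> str:
    """Build a name using the hypothetical 'rg-<project>-<env>-<region>' convention."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    name = f"rg-{project}-{environment}-{region}".lower()
    if not is_valid_rg_name(name):
        raise ValueError(f"invalid resource group name: {name!r}")
    return name


print(rg_name("cfd", "prod", "eastus"))  # rg-cfd-prod-eastus
```

Generating names from one function, rather than typing them in the portal, is one way to keep the project-based and environment-based grouping consistent across every resource group an environment deployer creates.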
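The resource group setup steps suggest tags for cost management and reporting, which is also what the business manager persona relies on when analyzing cost reports. As a sketch, the function below sums costs per tag value from rows such as those exported from a cost report; the row layout (`cost` as a decimal string, `tags` as a dict) and the sample data are hypothetical. `Decimal` avoids floating-point rounding in money totals.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_tag(rows: list[dict], tag: str) -> dict[str, Decimal]:
    """Sum costs per value of `tag`; rows without the tag go under 'untagged'."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in rows:
        key = row.get("tags", {}).get(tag, "untagged")
        totals[key] += Decimal(row["cost"])
    return dict(totals)


rows = [  # hypothetical cost-export rows
    {"resource": "vm-1", "cost": "12.50", "tags": {"project": "cfd"}},
    {"resource": "disk-1", "cost": "1.10", "tags": {"project": "cfd"}},
    {"resource": "vm-2", "cost": "7.25", "tags": {"project": "genomics"}},
    {"resource": "nic-1", "cost": "0.40", "tags": {}},
]
print(cost_by_tag(rows, "project"))
```

The `untagged` bucket makes resources that escaped the tagging convention visible in the report instead of silently dropping their cost.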