Commit 731a477

committed
addressing PR review feedback
1 parent 12befd0 commit 731a477

20 files changed: +438 -477 lines changed

articles/high-performance-computing/lift-and-shift-overview.md

Lines changed: 13 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -5,11 +5,11 @@ author: tomvcassidy
55
ms.author: tomcassidy
66
ms.date: 08/30/2024
77
ms.topic: how-to
8-
ms.service:
9-
services:
8+
ms.service: azure-virtual-machines
9+
ms.subservice: hpc
1010
---
1111

12-
# End-to-end HPC lift and shift architecture
12+
# End-to-end HPC lift and shift architecture overview
1313

1414
"Lift and shift" in the context of High-Performance Computing (HPC) mostly refers to the process of migrating an on-premises environment and workload to the cloud. Ideally, modifications are kept to a minimum (for example, applications, job schedulers, and their configurations should remain mostly the same). Adjustments on storage and hardware are natural to happen because resources are different from on-premises to cloud platforms. With the lift and shift approach, organizations can start benefiting from the cloud more quickly.
1515

@@ -28,44 +28,44 @@ This document therefore:
2828
Before jumping into the architecture description, it's relevant to understand
2929
the different personas in this context, their needs, and expectations.
3030

31-
## Personas and User Experience
31+
## Personas and user experience
3232

3333
There are different people who need to access the HPC environment. Their activities and how they interact with the environment vary quite a bit.
3434

3535
### End-user (engineer / scientist / researcher)
3636

3737
This persona represents the subject matter expert (for example, biologist, physicist, engineer, etc.) who wants to run experiments (that is, submit jobs) and analyze results. End-users interact with system administrators to fine-tune the computing environment whenever needed. They may have some experience using CLI-based tools, but some of them may rely only on web portals or graphical user interfaces via VDI to submit their jobs and interact with the generated results.
3838

39-
**New Responsibilities in Cloud HPC Environment:**
39+
**New responsibilities in cloud HPC environment:**
4040

4141
- End-user shouldn't have any new responsibilities based on the work from both the HPC Administrator and Cloud Administrator. Depending on the on-premises environment, end-users have access to a larger capacity and variety of computing resources to become more productive.
4242

43-
### HPC Administrator
43+
### HPC administrator
4444

4545
This persona represents the one who has HPC expertise and is responsible for deploying the initial computing infrastructure and adapting it according to business and end-user needs. This persona is also responsible for verifying the health of the system and performing troubleshooting. HPC administrators are comfortable accessing the architecture and its components via CLI, SDKs, and web portals. They're also the first point of contact when end-users face any challenge with the computing environment.
4646

47-
**New Responsibilities in Cloud HPC Environment:**
47+
**New responsibilities in cloud HPC environment:**
4848

4949
- Managing cloud resources and services (for example, virtual machines, storage, networking) via cloud management platforms.
5050
- Implementing and managing clusters and resources via new resource orchestration tools (for example, CycleCloud).
5151
- Optimizing application deployment by understanding infrastructure details (that is, VM types, storage, and network options).
5252
- Optimizing resource utilization and costs by using cloud-specific features such as autoscaling and spot instances.
5353

54-
### Cloud Administrator
54+
### Cloud administrator
5555

5656
This persona works with the HPC administrator to help deploy and maintain the computing infrastructure. This persona isn't (necessarily) an HPC expert, but a Cloud expert with deep knowledge of the overall company IT infrastructure, including network configurations/policies, user access rights, and user devices. Depending on the case, the HPC administrator and Cloud administrator may be the same person.
5757

58-
**New Responsibilities in Cloud HPC Environment:**
58+
**New responsibilities in cloud HPC environment:**
5959

6060
- Collaborating with HPC administrators to ensure seamless integration of HPC workloads with cloud infrastructure.
6161
- Monitoring and managing cloud infrastructure performance, security, and compliance.
6262
- Helping with the configuration of cloud-based networking and storage solutions to support HPC workloads.
6363

64-
### Business Manager / Owner
64+
### Business manager / owner
6565

6666
This persona represents the one who is responsible for the business, which includes taking care of budget and projects to meet organizational goals. For this persona, the accounting component of the architecture is relevant to understand costs for each project. This persona works with HPC admins and end-users to understand platform needs, including storage, network, computing resources. They also plan for future workloads.
6767

68-
**New Responsibilities in Cloud HPC Environment:**
68+
**New responsibilities in cloud HPC environment:**
6969

7070
- Analyzing detailed cost reports and usage metrics provided by cloud service providers to manage budgets and forecast expenses.
7171
- Making strategic decisions based on cloud resource usage and cost optimization opportunities.
@@ -81,8 +81,8 @@ There are also extensions that could be in place, such as sign-in nodes, data mo
8181

8282
This production-level environment may have various components to be set up. Therefore, environment deployers and managers become key to automate its initial deployment and upgrade it along the way, respectively. More advanced installations can also have environment templates (or specifications) with software versions and configurations that are more optimal and tested properly. Once the environment is in production with all the required components in place, over time, adjustments may be required to meet user demands, including changes in VM types or storage options/capabilities.
8383

84-
## Instantiating the lift and shift HPC Cloud architecture
84+
## Instantiating the lift and shift HPC cloud architecture
8585

8686
Here we provide more details for each architecture component, including pointers to official Azure products, tech blogs with some best practices, git repositories, and links to non-product solutions.
8787

88-
**Quick start.** For a quick start solution to create an HPC environment in the cloud with basic building blocks, we recommend using [Azure CycleCloud Slurm workspace](https://techcommunity.microsoft.com/t5/azure-high-performance-computing/introducing-azure-cyclecloud-slurm-workspace-preview/ba-p/4158433).
88+
**Quick start.** For a quick start solution to create an HPC environment in the cloud with basic building blocks, we recommend using [Azure CycleCloud Slurm workspace](https://techcommunity.microsoft.com/t5/azure-high-performance-computing/introducing-azure-cyclecloud-slurm-workspace-preview/ba-p/4158433).

articles/high-performance-computing/lift-and-shift-production-level-overview.md

Lines changed: 4 additions & 5 deletions
@@ -5,8 +5,8 @@ author: tomvcassidy
55
ms.author: tomcassidy
66
ms.date: 08/30/2024
77
ms.topic: how-to
8-
ms.service:
9-
services:
8+
ms.service: azure-virtual-machines
9+
ms.subservice: hpc
1010
---
1111

1212
# Production-level environment migration guide overview
@@ -20,7 +20,7 @@ In this article and the following articles, we guide you through a product-level
2020

2121
You need an Azure subscription to provision cloud resources.
2222

23-
## Migrating from on-premises to the cloud: Production level
23+
## Migrating from on-premises to the cloud: production level
2424

2525
After the proof-of-concept phase, planning is required to get ready for creating a production-level HPC environment. This new environment can represent part of the on-premises infrastructure (for example, an HPC cluster from a group of clusters or queue/partition from an existing cluster), or the entire computing capability.
2626

@@ -34,7 +34,6 @@ Due to component dependencies, the deployment of this HPC cloud environment is b
3434
1. Compute nodes' specifications;
3535
1. End user entry point.
3636

37-
In the following articles, we cover each deployment step and the components involved. In the descriptions of the components, we highlight their relevant dependencies in more detail. It is also worth noting that the component deployment steps can be executed in several ways. We provide a few tips to help get started with the deployment components via the Azure portal. But at a production level, we recommend the creation of an environment deployer that leverages Infrastructure-as-code (e.g. via bicep, Terraform, or Azure CLI). By doing so, one can create an environment in an automated and replicable fashion.
37+
In the following articles, we cover each deployment step and the components involved. In the descriptions of the components, we highlight their relevant dependencies in more detail. It's also worth noting that the component deployment steps can be executed in several ways. We provide a few tips to help get started with the deployment components via the Azure portal. But at a production level, we recommend the creation of an environment deployer that leverages infrastructure-as-code (for example, via bicep, Terraform, or Azure CLI). By doing so, one can create an environment in an automated and replicable fashion.
3838

3939
For each step, certain topics need to be assessed before starting the migration process.
40-

articles/high-performance-computing/lift-and-shift-proof-of-concept.md

Lines changed: 4 additions & 4 deletions
@@ -5,8 +5,8 @@ author: tomvcassidy
55
ms.author: tomcassidy
66
ms.date: 08/30/2024
77
ms.topic: how-to
8-
ms.service:
9-
services:
8+
ms.service: azure-virtual-machines
9+
ms.subservice: hpc
1010
---
1111

1212
# Proof-of-concept migration overview
@@ -20,8 +20,8 @@ In this article, we guide you through a proof-of-concept migration.
2020

2121
You need an Azure subscription to provision cloud resources.
2222

23-
## Migrating from on-premises to the cloud: Proof-of-concept (PoC)
23+
## Migrating from on-premises to the cloud: proof-of-concept (PoC)
2424

2525
We recommend starting with a proof-of-concept (PoC) by provisioning a simple cluster in Azure, using Azure CycleCloud as a resource orchestrator, with one well-known scheduler, such as Slurm, PBS, or LSF. This approach allows one to start understanding Azure technology, assess the functionality of user applications, and investigate performance/costs trade-offs in comparison to the on-premises environment.
2626

27-
If one is flexible with the job scheduler, or already uses Slurm scheduler, we recommend using Azure CycleCloud Slurm workspace, which is an offering that helps create a CycleCloud based cluster, with Slurm scheduler, and the basic setup for networking and storage options available. Some details on this process are available in the Resource Orchestrator section from this document.
27+
If one is flexible with the job scheduler, or already uses Slurm scheduler, we recommend using Azure CycleCloud Slurm workspace, which is an offering that helps create a CycleCloud based cluster, with Slurm scheduler, and the basic setup for networking and storage options available. Some details on this process are available in the Resource Orchestrator section from this document.

articles/high-performance-computing/lift-and-shift-step-1-networking.md

Lines changed: 15 additions & 15 deletions
@@ -1,15 +1,15 @@
11
---
2-
title: "Deployment step 1: Landing zone - Network access component"
2+
title: "Deployment step 1: basic infrastructure - network access component"
33
description: Learn about the configuration of network access during migration deployment step one.
44
author: tomvcassidy
55
ms.author: tomcassidy
66
ms.date: 08/30/2024
77
ms.topic: how-to
8-
ms.service:
9-
services:
8+
ms.service: azure-virtual-machines
9+
ms.subservice: hpc
1010
---
1111

12-
# Deployment step 1: Landing zone - Network access component
12+
# Deployment step 1: basic infrastructure - network access component
1313

1414
A mechanism that allows users to access the cloud environment in a secure way. It's common practice in production environments to give resources private IP addresses and to define rules for how those resources are accessed.
1515

@@ -18,31 +18,31 @@ This component should:
1818
- Allow users to access the private network hosting the high performance computing (HPC) environment;
1919
- Refine network security rules such as source and target ports and IP addresses that can access resources.
2020

21-
## Define Network Needs
21+
## Define network needs
2222

23-
1. **Estimate cluster size for proper network setup:**
23+
* **Estimate cluster size for proper network setup:**
2424
- Different subnets have different ranges of IP addresses.
2525

26-
2. **Security rules:**
26+
* **Security rules:**
2727
- Understand how users access the HPC environment and which security rules need to be in place (for example, ports and IPs open/closed).
2828

29-
### Tools and Services
29+
## Tools and Services
3030

31-
1. **Private network access:**
31+
* **Private network access:**
3232
- In Azure, the two major components that help users access a private network are Azure Bastion and Azure VPN Gateway.
3333

34-
2. **Network rules:**
34+
* **Network rules:**
3535
- Another key component for network setup is Azure network security groups, which are used to filter network traffic between Azure resources in an Azure virtual network.
3636

37-
3. **DNS:**
37+
* **DNS:**
3838
- Azure DNS Private Resolver lets you query Azure DNS private zones from an on-premises environment, and vice versa, without deploying VM-based DNS servers.
3939

40-
### Best Practices for Network in HPC Lift and Shift Architecture
40+
## Best practices for network in HPC lift and shift architecture
4141

42-
1. **Have good understanding on cluster sizes and services to be used:**
42+
* **Have good understanding on cluster sizes and services to be used:**
4343
- Different cluster sizes require different IP ranges, and proper planning helps avoid major changes in parts of the infrastructure. Also, some services may need exclusive subnets, and having clarity on those subnets is essential.
4444

45-
### Example Steps for Setup and Deployment
45+
## Example steps for setup and deployment
4646

4747
Networking is a vast topic itself. In a production level environment, it's good practice to not use public IP addresses. So one could start by testing such functionality by provisioning a VM and using Bastion.
4848

@@ -59,7 +59,7 @@ For instance
5959
- Select option "Deploy Bastion"
6060
- Once the bastion is provisioned, the VM can be accessed through it.
6161

62-
#### Resources
62+
## Resources
6363

6464
- VPN Gateway documentation: [product website](/azure/vpn-gateway/)
6565
- Azure Bastion documentation: [product website](/azure/bastion/)
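
The networking article above asks readers to estimate cluster size before choosing subnet ranges. The following is a minimal sketch of that estimate, not part of the commit. It assumes that Azure reserves five IP addresses in every subnet and that /29 is the smallest subnet Azure accepts; both constants are labeled in the code, and the function names are hypothetical.

```python
import ipaddress

# Assumption: Azure reserves 5 addresses in every subnet.
AZURE_RESERVED_PER_SUBNET = 5
# Assumption: /29 is the smallest subnet size Azure accepts.
SMALLEST_AZURE_PREFIX = 29


def smallest_prefix_for(node_count: int) -> int:
    """Return the longest IPv4 prefix whose subnet still fits node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    needed = node_count + AZURE_RESERVED_PER_SUBNET
    # Number of host bits required to hold `needed` addresses.
    host_bits = (needed - 1).bit_length()
    return min(32 - host_bits, SMALLEST_AZURE_PREFIX)


def usable_addresses(cidr: str) -> int:
    """Return how many addresses in the subnet remain after the reserved ones."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AZURE_RESERVED_PER_SUBNET


if __name__ == "__main__":
    for nodes in (16, 250, 252, 1000):
        print(f"{nodes} nodes -> /{smallest_prefix_for(nodes)}")
    print("10.0.0.0/24 usable:", usable_addresses("10.0.0.0/24"))
```

Under these assumptions, a cluster that may grow past 251 nodes no longer fits a /24 subnet, which is the kind of late infrastructure change the best-practices section recommends planning around.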

articles/high-performance-computing/lift-and-shift-step-1-overview.md

Lines changed: 6 additions & 6 deletions
@@ -1,15 +1,15 @@
11
---
2-
title: "Deployment step 1: Basic infrastructure - Overview"
2+
title: "Deployment step 1: basic infrastructure - overview"
33
description: Learn about production-level environment migration deployment step one.
44
author: tomvcassidy
55
ms.author: tomcassidy
66
ms.date: 08/30/2024
77
ms.topic: how-to
8-
ms.service:
9-
services:
8+
ms.service: azure-virtual-machines
9+
ms.subservice: hpc
1010
---
1111

12-
# Deployment step 1: Basic infrastructure - Overview
12+
# Deployment step 1: basic infrastructure - overview
1313

1414
The critical foundational components required to establish a landing zone in the cloud for an HPC environment are outlined here. The focus is on setting up resource groups, networking, and basic storage, which serve as the backbone of a successful HPC lift-and-shift deployment.
1515

@@ -25,7 +25,7 @@ When provisioning resources in the cloud, it's important to have an understandin
2525

2626
## Storage
2727

28-
In any Azure subscription, setting up basic storage is essential for managing data, applications, and resources effectively. While more advanced and HPC-specific storage configurations will be addressed separately, a solid foundation of basic storage is crucial for general resource management and initial deployment needs.
28+
In any Azure subscription, setting up basic storage is essential for managing data, applications, and resources effectively. While more advanced and HPC-specific storage configurations are addressed separately, a solid foundation of basic storage is crucial for general resource management and initial deployment needs.
2929

3030
For details check the description of the following component:
3131

@@ -41,4 +41,4 @@ Here we describe each component. Each section includes:
4141
- Best practices for the component in the context of HPC lift & shift
4242
- An example of a quick start setup
4343

44-
The goal of the quick start is to have a sense on how to start using the component. As the HPC cloud deployment matures, one is expected to automate the usage of the component, by using, for instance, Infrastructure as Software tools such as Terraform or Bicep.
44+
The goal of the quick start is to have a sense on how to start using the component. As the HPC cloud deployment matures, one is expected to automate the usage of the component, by using, for instance, Infrastructure as Software tools such as Terraform or Bicep.
Lines changed: 17 additions & 17 deletions
@@ -1,58 +1,58 @@
11
---
2-
title: "Deployment step 1: Basic infrastructure - Resource group component"
2+
title: "Deployment step 1: basic infrastructure - resource group component"
33
description: Learn about the configuration of resource groups during migration deployment step one.
44
author: tomvcassidy
55
ms.author: tomcassidy
66
ms.date: 08/30/2024
77
ms.topic: how-to
8-
ms.service:
9-
services:
8+
ms.service: azure-virtual-machines
9+
ms.subservice: hpc
1010
---
1111

12-
# Deployment step 1: Basic infrastructure - Resource group component
12+
# Deployment step 1: basic infrastructure - resource group component
1313

1414
Resource groups in Azure serve as containers that hold related resources for an Azure solution. In an HPC environment, organizing resources into appropriate resource groups is essential for effective management, access control, and cost tracking.
1515

16-
## Define Resource Group needs
16+
## Define resource group needs
1717

18-
1. **Project-Based Grouping:**
18+
* **Project-based grouping:**
1919
- Organize resources by project or workload to simplify management and cost tracking.
2020

21-
2. **Environment-Based Grouping:**
21+
* **Environment-based grouping:**
2222
- Separate resources into different groups based on environments (for example, development, testing, production) to apply different policies and controls.
2323

2424
### This component should
2525

26-
- **Organize Resources:**
26+
* **Organize resources:**
2727
- Group related HPC resources (for example, VMs, storage accounts, and network components) into resource groups based on project, department, or environment (for example, development, testing, production).
2828

29-
- **Simplify Management:**
29+
* **Simplify management:**
3030
- Use resource groups to apply access controls, manage resource lifecycles, and monitor costs efficiently.
3131

32-
### Best Practices for Resource Groups in HPC Lift and Shift Architecture
32+
## Best practices for resource groups in HPC lift and shift architecture
3333

34-
1. **Consistency in Naming Conventions:**
34+
* **Consistency in naming conventions:**
3535
- Establish and follow consistent naming conventions for resource groups to facilitate easy identification and management.
3636

37-
2. **Resource Group Policies:**
37+
* **Resource group policies:**
3838
- Apply Azure Policy to resource groups to enforce organizational standards and compliance requirements.
3939

40-
### Example Steps for Resource Group Setup
40+
## Example steps for resource group setup
4141

42-
1. **Create a Resource Group:**
42+
1. **Create a resource group:**
4343

4444
- Navigate to the Azure portal.
4545
- Select "Resource groups" and select "Create."
4646
- Provide a name for the resource group and select a subscription and region.
4747
- Select "Review + create" and then "Create."
4848

49-
2. **Add Resources to the Resource Group:**
49+
2. **Add resources to the resource group:**
5050

5151
- When creating resources (for example, VMs, storage accounts), assign them to the appropriate resource group.
5252
- Use tags to further organize resources within the group for better cost management and reporting.
5353

54-
### Resources
54+
## Resources
5555

5656
- Resource Groups Documentation: [product website](/azure/azure-resource-manager/management/manage-resource-groups-portal)
5757
- Azure Policy Documentation: [product website](/azure/governance/policy/overview)
58-
- Azure Tags Documentation: [product website](/azure/azure-resource-manager/management/tag-resources)
58+
- Azure Tags Documentation: [product website](/azure/azure-resource-manager/management/tag-resources)
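
The resource group article above recommends consistent naming conventions and tags for cost tracking. The following is a minimal sketch of how a deployer script might enforce both, not part of the commit. The `rg-hpc-<project>-<environment>` pattern, the allowed environment names, and the tag keys are hypothetical conventions chosen for illustration; they are not Azure requirements.

```python
import re

# Hypothetical set of environments; adjust to the organization's own list.
ALLOWED_ENVIRONMENTS = ("dev", "test", "prod")


def resource_group_name(project: str, environment: str) -> str:
    """Build a resource group name following an assumed rg-hpc-<project>-<env> pattern."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    # Lowercase the project and collapse anything non-alphanumeric into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", project.lower()).strip("-")
    if not slug:
        raise ValueError("project must contain at least one letter or digit")
    return f"rg-hpc-{slug}-{environment}"


def default_tags(project: str, environment: str, cost_center: str) -> dict:
    """Return tags (hypothetical keys) to attach to every resource in the group."""
    return {
        "project": project,
        "environment": environment,
        "costCenter": cost_center,
    }


if __name__ == "__main__":
    print(resource_group_name("CFD Solver", "prod"))
    print(default_tags("CFD Solver", "prod", "cc-1234"))
```

Generating names and tags in one place keeps project-based and environment-based grouping consistent across deployments, whichever infrastructure-as-code tool eventually consumes them.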
