learn-pr/azure/intro-to-hpc/includes/3-how-azure-hpc-works.md
+11 -13 (11 additions & 13 deletions)
Original file line number
Diff line number
Diff line change
@@ -36,21 +36,19 @@ Run visualization workloads with HPC and Azure Virtual Machines that boost produ
36
36
37
37
## Mapping Azure VM and Storage products to components in an HPC system
38
38
39
-
### Azure Compute Virtual Machine Solutions
40
-
41
-
#### VMs with low latency (HPC SKUs)
39
+
### VMs with low latency (HPC SKUs)
42
40
43
41
The following H-series and N-series VMs are RDMA capable and can communicate over the low latency and high bandwidth InfiniBand network. The RDMA capability over such an interconnect is critical to boost the scalability and performance of distributed-node HPC and AI workloads.
| **HB-series** VMs are optimized for applications that are memory intensive, such as fluid dynamics, explicit finite element analysis, and weather modeling. <br><br>**HC-series** VMs are optimized for applications that are compute intensive, such as molecular dynamics, implicit finite element analysis, and computational chemistry. <br><br><br>| **NC-series** VMs are powered by the NVIDIA Tesla K80 card and the Intel Xeon E5-2690 v3 (Haswell) processor. Users can crunch through data faster by using CUDA for energy exploration applications, crash simulations, ray traced rendering, deep learning, and more. <br><br> **ND-series** VMs are a new addition to the GPU family designed for AI and deep learning workloads. They offer configurations with a secondary low-latency, high-throughput network through RDMA and InfiniBand connectivity, which enables large-scale training jobs that span many GPUs.| **NV-series** VMs are made for desktop accelerated applications and virtual desktops where customers can visualize their data or simulations. Users can visualize graphics-intensive workflows on NV instances for superior graphics capability, and can additionally run single-precision workloads such as encoding and rendering.<br><br><br><br><br><br><br>|
50
48
51
-
###Azure Storage Solutions
49
+
## Azure Storage Solutions
52
50
53
-
####Azure Blob Storage
51
+
### Azure Blob Storage
54
52
55
53
Azure Blob Storage provides massively scalable and secure object storage for cloud-native workloads, archives, data lakes, high-performance computing, and machine learning. It's optimized for data lakes and includes comprehensive data management.
56
54
@@ -63,7 +61,7 @@ Key design features include:
63
61
- Storing data for backup and restore, disaster recovery, and archiving.
64
62
- Storing data for analysis by an on-premises or Azure-hosted service.
65
63
66
-
####Azure NetApp Files
64
+
### Azure NetApp Files
67
65
68
66
Azure NetApp Files makes it easy for enterprise line-of-business and storage professionals to migrate and run complex, file-based applications with no code changes. It's used as the underlying shared file-storage service in scenarios such as lift-and-shift migration of POSIX-compliant Linux and Windows applications, SAP HANA, databases, and enterprise web applications.
69
67
@@ -75,7 +73,7 @@ Key benefits include:
75
73
- Data protection using Cross-Region replication.
76
74
- Advanced Enterprise Data Management features.
77
75
78
-
####Azure Files
76
+
### Azure Files
79
77
80
78
Azure Files offers fully managed file shares in the cloud that are accessible via the industry standard Server Message Block (SMB) protocol or Network File System (NFS) protocol.
81
79
@@ -101,7 +99,7 @@ Key benefits include:
101
99
- Resiliency
102
100
- Familiar Programmability
103
101
104
-
####Azure Managed Lustre
102
+
### Azure Managed Lustre
105
103
106
104
Azure Managed Lustre service gives you the capability to quickly create an Azure-based Lustre file system for cloud-based high-performance computing jobs. It's a fully managed parallel file system best suited for medium to large HPC workloads. It enables HPC applications in the cloud without breaking application compatibility by providing familiar Lustre parallel file system functionality, behaviors, and performance, securing long-term application investments.
107
105
@@ -116,9 +114,9 @@ Key benefits include:
116
114
- Supports containerized workloads with AKS.
117
115
- Integrates with Azure Blob Storage as a source for importing and exporting data for long-term storage.
118
116
119
-
####VM-based file systems
117
+
### VM-based file systems
120
118
121
-
#####Single VM NAS
119
+
#### Single VM NAS
122
120
123
121
Cloud-based Network Attached Storage (NAS) helps you address storage needs in the cloud using the same constructs as an on-premises NAS system. It gives organizations storage that's as performant as their on-premises NAS with the added ability to scale in the cloud, all without having to make major changes to their existing application interfaces and processes.
124
122
@@ -129,7 +127,7 @@ Key benefits include:
129
127
- Network devices accessing Virtual NAS storage can continue to do so using the same protocols without any reconfiguration.
130
128
- Capacity management is also easier since any required storage can be allocated from the underlying virtualization layer.
131
129
132
-
#####Multi-node Parallel file systems
130
+
#### Multi-node Parallel file systems
133
131
134
132
Parallel file systems distribute block-level storage across multiple networked storage nodes. File data is spread among these nodes, and therefore among multiple storage devices. Individual storage I/O requests are pooled across the storage nodes, which are accessible through a common namespace.
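As a rough illustration of how file data can be spread across storage nodes, the following Python sketch stripes a byte string round-robin over a few in-memory nodes and then reassembles it. The `StorageNode` class, the `write_striped` and `read_striped` helpers, the stripe size, and the file path are all hypothetical names chosen for this example. Real parallel file systems such as Lustre handle metadata, locking, and networking that this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """A hypothetical storage node that holds blocks keyed by (path, index)."""
    name: str
    blocks: dict = field(default_factory=dict)


def write_striped(path, data, nodes, stripe_size=4):
    """Split data into fixed-size blocks and place them round-robin on nodes."""
    count = 0
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        node = nodes[index % len(nodes)]
        node.blocks[(path, index)] = data[offset:offset + stripe_size]
        count += 1
    return count


def read_striped(path, block_count, nodes):
    """Reassemble a file by fetching each block from the node that owns it."""
    return b"".join(
        nodes[i % len(nodes)].blocks[(path, i)] for i in range(block_count)
    )


# One namespace (the path), three nodes that each hold part of the file.
nodes = [StorageNode(f"node{i}") for i in range(3)]
payload = b"parallel file systems"
n_blocks = write_striped("/shared/input.dat", payload, nodes)
restored = read_striped("/shared/input.dat", n_blocks, nodes)
```

Because consecutive blocks live on different nodes, a client could in principle fetch them concurrently, which is where the I/O performance benefit comes from.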
135
133
@@ -142,7 +140,7 @@ The advantages of distributed storage and superior I/O performance make parallel
142
140
143
141

144
142
145
-
#####Cray ClusterStor
143
+
#### Cray ClusterStor
146
144
147
145
The Cray ClusterStor in Azure storage system is a high-capacity, high-throughput storage solution to accelerate your HPC simulations. It's a bare metal appliance that is fully integrated in the Azure fabric and accessible by a large selection of other Azure services. Cray ClusterStor in Azure offers a Lustre-based, single-tenant, bare metal, and fully managed HPC environment in Microsoft Azure.
learn-pr/azure/intro-to-hpc/includes/4-microsoft-hpc-pack.md
+6 -6 (6 additions & 6 deletions)
Original file line number
Diff line number
Diff line change
@@ -1,4 +1,4 @@
1
-
One of the key concepts of cloud computing is *Orchestration*. It refers to overseeing the deployment, running and monitoring of all the components of an application in the cluster.
1
+
One of the key concepts of cloud computing is *Orchestration*. It refers to overseeing the deployment, running, and monitoring of all the components of an application in the cluster.
2
2
3
3
Additionally, an orchestrator can perform other tasks like healing (managing errors), scaling, and logging. Orchestrators like the well-known Kubernetes or Mesos can access cloud cluster resources directly by virtualization.
4
4
@@ -75,20 +75,20 @@ There are two basic strategies that schedulers can use to determine which job to
75
75
76
76
- **Shortest Job First:** Based on the execution time declared in the job script, the scheduler estimates the job execution time. The jobs are ranked in ascending order of execution time. While short jobs start after a short waiting time, long-running jobs (or at least jobs declared as such) might never actually start.
77
77
78
-
-**Backfilling:** The scheduler maintains the concept of *First Come, First Serve* without preventing long running jobs from executing. The scheduler runs the job only when the first job in the queue can be executed. If otherwise, the scheduler goes through the rest of the queue to check whether another job can be executed without extending the waiting time of the first job in queue. If it finds such a job, the scheduler runs that job. Small jobs usually encounter short queue times.
78
+
In addition, there is the practice of **Backfilling.** The scheduler maintains the concept of *First Come, First Served* without preventing long-running jobs from executing. If the first job in the queue can be executed, the scheduler runs it. Otherwise, the scheduler goes through the rest of the queue to check whether another job can be executed without extending the waiting time of the first job in the queue. If it finds such a job, the scheduler runs that job. Small jobs usually encounter short queue times.
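The scheduling strategies described here can be sketched in a few lines of Python. This is a simplified model, not the algorithm of any particular scheduler. The `Job` type, the `pick_jobs` function, and the rule that a job may be backfilled only if it fits in the currently free nodes and finishes before the first job's reserved start time are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int    # nodes requested
    runtime: int  # declared execution time, in minutes


def shortest_job_first(queue):
    """Rank jobs in ascending order of declared execution time."""
    return sorted(queue, key=lambda job: job.runtime)


def head_start_time(head, free_nodes, running, now):
    """Earliest time the head job can start, given (end_time, nodes) pairs."""
    available = free_nodes
    if available >= head.nodes:
        return now
    for end_time, nodes in sorted(running):
        available += nodes
        if available >= head.nodes:
            return end_time
    raise ValueError("head job requests more nodes than the cluster has")


def pick_jobs(queue, free_nodes, running, now=0):
    """Return the jobs to start now: arrival order first, then backfilling."""
    started = []
    queue = list(queue)
    # Start jobs strictly in arrival order while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running = running + [(now + job.runtime, job.nodes)]
        started.append(job)
    if not queue:
        return started
    # The head job is blocked: reserve its start time, then backfill around it.
    reserved = head_start_time(queue[0], free_nodes, running, now)
    for job in queue[1:]:
        if job.nodes <= free_nodes and now + job.runtime <= reserved:
            free_nodes -= job.nodes
            started.append(job)
    return started


# An 8-node cluster: 5 nodes are busy until minute 60, 3 nodes are free.
queue = [Job("A", 8, 120), Job("B", 2, 30), Job("C", 2, 90), Job("D", 1, 10)]
sjf_order = [job.name for job in shortest_job_first(queue)]
backfilled = [job.name for job in pick_jobs(queue, 3, [(60, 5)])]
```

In this example, job A needs the whole cluster and can't start until minute 60. Jobs B and D fit in the free nodes and finish before then, so they're backfilled. Job C would run past minute 60 and is held back so that A isn't delayed.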
79
79
80
80
### Workflow management
81
81
82
-
-**Task pipelining:** Repeated operations such as tool usage and software process task sequence executions can be organized into a pipeline. Automating it can make the overall software and tool usage more efficient. It creates efficiencies by making the task itself faster and reducing the burden upon the knowledge worker for its management.
82
+
**Task pipelining:** Repeated operations, such as tool invocations and sequences of software process tasks, can be organized into a pipeline. Automating the pipeline can make overall software and tool usage more efficient: the tasks themselves run faster, and the knowledge worker carries less of the burden of managing them.
83
83
84
-
-**Task automation:** Automation can reduce the error rate of a process by eliminating variance in how it's performed. Pipelining and automation of a task can open the door for further process innovations like parallelization and cloud deployment.
84
+
**Task automation:** Automation can reduce the error rate of a process by eliminating variance in how it's performed. Pipelining and automation of a task can open the door for further process innovations like parallelization and cloud deployment.
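As a minimal sketch of task pipelining, the following Python example chains a few steps so that each step's output feeds the next one. The step functions (`parse`, `drop_negative`, `mean`) and the input string are hypothetical stand-ins for the tools in a real workflow.

```python
from functools import reduce


def run_pipeline(steps, initial):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda value, step: step(value), steps, initial)


# Hypothetical stages of a post-processing workflow.
def parse(raw):
    """Turn a comma-separated string into a list of floats."""
    return [float(x) for x in raw.split(",")]


def drop_negative(values):
    """Discard readings that are below zero."""
    return [v for v in values if v >= 0]


def mean(values):
    """Average the remaining readings."""
    return sum(values) / len(values)


pipeline = [parse, drop_negative, mean]
result = run_pipeline(pipeline, "4,-1,8,6")
```

Because the sequence is defined once as data, the same steps run the same way every time, which is the variance reduction that automation is meant to provide.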
85
85
86
86
### Tools for workflow management
87
87
88
-
-**Azure Batch:** Use Azure Batch to run large-scale, parallel, and high-performance computing (HPC) batch jobs efficiently in Azure. Azure Batch creates and manages a pool of compute nodes (virtual machines), installs the applications you want to run, and schedules jobs to run on the nodes. There's no cluster or job scheduler software to install, manage, or scale. Instead, you use Batch APIs and tools, command-line scripts, or the Azure portal to configure, manage, and monitor your jobs.
88
+
**Azure Batch:** Use Azure Batch to run large-scale, parallel, and high-performance computing (HPC) batch jobs efficiently in Azure. Azure Batch creates and manages a pool of compute nodes (virtual machines), installs the applications you want to run, and schedules jobs to run on the nodes. There's no cluster or job scheduler software to install, manage, or scale. Instead, you use Batch APIs and tools, command-line scripts, or the Azure portal to configure, manage, and monitor your jobs.
89
89
90
90
For full details on Azure Batch, including more capabilities and how it works, see [Azure Batch](/azure/batch).
91
91
92
-
-**Azure CycleCloud:** Azure CycleCloud is an enterprise-friendly tool for orchestrating and managing High Performance Computing (HPC) environments on Azure. With CycleCloud, users can plan infrastructure for HPC systems, deploy familiar HPC schedulers, and automatically scale the infrastructure to run jobs efficiently at any scale. Through CycleCloud, users can create different types of file systems and mount them to the compute cluster nodes to support HPC workloads.
92
+
**Azure CycleCloud:** Azure CycleCloud is an enterprise-friendly tool for orchestrating and managing High Performance Computing (HPC) environments on Azure. With CycleCloud, users can plan infrastructure for HPC systems, deploy familiar HPC schedulers, and automatically scale the infrastructure to run jobs efficiently at any scale. Through CycleCloud, users can create different types of file systems and mount them to the compute cluster nodes to support HPC workloads.
93
93
94
94
For more information on Azure CycleCloud, see [Azure CycleCloud](/azure/cyclecloud).