Commit fa62900

Update hpc-storage-options.md
1 parent 833033c commit fa62900

1 file changed

+74
-75
lines changed

@@ -1,19 +1,25 @@
11
---
2-
title: "High-Performance Computing (HPC) storage options"
3-
description: Learn about evaluating the most suitable Azure HPC storage solution for HPC.
4-
author: padmalathas
2+
title: "High-Performance Computing (HPC) workload best practices and storage options"
3+
description: A comprehensive guide to choosing a storage solution best suited to your HPC workloads.
4+
author: christinechen2
55
ms.author: padmalathas
6-
ms.date: 06/25//2025
7-
ms.topic: reference-article
6+
ms.reviewer: normesta
7+
ms.date: 06/25/2025
88
ms.service: azure-virtual-machines
99
ms.subservice: hpc
10+
ms.topic: concept-article
1011
# Customer intent: "As a cloud architect or HPC administrator, I want to evaluate and select the most suitable Azure HPC storage solution based on performance, scalability, protocol support, and workload alignment for AI, HPC, and data-intensive applications."
1112
---
1213

13-
# Overview
14-
This reference article provides a detailed comparison and technical specifications of Azure’s High Performance Computing (HPC) storage solutions. It includes performance metrics, protocol support, cost tiers, and use case alignment for each storage type.
14+
# High-performance computing (HPC) workload best practices and storage options guide
1515

16-
---
16+
<!-- [!INCLUDE[appliesto-sqlvm](../../includes/appliesto-sqlvm.md)] -->
17+
18+
This guide provides best practices, guidelines, and a detailed comparison of the storage solutions best suited to your HPC workload on Azure VMs, along with their technical specifications. It includes performance metrics, protocol support, cost tiers, and use case alignment for each storage type. There's typically a trade-off between optimizing for cost and optimizing for performance. If your workload is less demanding, you might not require every recommended optimization. Consider your performance needs, costs, and workload patterns as you evaluate these recommendations.
19+
20+
## Overview
21+
22+
Storage for HPC workloads consists of core storage and, in some cases, an accelerator. Core storage acts as the permanent home for your data. It contains rich data management features and is durable, available, scalable, elastic, and secure. An accelerator enhances core storage by providing high-performance data access. An accelerator can be provisioned on demand and gives your computational workload much faster access to data.
1723

1824
## Storage Services Comparison
1925

@@ -24,90 +30,74 @@ This reference article provides a detailed comparison and technical specificatio
2430
| **IOPS** | 20,000 | 20,000 | 100,000 | 800,000 | >100,000 |
2531
| **Latency** | <100 ms | <10 ms | 2–4 ms | <1 ms | <2 ms |
2632
| **Protocols** | REST, HDFS, NFSv3, SFTP, FUSE, CSI | Same | REST, NFSv4.1, SMB3, CSI | NFSv3/4.1, SMB3, CSI | Lustre, CSI |
27-
| **Cost Tier** | $ | $$ | $$ | $$$ | $$$$ |
2833

29-
---
34+
## Initial consideration
3035

31-
## Specialized Storage Solutions
36+
If you're starting from scratch, see [Understand data store models](/azure/architecture/guide/technology-choices/data-store-overview) to choose a data store. Then see [Choose an Azure storage service](/azure/architecture/guide/technology-choices/storage-options) or [Introduction to Azure Storage](/azure/storage/common/storage-introduction) for an overview of your storage service options.
3237

33-
Azure offers a range of storage services tailored to meet the demanding needs of HPC workloads. Each solution is optimized for different performance characteristics, access patterns, and cost profiles. Following is an overview of the most relevant storage options and what they are best suited for in HPC scenarios.
38+
## At a glance
3439

40+
Start with the amount of data that you plan to store. Then, consider the number of CPU cores used by your workload and the size of your files. These factors help you to narrow down which core storage service best suits your workload and whether to use an accelerator to enhance performance.
3541

36-
### Azure Blob Storage
37-
38-
Azure Blob Storage is a massively scalable object storage service designed for unstructured data. It supports high-throughput workloads and is ideal for storing large volumes of data such as logs, images, videos, and checkpoint files. Blob storage meets the demanding, high-throughput requirements of HPC applications while providing the scale necessary to support storage for billions of data points flowing in from IoT endpoints.
39-
40-
- Durable, available
41-
* Sixteen nines of designed durability. Choice of durability (LRS, ZRS, GRS, RA-GRS).
42-
* Geo-replication and flexibility to scale as needed.
43-
* Built-in data integrity protection (for example, bit rot).
44-
- Scalable, performant
45-
* In 10 seconds:
46-
- Processes > 820M transactions
47-
- Read/Write > 250 TB of data
48-
- Adds > 15M new objects
49-
* Allows flexible scale up as needed. 
50-
* Meets demanding, high-throughput requirements.
51-
* Stores petabytes of data, cost-effectively with multiple storage tiers.
52-
- Secure, compliant
53-
* Authentication with Microsoft Entra ID.
54-
* Flexible auth including, role-based access control (RBAC), and ACLs.
55-
* Encryption at rest.
56-
* Advanced threat protection.
57-
- Fully managed
58-
* End-to-end lifecycle management.
59-
* Policy-based access control.
60-
* Immutable (WROM) storage.
61-
62-
### Azure Files
63-
- Fully managed file shares with SMB/NFS support.
64-
- Two SKUs: Standard (general purpose) and Premium (low latency, high IOPS).
65-
- Hybrid access via Azure File Sync.
66-
- Use cases: DevOps, backups, remote work, enterprise apps.
67-
68-
### Azure NetApp Files
69-
- Enterprise-grade file storage with ONTAP technology.
70-
- Tiers: Standard, Premium, Ultra.
71-
- Dynamic performance scaling.
72-
- Ideal for databases, VDI, HPC, and containerized apps.
73-
74-
### Azure Managed Lustre
75-
- Parallel file system optimized for HPC and AI.
76-
- Up to 512 GB/s throughput.
77-
- Seamless integration with Azure Blob for tiered storage.
78-
- Best for large-scale simulations, genomics, and scientific workloads.
42+
|Amount of data |CPU cores |File sizes |Core storage recommendation |Accelerator recommendation |
43+
|---------|---------|---------|---------|---------|
44+
|Under 50 TiB |N/A |N/A | [Azure Files](/azure/storage/files/) or [Azure NetApp Files](/azure/azure-netapp-files/). |No accelerator |
45+
|50 TiB - 5,000 TiB |Less than 500 |N/A|[Azure Files](/azure/storage/files/) or [Azure NetApp Files](/azure/azure-netapp-files/). |No accelerator |
46+
|50 TiB - 5,000 TiB |Over 500 |1 MiB and larger| [Azure Standard Blob](/azure/storage/blobs/). It’s supported by all accelerators, supports many protocols, and is cost-effective. | [Azure Managed Lustre](/azure/azure-managed-lustre/). |
47+
|50 TiB - 5,000 TiB |Over 500 |Smaller than 1 MiB| [Azure Premium Blob](/azure/storage/blobs/storage-blob-block-blob-premium) or [Azure Standard Blob](/azure/storage/blobs/). | [Azure Managed Lustre](/azure/azure-managed-lustre/). |
48+
|50 TiB - 5,000 TiB |Over 500 |Smaller than 512 KiB| [Azure NetApp Files](/azure/azure-netapp-files/). |No accelerator |
49+
|Over 5,000 TiB |N/A |N/A| |Talk to your field or account team. |
50+
<!---| |[Use ZRS disks when sharing disks between VMs](#use-zrs-disks-when-sharing-disks-between-vms). |Prevents a shared disk from becoming a single point of failure. | --->
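
The table above can be read as a small decision procedure. The following Python sketch is one possible encoding of it, not an official tool. The function and type names are hypothetical, and two boundary choices are assumptions because the table doesn't specify them: exactly 500 cores is treated as "over 500", and the "smaller than 512 KiB" row takes precedence over the "smaller than 1 MiB" row.

```python
from typing import NamedTuple

KIB = 1024
MIB = 1024 * KIB


class Recommendation(NamedTuple):
    core_storage: str
    accelerator: str


def recommend_storage(data_tib: float, cpu_cores: int, typical_file_bytes: int) -> Recommendation:
    """Map data size, core count, and typical file size onto the table's rows."""
    files_or_anf = "Azure Files or Azure NetApp Files"

    # Over 5,000 TiB: the table lists no core storage and defers to the account team.
    if data_tib > 5000:
        return Recommendation("(none listed)", "Talk to your field or account team")

    # Under 50 TiB, or fewer than 500 cores: no accelerator is recommended.
    if data_tib < 50 or cpu_cores < 500:
        return Recommendation(files_or_anf, "No accelerator")

    # 50 TiB - 5,000 TiB with 500 or more cores (assumption: 500 counts as "over 500").
    # Assumption: the 512 KiB row is checked before the 1 MiB row.
    if typical_file_bytes < 512 * KIB:
        return Recommendation("Azure NetApp Files", "No accelerator")
    if typical_file_bytes < MIB:
        return Recommendation("Azure Premium Blob or Azure Standard Blob", "Azure Managed Lustre")
    return Recommendation("Azure Standard Blob", "Azure Managed Lustre")


if __name__ == "__main__":
    # Example: 200 TiB of data, 1,000 cores, 4 MiB files.
    print(recommend_storage(200, 1000, 4 * MIB))
```

Treat the result as a starting point only. The solution details in the rest of this article still apply when two rows are close.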
7951

8052
---
8153

82-
## AI and RAG Workload Storage Requirements
54+
## Solution details
55+
56+
If you're still deciding between options after reviewing the table in the At a glance section, here are more details for each solution:
57+
58+
|Solution |Optimal Performance & Scale |Data Access (Access Protocol) |Billing Model |Core Storage or Accelerator |
59+
|---|---|---|---|---|
60+
| [**Azure Standard Blob**](/azure/storage/blobs/) | * Good for large file, bandwidth-intensive workloads.<br> * Designed for unstructured data. <br> * Supports high-throughput workloads. | * Good for traditional (file) and cloud-native (REST) HPC apps. <br>* Easy to access, share, manage datasets.<br> * Works with all accelerators. | Pay for what you use. | Core Storage. |
61+
| [**Azure Premium Blob**](/azure/storage/blobs/storage-blob-block-blob-premium) | * IOPS and latency better than Standard Blob. <br> * Good for datasets with many medium-sized files and mixed file sizes. | * Good for traditional (file) and cloud-native (REST) HPC apps. <br> * Easy to access, share, manage datasets. <br> * Works with all accelerators. | Pay for what you use. | Core Storage. |
62+
| [**Azure Premium Files**](/azure/storage/files/) | * Capacity and bandwidth suited for smaller scale (<1k cores). <br> * IOPS and latency good for medium-sized files (>512 KiB). <br> * Offers premium (low latency, high IOPS) SKUs. <br> * Hybrid access via Azure File Sync. | Easy integration with Linux (NFS) and Windows (SMB), but you can't use both NFS and SMB to access the same data. | Pay for what you provision. | Core Storage. |
63+
| [**Azure NetApp Files**](/azure/azure-netapp-files/) | * Capacity and bandwidth good for midrange jobs (1k-10k cores). <br> * IOPS and latency good for small-file datasets (<512 KiB). <br> * Excellent for small, many-file workloads. <br> * Enterprise-grade file storage with ONTAP technology. <br> * Dynamic performance scaling across Standard, Premium, Ultra tiers. | Easy to integrate with Linux and Windows. Supports multiprotocol access for workflows that use both. | Pay for what you provision. | Either. |
64+
| [**Azure Managed Lustre**](/azure/azure-managed-lustre/) | * Bandwidth to support all job sizes (1k - >10k cores). <br> * IOPS and latency good for thousands of medium-sized files (>512 KiB). <br> * Best for bandwidth-intensive read and write workloads. <br> * Parallel file system optimized for HPC/AI.<br> * Seamless integration with Azure Blob for tiered storage. | Lustre, CSI. | Pay for what you provision. | Durable enough to run as standalone (core) storage, most cost-effective as an accelerator. |
8365

84-
| Stage | Requirements |
85-
|-------------|------------------------------------------------------------------------------|
86-
| Training | High throughput, checkpointing, local caching, large model loading |
87-
| Inference | Fast model access, low latency, concurrent GPU access |
88-
| RAG | Secure unstructured storage, vector DB integration, freshness, low latency |
66+
---
67+
68+
## Specialized Storage Solutions
69+
Azure offers a range of storage services tailored to meet the demanding needs of HPC workloads. Each solution is optimized for different performance characteristics, access patterns, and cost profiles. The following table gives an overview of the most relevant storage options and what each is best suited for in HPC scenarios.
70+
71+
| Storage Solution | Use Cases | Performance Benchmarks | Scalability Options | Integration with Other Azure Services |
72+
|------|------|-----|-----|-----|
73+
| Azure Blob Storage | * Data Analytics <br> * Content Distribution <br> * Backup and Archival | Throughput up to 30 GB/s with BlobFuse2 | * Storage Accounts up to 5 PiB per account <br> * Unlimited number of containers per account | * Azure AI <br> * AKS <br> * Azure Data Lake |
74+
||||||
75+
| Azure Files | * DevOps <br> * Backups <br> * Remote Work | Encryption in Transit (TLS 1.3 for NFS shares) | * File Shares up to 100 TiB per share (Standard) <br> * IOPS up to 100,000 (Premium) | * Azure Backup <br> * Azure Monitor <br> * Microsoft Entra ID |
76+
||||||
77+
| Azure NetApp Files | * Databases <br> * VDI <br> * HPC | IOPS and Throughput measured using FIO | * Capacity Pools up to 100 TiB per pool <br> * Volumes up to 100 TiB per volume | * AKS <br> * Azure Backup <br> * Azure Monitor |
78+
||||||
79+
| Azure Managed Lustre | * Large-scale simulations <br> * Genomics <br> * Scientific Workloads | Throughput up to 30 GB/s with the 250 MB/s/TiB performance tier | * File Systems up to 1.5 PB capacity <br> * Throughput up to 375 GB/s | * Azure Blob Storage <br> * AKS <br> * Azure Monitor |
80+
||||||
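
The Azure Managed Lustre row quotes throughput per TiB of provisioned capacity. As a rough illustration of that arithmetic, the following Python sketch assumes that throughput scales linearly with capacity within a tier and that units are decimal (1 GB/s = 1,000 MB/s). Both are assumptions for this example, and the function names are hypothetical. Check the service documentation for actual limits and allowed capacity increments.

```python
def provisioned_throughput_gb_per_s(capacity_tib: float, tier_mb_per_s_per_tib: float) -> float:
    """Estimate aggregate throughput, assuming it scales linearly with capacity."""
    return capacity_tib * tier_mb_per_s_per_tib / 1000


def capacity_for_throughput_tib(target_gb_per_s: float, tier_mb_per_s_per_tib: float) -> float:
    """Estimate the capacity needed to reach a target throughput in a given tier."""
    return target_gb_per_s * 1000 / tier_mb_per_s_per_tib


if __name__ == "__main__":
    # Capacity that would correspond to 30 GB/s in the 250 MB/s/TiB tier.
    print(capacity_for_throughput_tib(30, 250))
    # Throughput estimate for a 48 TiB file system in the same tier.
    print(provisioned_throughput_gb_per_s(48, 250))
```

Under these assumptions, the 30 GB/s figure in the table corresponds to about 120 TiB of provisioned capacity in the 250 MB/s/TiB tier.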
8981

9082
---
9183

92-
## Blobfuse2 – Mounting Blob Storage
93-
- Virtual File System driver for mounting Blob storage.
94-
- Supports file caching and streaming with block-cache.
95-
- High throughput, secure, open-source.
96-
- Ideal for AI training and fine-tuning scenarios.
84+
## AI and RAG Workload Storage Requirements
85+
86+
The storage requirements for AI and RAG workloads vary across different stages. During the training stage, it is essential to have high throughput, checkpointing, local caching, and the ability to load large models. For the inference stage, fast model access, low latency, and concurrent GPU access are required. In the RAG stage, secure unstructured storage, vector database integration, freshness, and low latency are necessary.
9787

9888
---
9989

10090
## Partner Solutions
10191

102-
| Partner | Protocols | Scale | Unique Features |
103-
|----------------|---------------------|---------------|------------------------------------------------------|
104-
| Qumulo | NFS, SMB, S3 | 200 PiB | Azure-native SaaS, global namespace, cost-effective |
105-
| Dell APEX | NFS, SMB, S3, HDFS | 5.6 PiB | On-prem parity, policy-based tiering |
106-
| Nasuni | NFS, SMB, S3 || File locking, blob as primary tier |
107-
| Hammerspace | NFS, SMB, S3, pNFS || Global namespace, caching alternative |
108-
| Weka | NFS, SMB, S3 | 14 EB | High IOPS, low latency, linear scale-out |
109-
| IBM SpectrumScale | GPFS, NFS, SMB || Full GPFS stack |
110-
| DDN Exascaler | Lustre, NFS, SMB | Petabytes | Full DDN Lustre stack |
92+
| Partner | Protocols | Scale | Unique Features |
93+
|-------------------|---------------------|---------------|------------------------------------------------------|
94+
| Qumulo | NFS, SMB, S3 | 200 PiB | Azure-native SaaS, global namespace, cost-effective |
95+
| Dell APEX | NFS, SMB, S3, HDFS | 5.6 PiB | On-prem parity, policy-based tiering |
96+
| Nasuni | NFS, SMB, S3 || File locking, blob as primary tier |
97+
| Hammerspace | NFS, SMB, S3, pNFS || Global namespace, caching alternative |
98+
| Weka | NFS, SMB, S3 | 14 EB | High IOPS, low latency, linear scale-out |
99+
| IBM SpectrumScale | GPFS, NFS, SMB || Full GPFS stack |
100+
| DDN Exascaler | Lustre, NFS, SMB | Petabytes | Full DDN Lustre stack |
111101

112102
---
113103

@@ -116,3 +106,12 @@ Azure Blob Storage is a massively scalable object storage service designed for u
116106
- Use Availability Zones to control latency.
117107
- Use large volume features in ANF for max bandwidth.
118108
- Consider caching and tiering strategies for cost efficiency.
109+
110+
## Core storage price comparison
111+
112+
From most to least expensive, the core storage options are:
113+
- Azure NetApp Files
114+
- Azure Premium Blob and Azure Premium Files
115+
- Azure Standard Blob
116+
117+
For more information about pricing, see [Azure product pricing](https://azure.microsoft.com/pricing/#product-pricing).
