---
title: "High-Performance Computing (HPC) workload best practices and storage options"
description: A comprehensive guide to choosing a storage solution best suited to your HPC workloads.
author: christinechen2
ms.author: padmalathas
ms.reviewer: normesta
ms.date: 06/25/2025
ms.service: azure-virtual-machines
ms.subservice: hpc
ms.topic: concept-article
# Customer intent: "As a Cloud architect, HPC administrator, I want to evaluate and select the most suitable Azure HPC storage solution based on performance, scalability, protocol support, and workload alignment for AI, HPC, and data-intensive applications."
---
# High-performance computing (HPC) workload best practices and storage options guide
This guide provides best practices, guidelines, a detailed comparison, and technical specifications for the storage solutions best suited to your HPC workloads on Azure VMs. It includes performance metrics, protocol support, cost tiers, and use case alignment for each storage type. There's typically a trade-off between optimizing for costs and optimizing for performance. If your workload is less demanding, you might not require every recommended optimization. Consider your performance needs, costs, and workload patterns as you evaluate these recommendations.

## Overview

Storage for HPC workloads consists of core storage and, in some cases, an accelerator. Core storage acts as the permanent home for your data. It contains rich data management features and is durable, available, scalable, elastic, and secure. An accelerator enhances core storage by providing high-performance data access. An accelerator can be provisioned on demand and gives your computational workload much faster access to data.
## Storage Services Comparison
If you are starting from scratch, see [Understand data store models](/azure/architecture/guide/technology-choices/data-store-overview) to choose a data store and [Choose an Azure storage service](/azure/architecture/guide/technology-choices/storage-options) or [Introduction to Azure Storage](/azure/storage/common/storage-introduction) to get an idea of your storage service options.
## At a glance
Start with the amount of data that you plan to store. Then, consider the number of CPU cores used by your workload and the size of your files. These factors help you to narrow down which core storage service best suits your workload and whether to use an accelerator to enhance performance.

|Amount of data |Number of CPU cores |File size |Core storage |Accelerator |
|---|---|---|---|---|
|50 TiB - 5,000 TiB |Less than 500 |N/A|[Azure Files](/azure/storage/files/) or [Azure NetApp Files](/azure/azure-netapp-files/). |No accelerator |
|50 TiB - 5,000 TiB |Over 500 |1 MiB and larger|[Azure Standard Blob](/azure/storage/blobs/). It’s supported by all accelerators, supports many protocols, and is cost-effective. |[Azure Managed Lustre](/azure/azure-managed-lustre/). |
|50 TiB - 5,000 TiB |Over 500 |Smaller than 1 MiB|[Azure Premium Blob](/azure/storage/blobs/storage-blob-block-blob-premium) or [Azure Standard Blob](/azure/storage/blobs/). |[Azure Managed Lustre](/azure/azure-managed-lustre/). |
|Over 5,000 TiB |N/A |N/A||Talk to your field or account team. |
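
If it helps to see the same guidance in code, the following is a minimal Python sketch that encodes only the rows visible in the preceding table. The function name, parameters, and thresholds are illustrative (taken directly from the table) and aren't part of any Azure SDK; datasets smaller than 50 TiB aren't covered by the rows shown here.

```python
# Illustrative decision helper based on the "At a glance" table above.
# Thresholds mirror the table rows; adjust them to your own workload analysis.

def suggest_hpc_storage(data_tib: float, cpu_cores: int, typical_file_mib: float):
    """Return (core_storage, accelerator) suggestions; None means no accelerator."""
    if data_tib > 5000:
        return ("Talk to your field or account team", None)
    if 50 <= data_tib <= 5000:
        if cpu_cores < 500:
            return ("Azure Files or Azure NetApp Files", None)
        if typical_file_mib >= 1:
            return ("Azure Standard Blob", "Azure Managed Lustre")
        return ("Azure Premium Blob or Azure Standard Blob", "Azure Managed Lustre")
    # Rows for datasets smaller than 50 TiB aren't shown in this table.
    return ("Evaluate core storage options directly", None)


print(suggest_hpc_storage(data_tib=200, cpu_cores=2000, typical_file_mib=4))
# -> ('Azure Standard Blob', 'Azure Managed Lustre')
```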
---
## Solution details
If you're still deciding between options after reviewing the preceding guidance, here are more details about each solution:

|Solution |Optimal Performance & Scale |Data Access (Access Protocol) |Billing Model |Core Storage or Accelerator |
|---|---|---|---|---|
|[**Azure Standard Blob**](/azure/storage/blobs/)| * Good for large-file, bandwidth-intensive workloads.<br> * Designed for unstructured data. <br> * Supports high-throughput workloads. | * Good for traditional (file) and cloud-native (REST) HPC apps. <br> * Easy to access, share, and manage datasets.<br> * Works with all accelerators. | Pay for what you use. | Core Storage. |
|[**Azure Premium Blob**](/azure/storage/blobs/storage-blob-block-blob-premium)| * IOPS and latency better than Standard Blob. <br> * Good for datasets with many medium-sized files and mixed file sizes. | * Good for traditional (file) and cloud-native (REST) HPC apps. <br> * Easy to access, share, and manage datasets. <br> * Works with all accelerators. | Pay for what you use. | Core Storage. |
|[**Azure Premium Files**](/azure/storage/files/)| * Capacity and bandwidth suited for smaller scale (<1k cores). <br> * IOPS and latency good for medium-sized files (>512 KiB). <br> * Offers premium (low latency, high IOPS) SKUs. <br> * Hybrid access via Azure File Sync. | Easy integration with Linux (NFS) and Windows (SMB), but you can't use both NFS and SMB to access the same data. | Pay for what you provision. | Core Storage. |
|[**Azure NetApp Files**](/azure/azure-netapp-files/)| * Capacity and bandwidth good for midrange jobs (1k-10k cores). <br> * IOPS and latency good for small-file datasets (<512 KiB). <br> * Excellent for workloads with many small files. <br> * Enterprise-grade file storage with ONTAP technology. <br> * Dynamic performance scaling across Standard, Premium, and Ultra tiers. | Easy to integrate with Linux and Windows; supports multiprotocol access for workflows that use both. | Pay for what you provision. | Either. |
|[**Azure Managed Lustre**](/azure/azure-managed-lustre/)| * Bandwidth to support all job sizes (1k to >10k cores). <br> * IOPS and latency good for thousands of medium-sized files (>512 KiB). <br> * Best for bandwidth-intensive read and write workloads. <br> * Parallel file system optimized for HPC and AI.<br> * Seamless integration with Azure Blob for tiered storage. | Lustre, CSI. | Pay for what you provision. | Durable enough to run as standalone (core) storage, but most cost-effective as an accelerator. |
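
For the cloud-native (REST) access path called out for Blob storage above, data is typically read and written through the Azure Storage SDKs rather than a file mount, and the same container can also back an accelerator such as Azure Managed Lustre for file-based access during a job. The following is a minimal Python sketch using the `azure-storage-blob` and `azure-identity` packages; the account URL, container, blob names, and local paths are placeholders, and it assumes the signed-in identity has permission to read and write blob data.

```python
# Minimal example of moving HPC input/output data to and from Blob storage.
# Requires: pip install azure-storage-blob azure-identity
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholder names: substitute your own storage account and container.
service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
container = service.get_container_client("hpc-datasets")

# Upload a local input file. max_concurrency parallelizes block transfers,
# which matters for large, bandwidth-intensive HPC datasets.
with open("inputs/mesh.dat", "rb") as data:
    container.upload_blob(name="run-001/mesh.dat", data=data,
                          overwrite=True, max_concurrency=8)

# Download results back to local or accelerator-backed scratch storage.
downloader = container.download_blob("run-001/results.h5", max_concurrency=8)
with open("outputs/results.h5", "wb") as out:
    out.write(downloader.readall())
```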
Azure offers a range of storage services tailored to meet the demanding needs of HPC workloads. Each solution is optimized for different performance characteristics, access patterns, and cost profiles. The following table gives an overview of the most relevant storage options and what they're best suited for in HPC scenarios.

| Storage Solution | Use Cases | Performance Benchmarks | Scalability Options | Integration with Other Azure Services |
|------|------|-----|-----|-----|
| Azure Blob Storage | * Data Analytics <br> * Content Distribution <br> * Backup and Archival | Throughput up to 30 GB/s with BlobFuse2 | * Storage Accounts up to 5 PiB per account <br> * Unlimited number of containers per account | * Azure AI <br> * AKS <br> * Azure Data Lake |
| Azure Files | * DevOps <br> * Backups <br> * Remote Work | Encryption in Transit (TLS 1.3 for NFS shares) | * File Shares up to 100 TiB per share (Standard) <br> * IOPS up to 100,000 (Premium) | * Azure Backup <br> * Azure Monitor <br> * Microsoft Entra ID |
| Azure NetApp Files | * Databases <br> * VDI <br> * HPC | IOPS and Throughput measured using FIO | * Capacity Pools up to 100 TiB per pool <br> * Volumes up to 100 TiB per volume | * AKS <br> * Azure Backup <br> * Azure Monitor |
| Azure Managed Lustre | * Large-scale simulations <br> * Genomics <br> * Scientific Workloads | Throughput up to 30 GB/s with the 250 MB/s/TiB performance tier | * File Systems up to 1.5 PB capacity <br> * Throughput up to 375 GB/s | * Azure Blob Storage <br> * AKS <br> * Azure Monitor |
---
## AI and RAG Workload Storage Requirements
The storage requirements for AI and RAG workloads vary across stages. During the training stage, high throughput, checkpointing, local caching, and the ability to load large models are essential. For the inference stage, fast model access, low latency, and concurrent GPU access are required. In the RAG stage, secure unstructured storage, vector database integration, data freshness, and low latency are necessary.
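
As a concrete illustration of the training-stage requirements (high throughput and checkpointing), the sketch below writes each checkpoint to fast local or accelerator-backed scratch storage and then copies it to durable Blob storage. It reuses the `azure-storage-blob` pattern shown earlier; the paths, container name, checkpoint interval, and the `write_local_checkpoint` callback are hypothetical stand-ins for whatever your training framework provides.

```python
# Hypothetical checkpointing helper: checkpoints land on scratch storage first,
# then completed files are persisted to core storage (Azure Blob Storage).
import os
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

SCRATCH_DIR = "/scratch/run-001"  # local NVMe or accelerator-backed scratch

container = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
).get_container_client("training-checkpoints")

def save_checkpoint(step: int, write_local_checkpoint) -> None:
    """Write a checkpoint locally, then copy it to durable core storage."""
    local_path = os.path.join(SCRATCH_DIR, f"ckpt-{step:08d}.bin")
    write_local_checkpoint(local_path)  # supplied by your training framework
    with open(local_path, "rb") as data:
        container.upload_blob(name=f"run-001/ckpt-{step:08d}.bin",
                              data=data, overwrite=True, max_concurrency=8)

# In the training loop, for example every 500 steps:
#     if step % 500 == 0:
#         save_checkpoint(step, write_local_checkpoint=framework_save_fn)
```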