---
title: "High-Performance Computing (HPC) storage options"
description: Learn how to evaluate and select the most suitable Azure storage solution for HPC workloads.
author: padmalathas
ms.author: padmalathas
ms.date: 06/25/2025
ms.topic: reference-article
ms.service: azure-virtual-machines
ms.subservice: hpc
# Customer intent: "As a cloud architect or HPC administrator, I want to evaluate and select the most suitable Azure HPC storage solution based on performance, scalability, protocol support, and workload alignment for AI, HPC, and data-intensive applications."
---

# Overview

This reference article compares Azure's High Performance Computing (HPC) storage solutions and lists their technical specifications, including performance metrics, protocol support, cost tiers, and use-case alignment for each storage type.

---

## Storage Services Comparison

| Feature | Standard Blob | Premium Blob | Premium Files | Azure NetApp Files | Azure Managed Lustre |
|----------------|---------------|--------------|----------------|---------------------|-----------------------|
| **Capacity** | 20+ PiB | 20+ PiB | 100 TiB | 500 TiB | 1 PiB |
| **Bandwidth** | 15 GB/s | 15 GB/s | 10 GB/s | 10 GiB/s | Up to 512 GB/s |
| **IOPS** | 20,000 | 20,000 | 100,000 | 800,000 | >100,000 |
| **Latency** | <100 ms | <10 ms | 2–4 ms | <1 ms | <2 ms |
| **Protocols** | REST, HDFS, NFSv3, SFTP, FUSE, CSI | Same as Standard Blob | REST, NFSv4.1, SMB3, CSI | NFSv3/4.1, SMB3, CSI | Lustre, CSI |
| **Cost Tier** | $ | $$ | $$ | $$$ | $$$$ |
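
To turn the comparison into a shortlist, you can filter the options against your workload's requirements. The following Python sketch is illustrative only. It transcribes the headline values from the table above, treating "20+ PiB" as 20 PiB and ">100,000" as 100,000, and converting 10 GiB/s to about 10.74 GB/s. The `StorageOption` class and the `shortlist` function are hypothetical helpers and aren't part of any Azure SDK. Actual limits depend on SKU, region, and configuration, so confirm them against each service's scalability targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    name: str
    capacity_tib: float    # maximum capacity, in TiB
    bandwidth_gbps: float  # maximum bandwidth, in GB/s
    iops: int              # maximum IOPS (lower bound where the table says ">")
    latency_ms: float      # upper bound of the listed latency, in ms
    cost_tier: int         # number of "$" symbols in the Cost Tier row


GIB_PER_S_IN_GB_PER_S = 1.073741824  # 1 GiB/s = 2**30 bytes/s

# Headline values transcribed from the comparison table (1 PiB = 1,024 TiB).
OPTIONS = [
    StorageOption("Standard Blob", 20 * 1024, 15, 20_000, 100, 1),
    StorageOption("Premium Blob", 20 * 1024, 15, 20_000, 10, 2),
    StorageOption("Premium Files", 100, 10, 100_000, 4, 2),
    StorageOption("Azure NetApp Files", 500, 10 * GIB_PER_S_IN_GB_PER_S, 800_000, 1, 3),
    StorageOption("Azure Managed Lustre", 1024, 512, 100_000, 2, 4),
]


def shortlist(capacity_tib: float, bandwidth_gbps: float, iops: int,
              max_latency_ms: float) -> list[StorageOption]:
    """Return the options that meet every requirement, lowest cost tier first."""
    matches = [
        option for option in OPTIONS
        if option.capacity_tib >= capacity_tib
        and option.bandwidth_gbps >= bandwidth_gbps
        and option.iops >= iops
        and option.latency_ms <= max_latency_ms
    ]
    # sorted() is stable, so options in the same cost tier keep table order.
    return sorted(matches, key=lambda option: option.cost_tier)
```

Consider a workload that needs 200 TiB, 5 GB/s, 50,000 IOPS, and latency of 5 ms or less. For that workload, `[o.name for o in shortlist(200, 5, 50_000, 5)]` returns `['Azure NetApp Files', 'Azure Managed Lustre']`. Both blob tiers fail the IOPS requirement, and Premium Files fails the capacity requirement.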

---

## Specialized Storage Solutions

Azure offers a range of storage services tailored to the demanding needs of HPC workloads. Each service is optimized for different performance characteristics, access patterns, and cost profiles. The following sections describe the most relevant options and the HPC scenarios that each one suits best.

### Azure Blob Storage

Azure Blob Storage is a massively scalable object storage service designed for unstructured data. It's well suited to storing large volumes of data such as logs, images, videos, and checkpoint files. Blob Storage meets the high-throughput requirements of HPC applications. It also provides the scale needed to store billions of data points flowing in from IoT endpoints.

- Durable, available
  * Up to sixteen nines of designed durability, with a choice of redundancy options (LRS, ZRS, GRS, RA-GRS).
  * Geo-replication and flexibility to scale as needed.
  * Built-in data integrity protection (for example, against bit rot).
- Scalable, performant
  * In 10 seconds:
    - Processes more than 820M transactions.
    - Reads and writes more than 250 TB of data.
    - Adds more than 15M new objects.
  * Scales up flexibly as needed.
  * Meets demanding, high-throughput requirements.
  * Stores petabytes of data cost-effectively across multiple storage tiers.
- Secure, compliant
  * Authentication with Microsoft Entra ID.
  * Flexible authorization, including role-based access control (RBAC) and ACLs.
  * Encryption at rest.
  * Advanced threat protection.
- Fully managed
  * End-to-end lifecycle management.
  * Policy-based access control.
  * Immutable (WORM) storage.

### Azure Files

- Fully managed file shares with SMB/NFS support.
- Two SKUs: Standard (general purpose) and Premium (low latency, high IOPS).
- Hybrid access via Azure File Sync.
- Use cases: DevOps, backups, remote work, enterprise apps.

### Azure NetApp Files

- Enterprise-grade file storage built on NetApp ONTAP technology.
- Service levels: Standard, Premium, and Ultra.
- Dynamic performance scaling by resizing volumes or changing service levels.
- Ideal for databases, VDI, HPC, and containerized apps.

### Azure Managed Lustre

- Parallel file system optimized for HPC and AI workloads.
- Up to 512 GB/s aggregate throughput.
- Integration with Azure Blob Storage for tiered storage: you can import data from a blob container and export data back to it.
- Best for large-scale simulations, genomics, and other scientific workloads.
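
Managed Lustre throughput scales with provisioned capacity, so you can estimate how much capacity a target throughput requires. The following Python sketch assumes the SKU naming convention `AMLFS-Durable-Premium-<N>`, where `<N>` is the throughput in MB/s per TiB of capacity (40, 125, 250, or 500). Before you rely on these rates, check the current SKU list, minimum sizes, and capacity increments for your region. The helper names are hypothetical.

```python
# Assumed throughput per TiB of provisioned capacity (MB/s) for each SKU;
# the number at the end of each SKU name is that per-TiB rate.
MBPS_PER_TIB = {
    "AMLFS-Durable-Premium-40": 40,
    "AMLFS-Durable-Premium-125": 125,
    "AMLFS-Durable-Premium-250": 250,
    "AMLFS-Durable-Premium-500": 500,
}


def aggregate_throughput_gbps(sku: str, capacity_tib: float) -> float:
    """Estimated aggregate file system throughput in GB/s (1 GB/s = 1,000 MB/s)."""
    return MBPS_PER_TIB[sku] * capacity_tib / 1000


def capacity_for_throughput(sku: str, target_gbps: float) -> float:
    """Capacity in TiB needed to reach target_gbps on the given SKU.

    Ignores each SKU's minimum size and capacity increments.
    """
    return target_gbps * 1000 / MBPS_PER_TIB[sku]
```

Under these assumptions, 1,024 TiB (1 PiB) on the 500 MB/s per TiB SKU gives `aggregate_throughput_gbps("AMLFS-Durable-Premium-500", 1024) == 512.0`, which matches the 512 GB/s maximum listed above. Reaching 10 GB/s on the 125 MB/s per TiB SKU takes about 80 TiB.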

---

## AI and RAG Workload Storage Requirements

| Stage | Requirements |
|-------------|------------------------------------------------------------------------------|
| Training | High throughput, checkpointing, local caching, large model loading |
| Inference | Fast model access, low latency, concurrent GPU access |
| Retrieval-augmented generation (RAG) | Secure unstructured storage, vector DB integration, data freshness, low latency |

---

## Blobfuse2 – Mounting Blob Storage

- Virtual file system driver that mounts Blob Storage containers as a local file system.
- Supports whole-file caching and block-based streaming with block cache.
- High-throughput, secure, and open source.
- Ideal for AI training and fine-tuning scenarios.
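
Blobfuse2 reads its settings from a YAML configuration file that you pass to `blobfuse2 mount` with `--config-file`. The following Python sketch renders a minimal configuration that streams data through the block cache and builds the matching mount command. It rests on several assumptions:

- The key names are based on the Blobfuse2 sample configuration and might differ between versions.
- `mode: msi` assumes the VM has a managed identity with data access to the container.
- The cache sizes, account, container, and paths are hypothetical placeholders.

Verify each setting against the configuration reference for your installed Blobfuse2 version.

```python
def blobfuse2_config(account: str, container: str, block_size_mb: int = 16,
                     mem_size_mb: int = 4096, prefetch: int = 12,
                     parallelism: int = 32) -> str:
    """Render a minimal Blobfuse2 YAML config that streams reads through block_cache.

    Key names are based on the Blobfuse2 sample configuration (an assumption);
    check them against the documentation for your installed version.
    """
    return f"""\
# Component pipeline: FUSE front end, block cache, attribute cache, storage.
components:
  - libfuse
  - block_cache
  - attr_cache
  - azstorage

block_cache:
  block-size-mb: {block_size_mb}   # size of each cached block
  mem-size-mb: {mem_size_mb}       # memory reserved for cached blocks
  prefetch: {prefetch}             # blocks read ahead for sequential reads
  parallelism: {parallelism}       # parallel download workers

azstorage:
  type: block
  account-name: {account}
  container: {container}
  mode: msi                        # assumes a managed identity on the VM
"""


def mount_command(mount_path: str, config_path: str) -> list[str]:
    """Argument list for mounting the container with the given config file."""
    return ["blobfuse2", "mount", mount_path, f"--config-file={config_path}"]
```

To use the sketch, write the returned text to a file such as the hypothetical `/etc/blobfuse2/config.yaml`. Then run the command that `mount_command("/mnt/blob", "/etc/blobfuse2/config.yaml")` returns on each compute node.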

---

## Partner Solutions

| Partner | Protocols | Scale | Unique Features |
|----------------|---------------------|---------------|------------------------------------------------------|
| Qumulo | NFS, SMB, S3 | 200 PiB | Azure-native SaaS, global namespace, cost-effective |
| Dell APEX | NFS, SMB, S3, HDFS | 5.6 PiB | Parity with on-premises deployments, policy-based tiering |
| Nasuni | NFS, SMB, S3 | | File locking, Blob Storage as primary tier |
| Hammerspace | NFS, SMB, S3, pNFS | | Global namespace, caching alternative |
| Weka | NFS, SMB, S3 | 14 EB | High IOPS, low latency, linear scale-out |
| IBM Spectrum Scale | GPFS, NFS, SMB | | Full GPFS stack |
| DDN EXAScaler | Lustre, NFS, SMB | Petabytes | Full DDN Lustre stack |

---

## Performance Optimization Tips

- Size volumes based on performance requirements, not just capacity.
- Use Availability Zones to place compute and storage close together and minimize latency.
- Use large volumes in Azure NetApp Files (ANF) for maximum bandwidth.
- Consider caching and tiering strategies for cost efficiency.
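
To size an Azure NetApp Files volume for performance, compare two quotas: the one your data needs and the one your target throughput needs. Then provision the larger of the two. The following Python sketch assumes a capacity pool with automatic QoS. In such a pool, a volume's throughput limit is its quota multiplied by the service level's per-TiB rate: 16, 64, and 128 MiB/s per TiB for Standard, Premium, and Ultra. Manual QoS pools and large volumes have different limits, and the helper names are hypothetical.

```python
import math

# Assumed throughput per TiB of volume quota in an auto QoS capacity pool (MiB/s).
MIBPS_PER_TIB = {"Standard": 16, "Premium": 64, "Ultra": 128}


def volume_throughput_mibps(service_level: str, quota_tib: float) -> float:
    """Throughput limit of an auto QoS volume: quota multiplied by the per-TiB rate."""
    return MIBPS_PER_TIB[service_level] * quota_tib


def quota_for_throughput(service_level: str, target_mibps: float) -> int:
    """Smallest whole-TiB quota whose throughput limit meets target_mibps."""
    return math.ceil(target_mibps / MIBPS_PER_TIB[service_level])


def required_quota(service_level: str, data_tib: float, target_mibps: float) -> int:
    """Provision for whichever is larger: the data size or the throughput need."""
    return max(math.ceil(data_tib), quota_for_throughput(service_level, target_mibps))
```

For example, a 5 TiB dataset that needs 1,000 MiB/s on the Premium service level requires `required_quota("Premium", 5, 1000) == 16` TiB of quota, more than three times its data size.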
