|
| 1 | +--- |
| 2 | +title: "High-Performance Computing (HPC) storage options" |
| 3 | +description: Learn how to evaluate and select the most suitable Azure storage solution for HPC workloads. |
| 4 | +author: padmalathas |
| 5 | +ms.author: padmalathas |
| 6 | +ms.date: 06/25/2025 |
| 7 | +ms.topic: reference-article |
| 8 | +ms.service: azure-virtual-machines |
| 9 | +ms.subservice: hpc |
| 10 | +# Customer intent: "As a cloud architect or HPC administrator, I want to evaluate and select the most suitable Azure HPC storage solution based on performance, scalability, protocol support, and workload alignment for AI, HPC, and data-intensive applications." |
| 11 | +--- |
| 12 | + |
| 13 | +# Overview |
| 14 | +This reference article provides a detailed comparison and technical specifications of Azure’s High Performance Computing (HPC) storage solutions. It includes performance metrics, protocol support, cost tiers, and use case alignment for each storage type. |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +## Storage Services Comparison |
| 19 | + |
| 20 | +| Feature | Standard Blob | Premium Blob | Premium Files | Azure NetApp Files | Azure Managed Lustre | |
| 21 | +|----------------|---------------|--------------|----------------|---------------------|-----------------------| |
| 22 | +| **Capacity** | 20+ PiB | 20+ PiB | 100 TiB | 500 TiB | 1 PiB | |
| 23 | +| **Bandwidth** | 15 GB/s | 15 GB/s | 10 GB/s | 10 GiB/s | Up to 512 GB/s | |
| 24 | +| **IOPS** | 20,000 | 20,000 | 100,000 | 800,000 | >100,000 | |
| 25 | +| **Latency** | <100 ms | <10 ms | 2–4 ms | <1 ms | <2 ms | |
| 26 | +| **Protocols** | REST, HDFS, NFSv3, SFTP, FUSE, CSI | Same as Standard Blob | REST, NFSv4.1, SMB3, CSI | NFSv3/4.1, SMB3, CSI | Lustre, CSI | |
| 27 | +| **Cost Tier** | $ | $$ | $$ | $$$ | $$$$ | |
| 28 | + |
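The comparison table can also be read as a simple filter: start from the protocol and latency a workload needs, then prefer the lowest cost tier that qualifies. The following Python sketch encodes only the latency, IOPS, protocol, and cost rows above. The `StorageOption` type and `shortlist` helper are hypothetical names introduced for illustration, and the figures are copied from the table rather than queried from Azure, so treat it as a starting point and not a sizing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    name: str
    max_latency_ms: float  # upper bound taken from the Latency row
    iops: int              # figure from the IOPS row (a lower bound for Lustre)
    protocols: frozenset   # entries from the Protocols row
    cost_tier: int         # number of "$" signs in the Cost Tier row


BLOB_PROTOCOLS = frozenset({"REST", "HDFS", "NFSv3", "SFTP", "FUSE", "CSI"})

# Values transcribed from the comparison table; they are not live service limits.
OPTIONS = [
    StorageOption("Standard Blob", 100, 20_000, BLOB_PROTOCOLS, 1),
    StorageOption("Premium Blob", 10, 20_000, BLOB_PROTOCOLS, 2),
    StorageOption("Premium Files", 4, 100_000,
                  frozenset({"REST", "NFSv4.1", "SMB3", "CSI"}), 2),
    StorageOption("Azure NetApp Files", 1, 800_000,
                  frozenset({"NFSv3", "NFSv4.1", "SMB3", "CSI"}), 3),
    StorageOption("Azure Managed Lustre", 2, 100_000,
                  frozenset({"Lustre", "CSI"}), 4),
]


def shortlist(protocol, max_latency_ms, min_iops=0):
    """Return the names of matching options, cheapest cost tier first."""
    matches = [
        o for o in OPTIONS
        if protocol in o.protocols
        and o.max_latency_ms <= max_latency_ms
        and o.iops >= min_iops
    ]
    return [o.name for o in sorted(matches, key=lambda o: o.cost_tier)]


if __name__ == "__main__":
    # An NFSv4.1 workload that tolerates up to 5 ms of latency.
    print(shortlist("NFSv4.1", max_latency_ms=5))
```

A shortlist like this narrows the candidates only; capacity, bandwidth, and regional availability still need to be checked against the current service documentation.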
| 29 | +--- |
| 30 | + |
| 31 | +## Specialized Storage Solutions |
| 32 | + |
| 33 | +Azure offers a range of storage services tailored to the demanding needs of HPC workloads. Each solution is optimized for different performance characteristics, access patterns, and cost profiles. The following sections give an overview of the most relevant storage options and the HPC scenarios each is best suited for. |
| 34 | + |
| 35 | + |
| 36 | +### Azure Blob Storage |
| 37 | + |
| 38 | +Azure Blob Storage is a massively scalable object storage service designed for unstructured data. It supports high-throughput workloads and is ideal for storing large volumes of data such as logs, images, videos, and checkpoint files. Blob storage meets the demanding, high-throughput requirements of HPC applications while providing the scale necessary to support storage for billions of data points flowing in from IoT endpoints. |
| 39 | + |
| 40 | +- Durable, available |
| 41 | +    * Sixteen nines of designed durability. Choice of redundancy options (LRS, ZRS, GRS, RA-GRS). |
| 42 | + * Geo-replication and flexibility to scale as needed. |
| 43 | +    * Built-in data integrity protection (for example, against bit rot). |
| 44 | +- Scalable, performant |
| 45 | + * In 10 seconds: |
| 46 | + - Processes > 820M transactions |
| 47 | +        - Reads/writes > 250 TB of data |
| 48 | + - Adds > 15M new objects |
| 49 | + * Allows flexible scale up as needed. |
| 50 | + * Meets demanding, high-throughput requirements. |
| 51 | + * Stores petabytes of data, cost-effectively with multiple storage tiers. |
| 52 | +- Secure, compliant |
| 53 | + * Authentication with Microsoft Entra ID. |
| 54 | +    * Flexible authorization, including role-based access control (RBAC) and ACLs. |
| 55 | + * Encryption at rest. |
| 56 | + * Advanced threat protection. |
| 57 | +- Fully managed |
| 58 | + * End-to-end lifecycle management. |
| 59 | + * Policy-based access control. |
| 60 | +    * Immutable (WORM) storage. |
| 61 | + |
| 62 | +### Azure Files |
| 63 | +- Fully managed file shares with SMB/NFS support. |
| 64 | +- Two SKUs: Standard (general purpose) and Premium (low latency, high IOPS). |
| 65 | +- Hybrid access via Azure File Sync. |
| 66 | +- Use cases: DevOps, backups, remote work, enterprise apps. |
| 67 | + |
| 68 | +### Azure NetApp Files |
| 69 | +- Enterprise-grade file storage with ONTAP technology. |
| 70 | +- Tiers: Standard, Premium, Ultra. |
| 71 | +- Dynamic performance scaling. |
| 72 | +- Ideal for databases, VDI, HPC, and containerized apps. |
| 73 | + |
| 74 | +### Azure Managed Lustre |
| 75 | +- Parallel file system optimized for HPC and AI. |
| 76 | +- Up to 512 GB/s throughput. |
| 77 | +- Seamless integration with Azure Blob for tiered storage. |
| 78 | +- Best for large-scale simulations, genomics, and scientific workloads. |
| 79 | + |
| 80 | +--- |
| 81 | + |
| 82 | +## AI and RAG Workload Storage Requirements |
| 83 | + |
| 84 | +| Stage | Requirements | |
| 85 | +|-------------|------------------------------------------------------------------------------| |
| 86 | +| Training | High throughput, checkpointing, local caching, large model loading | |
| 87 | +| Inference | Fast model access, low latency, concurrent GPU access | |
| 88 | +| RAG | Secure unstructured storage, vector DB integration, freshness, low latency | |
| 89 | + |
| 90 | +--- |
| 91 | + |
| 92 | +## Blobfuse2 – Mounting Blob Storage |
| 93 | +- Virtual File System driver for mounting Blob storage. |
| 94 | +- Supports file caching and streaming with block-cache. |
| 95 | +- High throughput, secure, open-source. |
| 96 | +- Ideal for AI training and fine-tuning scenarios. |
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## Partner Solutions |
| 101 | + |
| 102 | +| Partner | Protocols | Scale | Unique Features | |
| 103 | +|----------------|---------------------|---------------|------------------------------------------------------| |
| 104 | +| Qumulo | NFS, SMB, S3 | 200 PiB | Azure-native SaaS, global namespace, cost-effective | |
| 105 | +| Dell APEX | NFS, SMB, S3, HDFS | 5.6 PiB | On-prem parity, policy-based tiering | |
| 106 | +| Nasuni | NFS, SMB, S3 | — | File locking, blob as primary tier | |
| 107 | +| Hammerspace | NFS, SMB, S3, pNFS | — | Global namespace, caching alternative | |
| 108 | +| Weka | NFS, SMB, S3 | 14 EB | High IOPS, low latency, linear scale-out | |
| 109 | +| IBM Spectrum Scale | GPFS, NFS, SMB | — | Full GPFS stack | |
| 110 | +| DDN EXAScaler | Lustre, NFS, SMB | Petabytes | Full DDN Lustre stack | |
| 111 | + |
| 112 | +--- |
| 113 | + |
| 114 | +## Performance Optimization Tips |
| 115 | +- Size volumes based on performance, not just capacity. |
| 116 | +- Use Availability Zones to control latency. |
| 117 | +- Use the large volumes feature in Azure NetApp Files (ANF) for maximum bandwidth. |
| 118 | +- Consider caching and tiering strategies for cost efficiency. |
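The first tip, sizing by performance and not only by capacity, can be illustrated with a small calculation. The sketch below assumes a service where throughput scales linearly with provisioned capacity, as with the Azure NetApp Files Standard, Premium, and Ultra tiers. The per-TiB throughput figures and the `provisioned_size_tib` helper are illustrative assumptions, not values taken from this article, and should be checked against the current service documentation before use.

```python
import math

# Assumed throughput per provisioned TiB (MiB/s) for each service level.
# These figures are illustrative assumptions; verify them before relying on them.
THROUGHPUT_PER_TIB_MIBPS = {"Standard": 16, "Premium": 64, "Ultra": 128}


def provisioned_size_tib(data_tib, target_mibps, service_level):
    """Return the whole-TiB size that satisfies both capacity and throughput."""
    per_tib = THROUGHPUT_PER_TIB_MIBPS[service_level]
    size_for_throughput = target_mibps / per_tib
    # The larger of the two requirements determines the provisioned size.
    return math.ceil(max(data_tib, size_for_throughput))


if __name__ == "__main__":
    # 10 TiB of data that must sustain 2,000 MiB/s.
    for level in ("Standard", "Premium", "Ultra"):
        print(level, provisioned_size_tib(10, 2000, level), "TiB")
```

Under these assumptions, a 10 TiB dataset that needs 2,000 MiB/s is throughput-bound at every level, so the provisioned size is set by the bandwidth target and not by the amount of data stored.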