|
| 1 | +--- |
| 2 | +title: Troubleshoot Storage Issues in Hyper-V and Windows Server Failover Clusters |
| 3 | +description: Resolves issues in storage configuration for Windows Server and Hyper-V clustered environments. |
| 4 | +ms.date: 10/08/2025 |
| 5 | +manager: dcscontentpm |
| 6 | +audience: itpro |
| 7 | +ms.topic: troubleshooting |
| 8 | +ms.reviewer: kaushika |
| 9 | +ms.custom: |
| 10 | +- sap:Virtualization and Hyper-V\Storage configuration |
| 11 | +- pcy:WinComm Storage High Avail |
| 12 | +appliesto: |
| 13 | + - <a href=https://learn.microsoft.com/windows/release-health/windows-server-release-info target=_blank>Supported versions of Windows Server</a> |
| 14 | +--- |
| 15 | + |
| 16 | +# Troubleshoot storage issues in Hyper-V and Windows Server failover clusters |
| 17 | + |
| 18 | +## Summary |
| 19 | + |
| 20 | +Storage configuration is a critical component of any Windows Server or Hyper-V clustered environment. Proper setup ensures high availability, reliable performance, data integrity, and minimal downtime for applications and virtual machines (VMs). Misconfigurations or environmental changes can trigger a range of issues, from storage resource unavailability, VM failures, and performance bottlenecks to more subtle symptoms, such as periodic event log warnings. This article provides guidance for administrators to resolve common storage configuration problems. |
| 21 | + |
| 22 | +## Troubleshooting checklist |
| 23 | + |
| 24 | +Use this checklist for systematic troubleshooting: |
| 25 | + |
| 26 | +- Verify that all hardware (disks, HBAs, switches) is compatible with the OS version. |
| 27 | +- Verify that all relevant device drivers, storage firmware, and DSM (Device Specific Module) software are up to date. |
| 28 | +- Perform a visual inspection of the physical connections (cables, SFPs, network, and storage cabling). |
| 29 | +- Verify that all cluster nodes see the shared storage (in Disk Management and Failover Cluster Manager). |
| 30 | +- Verify that Multipath I/O (MPIO) is correctly installed and configured on all nodes. |
| 31 | +- Verify that Hyper-V and failover clustering features are enabled and correctly configured on all nodes. |
| 32 | +- Verify that VM and storage paths are accessible from all nodes (test by using File Explorer or by running Get-ClusterSharedVolume on each node). |
| 33 | +- Verify that sufficient CSV and volume capacity exists for the planned workloads. |
| 34 | +- Verify that no conflicts exist between antivirus, filter drivers, or unsupported third-party software. |
| 35 | +- Review event logs for recent critical or warning events related to storage, clustering, or disk. |
| 36 | + |
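The event log review in the last checklist item can be scripted. The following is a minimal sketch, not an official tool. It assumes that the event IDs mentioned in this article (11, 129, 153, 157, 252, 5120, 5142, and 7031) are recorded in the System log on your nodes. The function names `New-StorageEventFilter` and `Get-RecentStorageEvent`, and the 24-hour default window, are hypothetical and introduced here only for illustration.

```powershell
# Hypothetical helper: builds a Get-WinEvent filter for the event IDs named in this article.
function New-StorageEventFilter {
    param(
        [int]$Hours = 24,             # look-back window (assumed default)
        [string]$LogName = 'System'   # assumption: the events are written to the System log
    )
    @{
        LogName   = $LogName
        Id        = 11, 129, 153, 157, 252, 5120, 5142, 7031
        StartTime = (Get-Date).AddHours(-$Hours)
    }
}

# Hypothetical helper: lists the matching events with the fields that matter for triage.
function Get-RecentStorageEvent {
    param([int]$Hours = 24)
    # Get-WinEvent reports an error when nothing matches, so that case is silenced.
    Get-WinEvent -FilterHashtable (New-StorageEventFilter -Hours $Hours) -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, ProviderName, LevelDisplayName, Message
}
```

Run Get-RecentStorageEvent -Hours 48 on each node. Event IDs aren't unique across providers, so check the ProviderName column before you draw conclusions from an ID alone.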
| 37 | +## Common issues and solutions |
| 38 | + |
| 39 | +The following sections detail the most common failure modes and provide solutions. |
| 40 | + |
| 41 | +### CSV (Cluster Shared Volume) paused state |
| 42 | + |
| 43 | +#### Symptoms |
| 44 | + |
| 45 | +- VMs are unresponsive. |
| 46 | +- Event ID 5120, 5142, or 153 is logged, with a message such as "All I/O will temporarily be queued until a path to the volume is reestablished." |
| 47 | + |
| 48 | +#### Cause and Resolution |
| 49 | + |
| 50 | +- Network bottleneck or misconfiguration (for example, a network adapter teaming mismatch): |
| 51 | + - Align network adapter teaming on all nodes (use Get-NetLbfoTeam to compare the configuration). |
| 52 | + - Verify teaming configuration and re-enable teaming where disabled. |
| 53 | + - Consider offloading CSV and live migration traffic to a dedicated network. |
| 54 | +- Network adapter resource exhaustion (Event ID 252 warnings): |
| 55 | + - Increase RAM and network adapter resources as recommended by hardware vendor. |
| 56 | + - Monitor resource allocation and adjust or upgrade network adapters, if recurring. |
| 57 | +- Physical disk or HBA failure: |
| 58 | + - Review disk health (Get-PhysicalDisk | Format-Table) and check for Event ID 157. |
| 59 | + - Replace faulty disks/HBAs, reseat or reconnect as necessary. |
| 60 | +- Faulty hardware switches or cables (for example, abnormal voltage on an FC switch): |
| 61 | + - Check storage switch logs and counters. |
| 62 | + - Replace faulty hardware components, reroute cabling. |
| 63 | +- Insufficient CSV or volume capacity: |
| 64 | + - Expand volume or CSV space. |
| 65 | + - Consider setting up CSV monitoring for proactive alerts. |
| 66 | +- Multiple antivirus or unsupported filter drivers: |
| 67 | + - Uninstall all but one antivirus program. Remove unsupported filter drivers (use fltmc for inspection). |
| 68 | + |
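To see which volumes are paused or redirected and whether teaming matches across nodes, the checks above can be sketched as follows. This is an illustrative sketch under assumptions: it relies on the FailoverClusters cmdlet Get-ClusterSharedVolumeState and on PowerShell remoting being enabled between nodes, and the function names `Select-ProblemCsvState` and `Get-NodeTeamSummary` are hypothetical helpers introduced here.

```powershell
# Hypothetical helper: keeps only CSV state records that aren't in Direct I/O mode.
# Expects objects shaped like the output of Get-ClusterSharedVolumeState.
function Select-ProblemCsvState {
    param([Parameter(ValueFromPipeline = $true)] $State)
    process {
        # StateInfo is expected to be Direct when I/O isn't redirected or unavailable.
        if ("$($State.StateInfo)" -ne 'Direct') {
            $State | Select-Object VolumeFriendlyName, Node, StateInfo,
                FileSystemRedirectedIOReason, BlockRedirectedIOReason
        }
    }
}

# Hypothetical helper: collects the LBFO team settings from every node for comparison.
# Assumes PowerShell remoting works between the cluster nodes.
function Get-NodeTeamSummary {
    $nodes = (Get-ClusterNode).Name
    Invoke-Command -ComputerName $nodes -ScriptBlock {
        Get-NetLbfoTeam | Select-Object Name, TeamingMode, LoadBalancingAlgorithm, Status
    } | Sort-Object PSComputerName, Name
}
```

On a cluster node, run Get-ClusterSharedVolumeState | Select-ProblemCsvState to list only the volume and node pairs that need attention, and run Get-NodeTeamSummary to look for a node whose TeamingMode or LoadBalancingAlgorithm differs from the others. Run fltmc filters separately in an elevated prompt to inspect the loaded filter drivers.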
| 69 | +### Storage path issues and disk visibility problems |
| 70 | + |
| 71 | +- Symptoms: Disks missing in Disk Management; unexpected "read-only", "offline", or "deallocated" states; Event ID 11, 153, or 129; failover failures. |
| 72 | + |
| 73 | +#### Cause and Resolution |
| 74 | + |
| 75 | +- Incorrect or corrupt MPIO configuration: |
| 76 | + - Confirm all paths in MPIO are Online (use mpclaim -s -d). |
| 77 | + - Update/reinstall MPIO. |
| 78 | + - Switch to vendor DSM if available (especially for enterprise SANs). |
| 79 | +- Outdated/incorrect drivers: |
| 80 | + - Update storage drivers, SAN firmware, and DSM/MPIO software. |
| 81 | +- Volume not accessible on all nodes: |
| 82 | + - Check permissions and path mapping. |
| 83 | + - Use Get-ClusterSharedVolume on each node to verify path. |
| 84 | + - For CIFS/SMB shares, make sure that cluster and SCVMM accounts have necessary access. |
| 85 | +- Disk or path locked by other process: |
| 86 | + - Use handle.exe or procmon.exe to identify the process that holds the lock. If you can't release the lock, restart the node. |
| 87 | + - If stuck, detach disk from VM, restart node, and reattach. |
| 88 | +- Physical or virtual storage pool misconfiguration: |
| 89 | + - Use Get-StoragePool, Get-VirtualDisk, Get-ClusterResource to verify health. |
| 90 | + - Realign pool memberships as needed. |
| 91 | + |
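The disk visibility checks in this section can be sketched as a small filter over Get-Disk output. This is a sketch under assumptions, not a definitive health test: the function name `Select-ProblemDisk` is a hypothetical helper, and on a cluster, a disk that another node currently owns can legitimately appear offline or reserved on the local node, so interpret the results per node.

```powershell
# Hypothetical helper: flags disks that are offline, read-only, or not healthy.
# Expects objects shaped like the output of Get-Disk.
function Select-ProblemDisk {
    param([Parameter(ValueFromPipeline = $true)] $Disk)
    process {
        if ($Disk.IsOffline -or $Disk.IsReadOnly -or "$($Disk.HealthStatus)" -ne 'Healthy') {
            $Disk | Select-Object Number, FriendlyName, OperationalStatus, HealthStatus,
                IsOffline, IsReadOnly, OfflineReason
        }
    }
}
```

On each node, run Get-Disk | Select-ProblemDisk, and then run mpclaim -s -d to compare the flagged disks with the MPIO path states. Follow up with Get-StoragePool, Get-VirtualDisk, and Get-ClusterResource if the disks belong to a storage pool.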
| 92 | +### Disk permission and Access Denied problems |
| 93 | + |
| 94 | +#### Symptoms |
| 95 | + |
| 96 | +- Migration failures |
| 97 | +- Backup failures |
| 98 | +- Storage migration returns "Access Denied 0x80070005" |
| 99 | +- "Account does not have permission to open attachment" error message |
| 100 | + |
| 101 | +#### Cause and Resolution |
| 102 | + |
| 103 | +- Incorrect NTFS permissions: |
| 104 | + - Use icacls <Path\To\Disk.vhdx> /grant "NT VIRTUAL MACHINE\<VM_GUID>:(F)" to set correct permissions. |
| 105 | + - Reclaim ownership if locked: Advanced Security > Owner: Administrators > Apply permissions again. |
| 106 | +- Cluster/SCVMM accounts missing from SMB/CIFS shares: |
| 107 | + - Add both SCVMM service account and cluster name object to file share permissions. |
| 108 | + - For NetApp/SAN-based CIFS, make sure that AD objects have proper rights. |
| 109 | +- Locked files from process crash or lost handle: |
| 110 | + - Check for open file handles by using handle.exe, Process Explorer, or Procmon. |
| 111 | + - Restart the server or node to release the file lock. |
| 112 | + |
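The icacls step above needs the VM's GUID, which the Hyper-V module exposes as the VMId property of Get-VM. The following sketch shows one way to build and apply the grant for every virtual hard disk of a VM. The function names `Get-VmDiskAclArgument` and `Grant-VmDiskAccess` are hypothetical helpers, and the sketch assumes an elevated session on the host that owns the VM and that every attached drive is a file-based virtual disk rather than a pass-through disk.

```powershell
# Hypothetical helper: builds the icacls arguments that grant a VM full control of a disk file.
function Get-VmDiskAclArgument {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][guid]$VmId
    )
    # ${VmId} is wrapped in braces so that the colon isn't parsed as part of the variable name.
    @($Path, '/grant', "NT VIRTUAL MACHINE\${VmId}:(F)")
}

# Hypothetical helper: applies the grant to every virtual hard disk attached to one VM.
function Grant-VmDiskAccess {
    param([Parameter(Mandatory = $true)][string]$VMName)
    $vm = Get-VM -Name $VMName
    foreach ($drive in Get-VMHardDiskDrive -VMName $VMName) {
        $aclArgs = Get-VmDiskAclArgument -Path $drive.Path -VmId $vm.VMId
        & icacls.exe @aclArgs
    }
}
```

Review the current ACL first by running icacls against the disk path, and keep the output so that you can compare it after the change.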
| 113 | +### Cluster service instability and known update issues |
| 114 | + |
| 115 | +#### Symptoms |
| 116 | + |
| 117 | +- Cluster service (clussvc) fails |
| 118 | +- Event ID 7031 is logged |
| 119 | +- Nodes don't join or enter quarantine after updating |
| 120 | +- VMs restart unexpectedly |
| 121 | + |
| 122 | +#### Cause and Resolution |
| 123 | + |
| 124 | +- Known product bug (for example, KB5062557 on Windows Server 2019 S2D): |
| 125 | + - Remove problematic update via Windows Update history. |
| 126 | + - Deploy KIR MSI per official advisory. |
| 127 | + - Reference: [IcMPath to OS work item 52578872] |
| 128 | +- Misconfigured quorum or CSV mapping: |
| 129 | + - Review and adjust quorum settings and CSV allocations as required. |
| 130 | + |
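Before you remove an update or change quorum settings, a read-only snapshot of the cluster state can help confirm which cause applies. The following is a minimal sketch under assumptions: `Select-ProblemNode` and `Get-ClusterHealthSnapshot` are hypothetical helpers, the StatusInformation property of Get-ClusterNode is assumed to be available on your Windows Server version, and Get-HotFix might not list every type of update, so a False result doesn't prove that the update is absent.

```powershell
# Hypothetical helper: keeps nodes that aren't Up or that report a non-Normal status.
# Expects objects shaped like the output of Get-ClusterNode.
function Select-ProblemNode {
    param([Parameter(ValueFromPipeline = $true)] $Node)
    process {
        $status = "$($Node.StatusInformation)"   # empty if the property isn't available
        if ("$($Node.State)" -ne 'Up' -or ($status -and $status -ne 'Normal')) {
            $Node | Select-Object Name, State, StatusInformation
        }
    }
}

# Hypothetical helper: read-only snapshot of cluster service, update, and quorum state.
function Get-ClusterHealthSnapshot {
    param([string]$KbId = 'KB5062557')   # the example update named in this article
    [pscustomobject]@{
        ClusterService = (Get-Service -Name ClusSvc).Status
        UpdatePresent  = [bool](Get-HotFix -Id $KbId -ErrorAction SilentlyContinue)
        Quorum         = Get-ClusterQuorum
        ProblemNodes   = @(Get-ClusterNode | Select-ProblemNode)
    }
}
```

Run Get-ClusterHealthSnapshot on a node where the cluster service is running. A quarantined node is expected to appear in ProblemNodes even when its State is Up.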
| 131 | +### Performance degradation and high latency |
| 132 | + |
| 133 | +#### Symptoms |
| 134 | + |
| 135 | +- High storage latency |
| 136 | +- Poor IOPS |
| 137 | +- VM or application slowness |
| 138 | +- Recurring timeouts (Event ID 5120, 5142) |
| 139 | + |
| 140 | +#### Cause and Resolution |
| 141 | + |
| 142 | +- Disabled or misconfigured S2D cache settings: |
| 143 | + - Re-enable/optimize using proper registry values (HKLM:\SYSTEM\CurrentControlSet\Services\Spaceport\Parameters). |
| 144 | + - Review cache/SSD tier health (Get-PhysicalDisk). |
| 145 | +- Insufficient or unbalanced hardware resources: |
| 146 | + - Add or upgrade NVMe and SSD according to S2D requirements. |
| 147 | + - Monitor and rebalance workloads and repair tasks. |
| 148 | + |
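To quantify the latency symptom, you can sample the standard PhysicalDisk performance counters and keep only the slow samples. This sketch is illustrative: `Select-SlowDiskSample` is a hypothetical helper, the 20 ms threshold is an assumed value that you should tune for your storage, and the counter paths shown in the usage note assume an English-language installation.

```powershell
# Hypothetical helper: keeps counter samples whose latency exceeds a threshold.
# Expects objects shaped like the CounterSamples returned by Get-Counter for
# "Avg. Disk sec/..." counters, where CookedValue is expressed in seconds.
function Select-SlowDiskSample {
    param(
        [Parameter(ValueFromPipeline = $true)] $Sample,
        [double]$ThresholdMs = 20   # assumed threshold for illustration only
    )
    process {
        $latencyMs = $Sample.CookedValue * 1000
        if ($latencyMs -gt $ThresholdMs) {
            [pscustomobject]@{
                Path      = $Sample.Path
                LatencyMs = [math]::Round($latencyMs, 2)
            }
        }
    }
}
```

For example, collect three samples at five-second intervals by running (Get-Counter -Counter '\PhysicalDisk(*)\Avg. Disk sec/Read', '\PhysicalDisk(*)\Avg. Disk sec/Write' -SampleInterval 5 -MaxSamples 3).CounterSamples | Select-SlowDiskSample. Then run Get-PhysicalDisk | Format-Table FriendlyName, MediaType, Usage, HealthStatus, OperationalStatus to review the cache and capacity devices, and run Get-StorageJob to check whether repair or rebalance jobs are competing for I/O.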
| 149 | +## Common issues quick reference table |
| 150 | + |
| 151 | +| Symptom | Cause | Resolution | Key commands | |
| 152 | +| --- | --- | --- | --- | |
| 153 | CSV paused, Event 5120/5142 | Network bottleneck, network adapter teaming mismatch, disk failure | Check teaming, network adapters, and capacity; update drivers; check CSV health | Get-NetLbfoTeam, Get-ClusterSharedVolume |
| 154 | +| Event 153/129/11 on disks | Faulty disk/HBA, MPIO config, outdated driver | Replace hardware, update/configure MPIO/DSM, update drivers | mpclaim -s -d, Get-PhysicalDisk | |
| 155 | +| Storage inaccessible on nodes | Permissions, network, pool config | Verify permissions, paths from all nodes, pool and virtual disk health | icacls, Get-ClusterSharedVolume, Get-VirtualDisk | |
| 156 | +| Access Denied (0x80070005), migration errors | NTFS/Shares permissions, locked files | Correct permissions, reclaim ownership, restart to release locks | icacls, ProcessExplorer/Procmon | |
| 157 | +| Cluster service failure, Event 7031, VMs restart | Faulty update (for example, KB5062557), quorum | Uninstall update, apply KIR, verify quorum and cluster service | Update history, KIR MSI | |
| 158 | +| High storage latency, persistent timeouts | Disabled S2D cache, hardware exhaustion | Enable/optimize cache, add/upgrade disks, rebalance storage pools | Registry, Get-PhysicalDisk | |
| 159 | + |
| 160 | +## Data collection |
| 161 | + |
| 162 | +Before you contact Microsoft Support, you can gather the following information about your issue. |
| 163 | + |
| 164 | +- **Cluster logs:** Get-ClusterLog -UseLocalTime -TimeSpan 60 |
| 165 | +- **FailoverClustering event logs:** Export relevant logs using Event Viewer. |
| 166 | +- **CSV and storage information:** |
| 167 | + |
| 168 | + ```powershell |
| 169 | + Get-ClusterSharedVolume | Format-List |
| 170 | + Get-PhysicalDisk | Format-Table FriendlyName, CanPool, OperationalStatus, HealthStatus |
| 171 | + Get-VirtualDisk | Format-Table FriendlyName, HealthStatus |
| 172 | + ``` |
| 173 | +- **MPIO and disk info:** Run mpclaim -s -d. Then run diskpart, and enter list disk at the DISKPART> prompt. |
| 174 | +- **Network adapter and team status:** Get-NetAdapter and Get-NetLbfoTeam |
| 175 | +- **Permissions:** icacls <Path\To\Disk.vhdx> |
| 176 | +- **Procmon trace:** Run procmon.exe /Quiet /Minimized /BackingFile \<tracefile.pml>, reproduce the issue, and then run procmon.exe /Terminate. |
| 177 | +- **Handle.exe (Sysinternals):** handle.exe > handles.txt |
| 178 | +- **Driver and firmware versions:** Get-CimInstance Win32_PnPSignedDriver | Select-Object DeviceName, Manufacturer, DriverVersion |
| 179 | + |
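The non-interactive commands in this list can be gathered in one pass. The following is a minimal sketch, not a supported collection tool. The function name `Export-StorageDiagnostics`, the output file names, and the default folder are hypothetical, and the sketch uses Get-CimInstance to query Win32_PnPSignedDriver as an assumption about what works in your PowerShell version. Steps that need interaction or a separate download (diskpart, Procmon, handle.exe) are left out on purpose.

```powershell
# Hypothetical helper: runs the collection commands from this section and saves the output.
function Export-StorageDiagnostics {
    param([string]$Destination = (Join-Path ([System.IO.Path]::GetTempPath()) 'StorageDiag'))

    New-Item -ItemType Directory -Path $Destination -Force | Out-Null

    # Output file name -> command to run. File names are arbitrary.
    $steps = [ordered]@{
        'csv.txt'           = { Get-ClusterSharedVolume | Format-List }
        'physicaldisks.txt' = { Get-PhysicalDisk | Format-Table FriendlyName, CanPool, OperationalStatus, HealthStatus }
        'virtualdisks.txt'  = { Get-VirtualDisk | Format-Table FriendlyName, HealthStatus }
        'mpio.txt'          = { mpclaim.exe -s -d }
        'netadapters.txt'   = { Get-NetAdapter | Format-Table -AutoSize }
        'teams.txt'         = { Get-NetLbfoTeam | Format-List }
        'drivers.txt'       = { Get-CimInstance -ClassName Win32_PnPSignedDriver |
                                    Select-Object DeviceName, Manufacturer, DriverVersion }
    }

    foreach ($name in $steps.Keys) {
        $file = Join-Path $Destination $name
        try {
            # 2>&1 keeps non-terminating errors in the same file as the output.
            & $steps[$name] 2>&1 | Out-File -FilePath $file -Width 300
        }
        catch {
            # A missing cmdlet or tool is recorded instead of stopping the run.
            "FAILED: $($_.Exception.Message)" | Out-File -FilePath $file
        }
    }

    try {
        # Writes one cluster log per node into the destination folder.
        Get-ClusterLog -UseLocalTime -TimeSpan 60 -Destination $Destination | Out-Null
    }
    catch {
        "FAILED: $($_.Exception.Message)" |
            Out-File -FilePath (Join-Path $Destination 'clusterlog-error.txt')
    }

    $Destination   # return the folder so that the caller can compress or upload it
}
```

Run Export-StorageDiagnostics -Destination C:\Temp\StorageDiag in an elevated session on one node, and then compress the folder, for example by using Compress-Archive, before you share it.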
| 180 | +## References |
| 181 | + |
| 182 | +- [Get-ClusterLog documentation](/powershell/module/failoverclusters/get-clusterlog) |
| 183 | +- [Storage Spaces Direct hardware requirements](/windows-server/storage/storage-spaces/storage-spaces-direct-hardware-requirements) |
| 184 | +- [Windows Server storage architectures with Hyper-V](/windows-server/virtualization/hyper-v/storage-architecture) |