Commit 495ec78

AB#6876: Create Troubleshooting Guide: Storage Issues in Hyper-V and Windows Server Failover Clusters (#9990)
* Create Storage-issues-in-hyper-v-and-windows-server-failover-clusters.md
* Update Storage-issues-in-hyper-v-and-windows-server-failover-clusters.md
* Update Storage-issues-in-hyper-v-and-windows-server-failover-clusters.md
1 parent f434b38 commit 495ec78

1 file changed: 184 additions, 0 deletions

@@ -0,0 +1,184 @@
1+
---
2+
title: Troubleshoot Storage Issues in Hyper-V and Windows Server Failover Clusters
3+
description: Resolves issues in storage configuration for Windows Server and Hyper-V clustered environments.
4+
ms.date: 10/08/2025
5+
manager: dcscontentpm
6+
audience: itpro
7+
ms.topic: troubleshooting
8+
ms.reviewer: kaushika
9+
ms.custom:
10+
- sap:Virtualization and Hyper-V\Storage configuration
11+
- pcy:WinComm Storage High Avail
12+
appliesto:
13+
- <a href=https://learn.microsoft.com/windows/release-health/windows-server-release-info target=_blank>Supported versions of Windows Server</a>
14+
---
15+
16+
# Troubleshoot storage issues in Hyper-V and Windows Server failover clusters
17+
18+
## Summary
19+
20+
Storage configuration is a critical component of any Windows Server or Hyper-V clustered environment. Proper setup ensures high availability, reliable performance, data integrity, and minimal downtime of applications and virtual machines. Misconfigurations or environmental changes can trigger a range of issues—from storage resource unavailability, virtual machine (VM) failures, and performance bottlenecks, to more subtle symptoms, such as periodic event log warnings. This article provides guidance for administrators to resolve common storage configuration problems.
21+
22+
## Troubleshooting checklist
23+
24+
Use this checklist for systematic troubleshooting:
25+
26+
- Verify that all hardware (disks, HBAs, switches) is compatible with the OS version.
27+
- Verify that all relevant device drivers, storage firmware, and Device Specific Module (DSM) software are up to date.
28+
- Perform a visual inspection of the physical connections (cables, SFPs, network, and storage cabling).
29+
- Verify that all cluster nodes can see the shared storage (in Disk Management and Failover Cluster Manager).
30+
- Verify that Multipath I/O (MPIO) is correctly installed and configured on all nodes.
31+
- Verify that Hyper-V and failover clustering features are enabled and correctly configured on all nodes.
32+
- Verify that VM and storage paths are accessible from all nodes (test by using File Explorer or `Get-ClusterSharedVolume` on each node).
33+
- Verify that sufficient CSV and volume capacity exists for the planned workloads.
34+
- Verify that no conflicts exist between antivirus, filter drivers, or unsupported third-party software.
35+
- Review event logs for recent critical or warning events related to storage, clustering, or disk.
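
The feature checks in this list can be scripted. The following is a minimal sketch under stated assumptions, not a definitive implementation: `Get-MissingFeature` is a hypothetical helper name, and the default feature names (`Multipath-IO`, `Hyper-V`, `Failover-Clustering`) are assumed to match the roles that your nodes are expected to run.

```powershell
# Hypothetical helper: returns the names of required features that are not installed.
# $Feature is any collection of objects that have Name and Installed properties,
# such as the output of Get-WindowsFeature.
function Get-MissingFeature {
    param(
        [object[]]$Feature,
        [string[]]$Required = @('Multipath-IO', 'Hyper-V', 'Failover-Clustering')
    )
    $installed = @($Feature | Where-Object { $_.Installed } | ForEach-Object { $_.Name })
    $Required | Where-Object { $_ -notin $installed }
}

# Query every node, but only on a system where the cluster cmdlets are available.
if (Get-Command Get-ClusterNode -ErrorAction SilentlyContinue) {
    foreach ($node in Get-ClusterNode) {
        $features = Get-WindowsFeature -ComputerName $node.Name
        $missing = Get-MissingFeature -Feature $features
        if ($missing) { '{0}: missing {1}' -f $node.Name, ($missing -join ', ') }
        else { '{0}: all required features are installed' -f $node.Name }
    }
}
```

Keeping the comparison in a separate function makes it possible to check the logic with sample objects before you run it against production nodes.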
36+
37+
## Common issues and solutions
38+
39+
The following sections detail the most common failure modes and provide solutions.
40+
41+
### Cluster Shared Volume (CSV) paused state
42+
43+
#### Symptoms
44+
45+
- VMs are unresponsive
46+
- Event ID 5120/5142/153, "All I/O will temporarily be queued until a path to the volume is reestablished."
47+
48+
#### Cause and resolution
49+
50+
- Network bottleneck or misconfiguration (for example, a network adapter teaming mismatch):
51+
  - Align the network adapter teaming configuration on all nodes (use `Get-NetLbfoTeam` to compare).
52+
  - Verify the teaming configuration, and re-enable teaming where it's disabled.
53+
  - Consider offloading CSV and live migration traffic to a dedicated network.
54+
- Network adapter resource exhaustion (Event ID 252 warnings):
55+
  - Increase RAM and network adapter resources as recommended by the hardware vendor.
56+
  - Monitor resource allocation, and adjust or upgrade the network adapters if the issue recurs.
57+
- Physical disk or HBA failure:
58+
  - Review disk health (`Get-PhysicalDisk | Format-Table`), and check for Event ID 157.
59+
  - Replace faulty disks or HBAs, and reseat or reconnect them as necessary.
60+
- Faulty hardware switches or cables (for example, abnormal voltage on an FC switch):
61+
  - Check the storage switch logs and counters.
62+
  - Replace faulty hardware components, and reroute cabling.
63+
- Insufficient CSV or volume capacity:
64+
  - Expand the volume or CSV space.
65+
  - Consider setting up CSV monitoring for proactive alerts.
66+
- Multiple antivirus programs or unsupported filter drivers:
67+
  - Uninstall all but one antivirus program. Remove unsupported filter drivers (use `fltmc` to inspect the loaded filter drivers).
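
The capacity check and proactive CSV monitoring mentioned in this list can be approximated with a short script. This is a sketch only: `Get-LowSpaceVolume` is a hypothetical helper name, and the 15 percent threshold is an assumption, not a documented recommendation.

```powershell
# Hypothetical helper: flags partitions whose free space is below a threshold.
# $Partition is any collection of objects that have a PercentFree property.
function Get-LowSpaceVolume {
    param(
        [object[]]$Partition,
        [double]$MinimumPercentFree = 15   # Assumed threshold; adjust to your own policy.
    )
    $Partition | Where-Object { $_ -and $_.PercentFree -lt $MinimumPercentFree }
}

# Run against the cluster only where the cluster cmdlets are available.
if (Get-Command Get-ClusterSharedVolume -ErrorAction SilentlyContinue) {
    # Each CSV exposes its partition details through SharedVolumeInfo.Partition.
    $partitions = Get-ClusterSharedVolume | ForEach-Object { $_.SharedVolumeInfo.Partition }
    Get-LowSpaceVolume -Partition $partitions |
        Format-Table Name, Size, FreeSpace, PercentFree
}
```

Running a check like this on a schedule is one way to get an early warning before a volume fills up.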
68+
69+
### Storage path issues and disk visibility problems
70+
71+
#### Symptoms

- Disks are missing in Disk Management
- Disks show unexpected "read-only", "offline", or "deallocated" states
- Event ID 11, 153, or 129 is logged
- Failover operations fail
72+
73+
#### Cause and resolution
74+
75+
- Incorrect or corrupt MPIO configuration:
76+
  - Confirm that all paths in MPIO are online (use `mpclaim -s -d`).
77+
  - Update or reinstall MPIO.
78+
  - Switch to the vendor DSM if one is available (especially for enterprise SANs).
79+
- Outdated or incorrect drivers:
80+
  - Update the storage drivers, SAN firmware, and DSM/MPIO software.
81+
- Volume not accessible on all nodes:
82+
  - Check permissions and path mapping.
83+
  - Use `Get-ClusterSharedVolume` on each node to verify the path.
84+
  - For CIFS/SMB shares, make sure that the cluster and SCVMM accounts have the necessary access.
85+
- Disk or path locked by another process:
86+
  - Use handle.exe or procmon.exe to identify the process that holds the lock, or restart the node to release it.
87+
  - If the disk remains stuck, detach it from the VM, restart the node, and then reattach it.
88+
- Physical or virtual storage pool misconfiguration:
89+
  - Use `Get-StoragePool`, `Get-VirtualDisk`, and `Get-ClusterResource` to verify health.
90+
  - Realign pool memberships as needed.
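
To find disks in the "read-only" or "offline" states described in the symptoms, you can filter the output of `Get-Disk`. The following is a rough sketch; `Select-ProblemDisk` is a hypothetical helper name, and the choice of properties to test is an assumption about what counts as a problem in your environment.

```powershell
# Hypothetical helper: returns disks that are offline, read-only, or not healthy.
# $Disk is any collection of objects that have IsOffline, IsReadOnly,
# and HealthStatus properties, such as the output of Get-Disk.
function Select-ProblemDisk {
    param([object[]]$Disk)
    $Disk | Where-Object {
        $_ -and ($_.IsOffline -or $_.IsReadOnly -or $_.HealthStatus -ne 'Healthy')
    }
}

# Run on a node only where the Storage module is available.
if (Get-Command Get-Disk -ErrorAction SilentlyContinue) {
    Select-ProblemDisk -Disk (Get-Disk) |
        Format-Table Number, FriendlyName, IsOffline, IsReadOnly, OperationalStatus, HealthStatus
}
```

Run the check on every node and compare the results, because a disk that looks correct on one node might be missing or offline on another.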
91+
92+
### Disk permission and Access Denied problems
93+
94+
#### Symptoms
95+
96+
- Migration failures
97+
- Backup failures
98+
- Storage migration returns "Access Denied 0x80070005"
99+
- "Account does not have permission to open attachment" error message
100+
101+
#### Cause and resolution
102+
103+
- Incorrect NTFS permissions:
104+
  - Use `icacls <Path\To\Disk.vhdx> /grant "NT VIRTUAL MACHINE\<VM_GUID>:(F)"` to set the correct permissions.
105+
  - If ownership is locked, reclaim it: in Advanced Security, set Owner to Administrators, and then apply the permissions again.
106+
- Cluster or SCVMM accounts missing from SMB/CIFS shares:
107+
  - Add both the SCVMM service account and the cluster name object to the file share permissions.
108+
  - For NetApp or SAN-based CIFS, make sure that the AD objects have the proper rights.
109+
- Locked files from a process crash or a lost handle:
110+
  - Check for open file handles by using handle.exe, Process Explorer, or Procmon.
111+
  - Restart the server or node to release the file lock.
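
The `icacls` grant in this list needs the VM's GUID. As an illustration only, the grant string can be built from a VM ID as follows. `New-VmAclGrant` is a hypothetical helper name, and the VM name and disk path in the commented usage lines are placeholders.

```powershell
# Hypothetical helper: builds the icacls grant argument for a VM's per-VM account.
function New-VmAclGrant {
    param([Parameter(Mandatory)][guid]$VmId)
    # The account name has the form "NT VIRTUAL MACHINE\<VM_GUID>"; (F) grants full control.
    'NT VIRTUAL MACHINE\{0}:(F)' -f $VmId.ToString().ToUpper()
}

# Example usage on a Hyper-V host (placeholder VM name and path; review before you run):
#   $vm = Get-VM -Name 'VM01'
#   icacls 'C:\ClusterStorage\Volume1\VM01\Disk.vhdx' /grant (New-VmAclGrant -VmId $vm.VMId)
```

The usage lines are left as comments because they change permissions. Run `icacls` against the file first without `/grant` to review the current access control list.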
112+
113+
### Cluster service instability and known update issues
114+
115+
#### Symptoms
116+
117+
- Cluster service (clussvc) fails
118+
- Event ID 7031 is logged
119+
- Nodes fail to join the cluster, or enter quarantine, after an update
120+
- VMs restart unexpectedly
121+
122+
#### Cause and resolution
123+
124+
- Known product bug (for example, KB5062557 on Windows Server 2019 S2D):
125+
  - Remove the problematic update through the Windows Update history.
126+
  - Deploy the Known Issue Rollback (KIR) MSI per the official advisory.
127+
  - Reference: [IcMPath to OS work item 52578872]
128+
- Misconfigured quorum or CSV mapping:
129+
  - Review and adjust the quorum settings and CSV allocations as required.
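
Before you remove an update, it helps to confirm that it's installed and that Event ID 7031 is present. The following sketch assumes that the update appears in `Get-HotFix` output, which might not be true for every update type. `Test-UpdateInstalled` is a hypothetical helper name.

```powershell
# Hypothetical helper: returns $true if the given KB ID appears in a hotfix list.
# $HotFix is any collection of objects that have a HotFixID property,
# such as the output of Get-HotFix.
function Test-UpdateInstalled {
    param(
        [object[]]$HotFix,
        [Parameter(Mandatory)][string]$KbId
    )
    [bool]($HotFix | Where-Object { $_ -and $_.HotFixID -eq $KbId })
}

# Run on a node only where both cmdlets are available (Windows only).
if ((Get-Command Get-HotFix -ErrorAction SilentlyContinue) -and
    (Get-Command Get-WinEvent -ErrorAction SilentlyContinue)) {
    'KB5062557 installed: {0}' -f (Test-UpdateInstalled -HotFix (Get-HotFix) -KbId 'KB5062557')

    # List recent service-termination events (Event ID 7031) from the System log.
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 7031 } -MaxEvents 20 -ErrorAction SilentlyContinue |
        Format-Table TimeCreated, Message -Wrap
}
```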
130+
131+
### Performance degradation and high latency
132+
133+
#### Symptoms
134+
135+
- High storage latency
136+
- Poor IOPS
137+
- VM or application slowness
138+
- Recurring timeouts (Event ID 5120, 5142)
139+
140+
#### Cause and resolution
141+
142+
- Disabled or misconfigured S2D cache settings:
143+
  - Re-enable or optimize the cache by using the proper registry values (`HKLM:\SYSTEM\CurrentControlSet\Services\Spaceport\Parameters`).
144+
  - Review the health of the cache and SSD tier (`Get-PhysicalDisk`).
145+
- Insufficient or unbalanced hardware resources:
146+
  - Add or upgrade NVMe and SSD drives according to the S2D requirements.
147+
  - Monitor and rebalance workloads and repair tasks.
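
To review cache and capacity drive health at a glance, the `Get-PhysicalDisk` output can be grouped by media type and usage. This is a sketch, not a definitive health check: `Get-DiskHealthSummary` is a hypothetical helper name, and it assumes that S2D cache drives are reported with a `Usage` value of `Journal`.

```powershell
# Hypothetical helper: counts total and unhealthy drives per MediaType and Usage.
# $PhysicalDisk is any collection of objects that have MediaType, Usage,
# and HealthStatus properties, such as the output of Get-PhysicalDisk.
function Get-DiskHealthSummary {
    param([object[]]$PhysicalDisk)
    $PhysicalDisk | Where-Object { $_ } | Group-Object MediaType, Usage | ForEach-Object {
        [pscustomobject]@{
            MediaType = $_.Group[0].MediaType
            Usage     = $_.Group[0].Usage
            Total     = $_.Count
            Unhealthy = @($_.Group | Where-Object { $_.HealthStatus -ne 'Healthy' }).Count
        }
    }
}

# Run on a node only where the Storage module is available.
if (Get-Command Get-PhysicalDisk -ErrorAction SilentlyContinue) {
    Get-DiskHealthSummary -PhysicalDisk (Get-PhysicalDisk) | Format-Table
}
```

A nonzero `Unhealthy` count in the cache group is a reasonable place to start when latency is high.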
148+
149+
## Common issues quick reference table
150+
151+
| Symptom | Cause | Resolution | Key commands |
152+
| --- | --- | --- | --- |
153+
| CSV paused, Event 5120/5142 | Network bottleneck, network adapter teaming mismatch, disk failure | Check teaming and network adapters, verify capacity, update drivers, check CSV health | Get-NetLbfoTeam, Get-ClusterSharedVolume |
154+
| Event 153/129/11 on disks | Faulty disk/HBA, MPIO config, outdated driver | Replace hardware, update/configure MPIO/DSM, update drivers | mpclaim -s -d, Get-PhysicalDisk |
155+
| Storage inaccessible on nodes | Permissions, network, pool config | Verify permissions, paths from all nodes, pool and virtual disk health | icacls, Get-ClusterSharedVolume, Get-VirtualDisk |
156+
| Access Denied (0x80070005), migration errors | NTFS or share permissions, locked files | Correct permissions, reclaim ownership, restart to release locks | icacls, Process Explorer, Procmon |
157+
| Cluster service failure, Event 7031, VMs restart | Faulty update (for example, KB5062557), quorum | Uninstall update, apply KIR, verify quorum and cluster service | Update history, KIR MSI |
158+
| High storage latency, persistent timeouts | Disabled S2D cache, hardware exhaustion | Enable/optimize cache, add/upgrade disks, rebalance storage pools | Registry, Get-PhysicalDisk |
159+
160+
## Data collection
161+
162+
Before you contact Microsoft Support, you can gather the following information about your issue.
163+
164+
- **Cluster logs:** `Get-ClusterLog -UseLocalTime -TimeSpan 60` (collects the last 60 minutes)
165+
- **FailoverClustering event logs:** Export relevant logs using Event Viewer.
166+
- **CSV and storage information:**
167+
168+
```powershell
169+
Get-ClusterSharedVolume | Format-List
170+
Get-PhysicalDisk | Format-Table FriendlyName, CanPool, OperationalStatus, HealthStatus
171+
Get-VirtualDisk | Format-Table FriendlyName, HealthStatus
172+
```
173+
- **MPIO and disk info:** Run `mpclaim -s -d`. Then start `diskpart`, and run `list disk`.
174+
- **Network adapter and team status:** `Get-NetAdapter` and `Get-NetLbfoTeam`
175+
- **Permissions:** `icacls <Path\To\Disk.vhdx>`
176+
- **Procmon trace:** Run `procmon.exe /Quiet /Minimized /BackingFile <tracefile.pml>`, reproduce the issue, and then run `procmon.exe /Terminate`.
177+
- **Handle.exe (Sysinternals):** `handle.exe > handles.txt`
178+
- **Driver and firmware versions:** `Get-CimInstance -ClassName Win32_PnPSignedDriver | Select-Object DeviceName, Manufacturer, DriverVersion`
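
Several of the commands in this list can be collected in one pass. The following is a minimal sketch, assuming that one text file per command is an acceptable format. `Export-Diagnostic` is a hypothetical helper name, and the output folder in the commented usage line is a placeholder.

```powershell
# Hypothetical helper: runs each script block and saves its output to <name>.txt.
# A failure in one command is recorded in its file and doesn't stop the others.
function Export-Diagnostic {
    param(
        [Parameter(Mandatory)][string]$OutputPath,
        [Parameter(Mandatory)][System.Collections.IDictionary]$Command
    )
    New-Item -ItemType Directory -Path $OutputPath -Force | Out-Null
    foreach ($name in $Command.Keys) {
        $file = Join-Path $OutputPath "$name.txt"
        try {
            & $Command[$name] 2>&1 | Out-String -Width 250 | Set-Content -Path $file
        } catch {
            "Collection failed: $_" | Set-Content -Path $file
        }
    }
}

# The commands come from the list above; the keys become the file names.
$commands = [ordered]@{
    'csv'           = { Get-ClusterSharedVolume | Format-List }
    'physicaldisks' = { Get-PhysicalDisk | Format-Table FriendlyName, CanPool, OperationalStatus, HealthStatus }
    'virtualdisks'  = { Get-VirtualDisk | Format-Table FriendlyName, HealthStatus }
    'mpio'          = { mpclaim -s -d }
    'netadapters'   = { Get-NetAdapter }
    'teams'         = { Get-NetLbfoTeam }
}

# Example usage (placeholder folder):
#   Export-Diagnostic -OutputPath 'C:\Temp\StorageDiag' -Command $commands
```

Cluster logs and Procmon traces write their own files, so collect them separately.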
179+
180+
## References
181+
182+
- [Get-ClusterLog documentation](/powershell/module/failoverclusters/get-clusterlog)
183+
- [Storage Spaces Direct hardware requirements](/windows-server/storage/storage-spaces/storage-spaces-direct-hardware-requirements)
184+
- [Windows Server storage architectures with Hyper-V](/windows-server/virtualization/hyper-v/storage-architecture)
