Commit 46d03fd

AB#6881: Create Troubleshooting Guide: Windows Server Storage Replica (#9993)

* Create troubleshoot-windows-server-storage-replica.md
* Update troubleshoot-windows-server-storage-replica.md
* Change 'Introduction' to 'Summary' in documentation

  Updated the section title from 'Introduction' to 'Summary' for clarity.
* Update troubleshoot-windows-server-storage-replica.md
* Update troubleshoot-windows-server-storage-replica.md

1 parent 495ec78 commit 46d03fd

1 file changed (+284 -0 lines changed)

---
title: Troubleshoot Windows Server Storage Replica
description: Helps resolve common issues with Storage Replica, the Windows Server feature that provides block-level, synchronous and asynchronous replication for disaster recovery.
ms.date: 10/08/2025
manager: dcscontentpm
audience: itpro
ms.topic: troubleshooting
ms.reviewer: kaushika
ms.custom:
- sap:Backup, Recovery, Disk, and Storage\Storage Replica
- pcy:WinComm Storage High Avail
appliesto:
- <a href=https://learn.microsoft.com/windows/release-health/windows-server-release-info target=_blank>Supported versions of Windows Server</a>
---

# Troubleshoot Windows Server Storage Replica

## Summary

Storage Replica (SR) is a Windows Server feature (Datacenter and Standard editions) that provides block-level, synchronous and asynchronous replication for disaster recovery and high-availability scenarios. SR supports single-site and multi-site clusters as well as server-to-server replication. Because SR integrates closely with the storage, networking, cluster, and authentication infrastructure, a misconfiguration or environmental problem in any of these layers can cause it to fail. This article walks through the symptoms, causes, and solutions for the most common failure scenarios to help administrators diagnose and resolve issues quickly.

## Troubleshooting checklist

Use this checklist for systematic troubleshooting:

- **Verify role and edition**
  - Verify the Windows Server edition, and make sure that the SR feature is installed on all nodes (`Get-WindowsFeature -Name Storage-Replica`).
  - Verify licensing (Standard versus Datacenter) and feature compatibility. In Windows Server 2019 and later, Standard edition limits SR to a single volume of up to 2 TB with one partnership.
- **Verify storage configuration**
  - Make sure that all intended disks (log and data) are visible to all relevant nodes and aren't in use by other services.
  - For clusters, verify that disks are presented as shared storage (not local-only).
- **Check networking**
  - Open the required ports: TCP 445 (SMB), 5445 (SMB over iWARP RDMA, only if iWARP is used), 135 (RPC), and 5985 (WinRM).
  - Verify node-to-node and node-to-cluster communication by using `Test-NetConnection`.
- **Review domain and permission requirements**
  - All cluster nodes and servers must be in the same domain or in trusted domains.
  - Verify that the accounts used for SR have the appropriate permissions. For cluster-to-cluster replication, each cluster must be granted access to the other by using `Grant-SRAccess`.
- **Apply updates**
  - Apply all critical and recommended Windows updates, and the latest firmware and driver updates for your hardware and HBAs.
- **Cluster and disk prerequisites**
  - SCSI-3 Persistent Reservation support is required. Verify it in the Failover Cluster Validation report.
  - Disks must be initialized and formatted (NTFS or ReFS, as appropriate) and must not use deprecated features such as dynamic disks.
- **Verify Storage Replica topology**
  - Only one-to-one replication is supported. Avoid one-to-many or transitive setups.
- **Collect and review logs**
  - Gather event logs, cluster logs, and SR logs.
  - Run `Get-SRGroup`, `Get-SRPartnership`, and `Test-SRTopology`.
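
The first checklist item can be scripted across nodes. The following is a minimal sketch, not an official tool: it assumes the ServerManager module's `Get-WindowsFeature` cmdlet is available and that each target server accepts remote management. The function name `Get-SRFeatureState` and the node names in the example are hypothetical.

```powershell
# Hypothetical helper: report the Storage-Replica feature state on each node.
# Assumes Get-WindowsFeature (ServerManager module) is available locally and
# that each target server accepts remote management.
function Get-SRFeatureState {
    param(
        [Parameter(Mandatory)]
        [string[]] $ComputerName
    )

    foreach ($node in $ComputerName) {
        # Query the feature on the remote server.
        $feature = Get-WindowsFeature -Name Storage-Replica -ComputerName $node
        [pscustomobject]@{
            Node         = $node
            InstallState = $feature.InstallState
        }
    }
}

# Example (placeholder node names):
# Get-SRFeatureState -ComputerName 'SR-SRV01', 'SR-SRV02'
```

A node that reports a state other than `Installed` still needs the feature, for example through `Install-WindowsFeature -Name Storage-Replica -ComputerName <NodeName> -Restart`.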

## Common issues and solutions

The following sections describe the most common failure modes and provide step-by-step solutions.

### Storage Replica partnership creation fails with "Object not found"

#### Symptoms

- Partnership creation fails and returns the following error message:

  "Unable to create replication group... The requested object could not be found."

- The replication group isn't visible after creation.

#### Cause

- The following registry subkey is missing on the nodes:
  `HKLM\Cluster\WVR\ConfigStore`
- The partnership was attempted between clusters or servers in different domains or forests that don't have a trust relationship.
- The log and data disks aren't visible to, or aren't owned by, the active node.

#### Resolution

1. On all nodes, make sure that the following registry key exists. If it doesn't, create it:

   ```console
   HKEY_LOCAL_MACHINE\Cluster\WVR\ConfigStore
   ```

2. If SR metadata is left over from previous attempts, clear it:

   ```powershell
   Clear-SRMetadata -AllPartitions
   Clear-SRMetadata -AllLogs
   Clear-SRMetadata -AllConfiguration
   ```

3. Verify that both nodes are in the same domain or in trusted domains.
4. Move all involved cluster disks to the node from which you run the command:

   ```powershell
   Move-ClusterGroup -Name "Available Storage" -Node <NodeName>
   ```

5. Retry partnership creation.
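
Steps 1 and 4 can be combined into a short script. This is a sketch under stated assumptions: PowerShell remoting is enabled on every node, the cluster hive (`HKLM:\Cluster`) is loaded (it's only present on nodes where the Cluster service is running), and `Confirm-SRConfigStoreKey` is a hypothetical helper name, not part of any Microsoft module.

```powershell
# Hypothetical helper: check for, and optionally create, the ConfigStore subkey
# on each node. Assumes PowerShell remoting is enabled and that the cluster
# hive (HKLM:\Cluster) is loaded, which is only the case on running cluster nodes.
function Confirm-SRConfigStoreKey {
    param(
        [Parameter(Mandatory)]
        [string[]] $ComputerName,

        # Create the subkey where it's missing; otherwise only report.
        [switch] $Create
    )

    Invoke-Command -ComputerName $ComputerName -ArgumentList $Create.IsPresent -ScriptBlock {
        param([bool] $CreateKey)

        $parent  = 'HKLM:\Cluster\WVR'
        $key     = 'HKLM:\Cluster\WVR\ConfigStore'
        $existed = Test-Path -Path $key

        if (-not $existed -and $CreateKey) {
            # Create each level only after Test-Path confirms it's absent, and
            # without -Force, so that no existing key is replaced.
            if (-not (Test-Path -Path $parent)) { New-Item -Path $parent | Out-Null }
            New-Item -Path $key | Out-Null
        }

        # Invoke-Command adds PSComputerName to each returned object.
        [pscustomobject]@{
            Existed = $existed
            Exists  = Test-Path -Path $key
        }
    }
}

# Example (placeholder node names):
# Confirm-SRConfigStoreKey -ComputerName 'SR-SRV01', 'SR-SRV02' -Create
# Move-ClusterGroup -Name 'Available Storage' -Node 'SR-SRV01'
```

Running it first without `-Create` gives a read-only report of which nodes lack the subkey.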
93+
94+
### Disks not eligible or "No disks suitable for cluster disks found"
95+
96+
#### Symptoms
97+
98+
- Cluster validation reports error about persistent reservation.
99+
- Can't add disks to the cluster.
100+
- "Element not found" in cluster log or disk wizard.
101+
102+
#### Cause
103+
104+
- Storage doesn't support SCSI-3 Persistent Reservation.
105+
- Disk presented to only one node (not shared storage).
106+
- Physical and logical sector size mismatch between source and target.
107+
108+
#### Resolution
109+
110+
1. Run failover cluster validation:
111+
112+
```powershell
113+
Test-Cluster -Include "Storage Spaces", "Inventory", "Network", "System Configuration"
114+
```
115+
116+
2. If errors are reported on persistent reservation, make sure that firmware supports SCSI-3 and HBAs are updated. Engage hardware vendor if it's necessary.
117+
3. Present disks to all nodes that require access.
118+
4. Match physical and logical sector sizes by using VHDX creation parameters or by reprovisioning disk on incompatible hosts.
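
The sector-size check in step 4 can be done before you create the partnership. The following sketch compares two disk objects; it assumes objects that expose `LogicalSectorSize` and `PhysicalSectorSize`, as the objects returned by `Get-Disk` do. `Test-SRSectorSizeMatch` is a hypothetical helper name, and the commented `New-VHD` line shows the Hyper-V parameters that set sector sizes on a replacement VHDX.

```powershell
# Hypothetical helper: return $true only when both the logical and the physical
# sector sizes of the source and destination disks match.
function Test-SRSectorSizeMatch {
    param(
        # Any objects with LogicalSectorSize and PhysicalSectorSize properties,
        # such as the disk objects that Get-Disk returns.
        [Parameter(Mandatory)] $SourceDisk,
        [Parameter(Mandatory)] $DestinationDisk
    )

    ($SourceDisk.LogicalSectorSize -eq $DestinationDisk.LogicalSectorSize) -and
        ($SourceDisk.PhysicalSectorSize -eq $DestinationDisk.PhysicalSectorSize)
}

# Example (placeholder node names and disk numbers):
# $src = Get-Disk -Number 1 -CimSession 'SR-SRV01'
# $dst = Get-Disk -Number 1 -CimSession 'SR-SRV02'
# Test-SRSectorSizeMatch -SourceDisk $src -DestinationDisk $dst
#
# Re-create a fixed VHDX with explicit sector sizes (placeholder path and size):
# New-VHD -Path 'D:\VHD\SR-Data.vhdx' -SizeBytes 100GB -Fixed -LogicalSectorSizeBytes 4096 -PhysicalSectorSizeBytes 4096
```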

### Orphaned, suspended, or inaccessible Storage Replica resources

#### Symptoms

- SR resources show a state such as "ReplicationSuspended," "Orphaned," or "WaitingForDestination."
- Volumes remain in a RAW or inaccessible state.
- Failover or DR groups don't come online after a node failure.

#### Cause

- An unexpected cluster node failure or hard shutdown.
- Log or data volume corruption, or an incomplete failover transition.
- Orphaned dependencies left by an incomplete partnership removal.

#### Resolution

1. Try to resume replication or force-move the resource by using Failover Cluster Manager or PowerShell:

   ```powershell
   Sync-SRGroup -Name <GroupName>
   Move-ClusterGroup -Name <GroupName> -Node <AlternateNode>
   ```

2. If the state isn't restored, restart the Cluster service on the affected nodes:

   ```powershell
   Restart-Service -Name ClusSvc
   ```

3. If the volume is RAW or corrupted, run a file system repair:

   ```console
   chkdsk <DriveLetter>: /f
   ```

4. For stuck resources, clear dependencies and metadata:

   ```powershell
   Clear-SRMetadata -AllPartitions -AllLogs -AllConfiguration
   ```

5. If DR and primary partnership ownership is unclear or locked, fully remove the partnership from the source, and clear all metadata before you re-create it.
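
The following sketch shows how to inspect replica states and then fully remove a partnership, as described in steps 1 and 5. It assumes the StorageReplica module and PowerShell remoting on both partners, and it follows the removal pattern in the Storage Replica documentation (`Get-SRPartnership | Remove-SRPartnership` on the current source, then `Get-SRGroup | Remove-SRGroup` on both sides). The function names `Get-SRReplicaState` and `Remove-SRReplicationSetup` are hypothetical, and the removal cmdlets can prompt for confirmation.

```powershell
# Hypothetical helper: list each replica and its replication status on the local node.
function Get-SRReplicaState {
    foreach ($group in Get-SRGroup) {
        foreach ($replica in $group.Replicas) {
            [pscustomobject]@{
                Group             = $group.Name
                DataVolume        = $replica.DataVolume
                ReplicationStatus = $replica.ReplicationStatus
            }
        }
    }
}

# Hypothetical helper: fully remove replication between two partners.
# The partnership is removed on the current source only; groups are removed on both sides.
function Remove-SRReplicationSetup {
    param(
        [Parameter(Mandatory)] [string] $SourceComputerName,
        [Parameter(Mandatory)] [string] $DestinationComputerName
    )

    Invoke-Command -ComputerName $SourceComputerName -ScriptBlock {
        Get-SRPartnership | Remove-SRPartnership
    }

    Invoke-Command -ComputerName $SourceComputerName, $DestinationComputerName -ScriptBlock {
        Get-SRGroup | Remove-SRGroup
    }
}

# Example (placeholder node names):
# Get-SRReplicaState
# Remove-SRReplicationSetup -SourceComputerName 'SR-SRV01' -DestinationComputerName 'SR-SRV02'
# Then, on each node, clear leftover metadata before re-creating the partnership:
# Clear-SRMetadata -AllPartitions -AllLogs -AllConfiguration
```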

### Network or permission problems that prevent replication

#### Symptoms

- Errors such as "grant-sraccess: the parameter is incorrect" and "Component server is unavailable."
- Node-to-node connectivity works, but node-to-cluster connectivity fails.

#### Cause

- Required ports are closed in the firewall or network ACLs.
- Kerberos trust or domain membership is missing or misconfigured.
- Windows services that SR requires aren't running.

#### Resolution

1. Open and verify the required ports between all nodes:
   - TCP 135, 445, 5445 (only if SMB over iWARP RDMA is used), and 5985
   - Use `Test-NetConnection <RemoteClusterName> -Port 445`
2. Make sure that all nodes and servers are in the same domain or in trusted domains.
3. Verify and start all required services: Remote Procedure Call (RPC), Remote Registry, TCP/IP NetBIOS Helper, and Windows Management Instrumentation (WMI).
4. Synchronize clocks and time zones across all nodes. Kerberos authentication fails when clocks drift too far apart.
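
Steps 1 and 3 can be checked from one node with a sketch like the following. It assumes PowerShell remoting to each target and maps the services in step 3 to the service names `RpcSs`, `RemoteRegistry`, `lmhosts`, and `Winmgmt`. The function names `Test-SRPort` and `Get-SRServiceState` are hypothetical.

```powershell
# Hypothetical helper: test TCP reachability of SR-related ports from this node.
# Port 5445 is only needed when SMB over iWARP RDMA is used.
function Test-SRPort {
    param(
        [Parameter(Mandatory)] [string[]] $ComputerName,
        [int[]] $Port = @(135, 445, 5445, 5985)
    )

    foreach ($node in $ComputerName) {
        foreach ($p in $Port) {
            [pscustomobject]@{
                Node      = $node
                Port      = $p
                # -InformationLevel Quiet makes Test-NetConnection return only $true or $false.
                Reachable = Test-NetConnection -ComputerName $node -Port $p -InformationLevel Quiet
            }
        }
    }
}

# Hypothetical helper: report the services from step 3 on each node.
# RpcSs = Remote Procedure Call, RemoteRegistry = Remote Registry,
# lmhosts = TCP/IP NetBIOS Helper, Winmgmt = Windows Management Instrumentation.
function Get-SRServiceState {
    param(
        [Parameter(Mandatory)] [string[]] $ComputerName
    )

    Invoke-Command -ComputerName $ComputerName -ScriptBlock {
        Get-Service -Name RpcSs, RemoteRegistry, lmhosts, Winmgmt |
            Select-Object Name, Status, StartType
    }
}

# Example (placeholder node and cluster names):
# Test-SRPort -ComputerName 'SR-SRV02', 'SR-CLUSB' | Where-Object { -not $_.Reachable }
# Get-SRServiceState -ComputerName 'SR-SRV01', 'SR-SRV02'
# Clock offset against a partner (step 4):
# w32tm /stripchart /computer:SR-SRV02 /samples:3 /dataonly
```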

### Volume and log corruption or metadata issues

#### Symptoms

- "Failed to initialize the replication log path," or the disk is reported as corrupted and unusable.
- Event ID 3012, volumes that stay RAW, or resources that fail to come online.

#### Cause

- Hardware failures or abrupt shutdowns corrupt the log and data volumes.
- A chkdsk run or repair of the affected volume didn't complete.

#### Resolution

1. Run `chkdsk /f` to repair the affected volumes.
2. If issues persist, dismount the volume and mount it on another node so that you can back up the data and re-create the volume.
3. Re-create the log and data partitions, and re-establish the SR partnership.

### Unsupported or misconfigured scenarios

#### Symptoms

- One-to-many, cross-domain, or cross-forest replication attempts fail.
- Cluster-to-standalone replication, or a stretch cluster that uses local disks, fails.

#### Cause

- Only one-to-one replication is supported.
- Stretch clusters require a shared disk architecture.

#### Resolution

- Redesign the replication topology to meet the documented requirements: one-to-one replication, shared disks for cluster scenarios, and all nodes in a single domain or in trusted domains.

### Known software bugs and post-upgrade failures

#### Symptoms

- Storage Spaces Direct (S2D) disks can't be attached after an upgrade.
- A "Replication groups do not match" error appears.
- Disks are in a detached state after a rolling OS upgrade.

#### Cause

- Known bugs in Storage Replica or S2D code.
- A missing feature installation or an inconsistency after the upgrade.

#### Resolution

1. Install the latest cumulative and out-of-band updates.
2. Apply any available Known Issue Rollback (KIR) packages.
3. If the issue is unrecoverable, rebuild the affected cluster or storage pool according to Microsoft escalation guidance.
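
When you apply updates, it helps to confirm that every node ends up at the same update level. The following sketch compares the update IDs collected from each node and reports what each node is missing relative to the others. `Compare-SRNodeUpdate` is a hypothetical helper, and the collection step in the comments assumes that `Get-HotFix` can reach each node remotely; `Get-HotFix` doesn't necessarily list every installed update.

```powershell
# Hypothetical helper: given the installed update IDs per node, report the
# updates that each node is missing compared with the union of all nodes.
function Compare-SRNodeUpdate {
    param(
        # Hashtable of node name -> array of update IDs (such as HotFixID values from Get-HotFix).
        [Parameter(Mandatory)] [hashtable] $InstalledByNode
    )

    # Flatten every node's list and keep each update ID once.
    $allUpdates = $InstalledByNode.Values | ForEach-Object { $_ } | Sort-Object -Unique

    foreach ($node in $InstalledByNode.Keys) {
        $missing = $allUpdates | Where-Object { $InstalledByNode[$node] -notcontains $_ }
        if ($missing) {
            [pscustomobject]@{
                Node           = $node
                MissingUpdates = @($missing)
            }
        }
    }
}

# Example (placeholder node names):
# $installed = @{}
# foreach ($n in 'SR-SRV01', 'SR-SRV02') { $installed[$n] = (Get-HotFix -ComputerName $n).HotFixID }
# Compare-SRNodeUpdate -InstalledByNode $installed
```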

**Bugs and ICMs observed**

- May 2025 cumulative update for Storage Replica group creation failure.
- "SPACES 32 BUG" for disk addition in clusters of more than seven nodes (Azure Stack HCI).

## Common issues quick reference table

| Symptom | Root cause | Resolution |
| --- | --- | --- |
| "Unable to create replication group... object not found" | Missing registry key or configuration | Create `HKLM\Cluster\WVR\ConfigStore`, clear metadata, retry |
| "No disks suitable for cluster" or cluster validation fails | SCSI-3 persistent reservation support, disk sharing | Update firmware, confirm that disks are allocated as shared, check zoning |
| Volume stuck in "RAW" or "orphaned," Event 3012 | Metadata or file system corruption | Run chkdsk, clear SR metadata, re-create partition, reset partnership |
| Disk not eligible: sector size mismatch | 512-byte/4K sector size mismatch | Reprovision disks or VHDs with matching sector sizes |
| Replication stuck in Suspended or Orphaned | Incomplete failover or crash | Restart cluster service, clear dependencies, move group, repair disk |
| grant-sraccess "parameter incorrect" or network errors | Network ACLs or permissions | Open required ports, check services, confirm domain and trust setup |
| Failover too slow or times out, thin-provisioned disks | Thin provisioning, large VHD | Increase DeadlockTimeout, use thick provisioning where possible |
| Unsupported scenario errors or NAS | Topology or domain violations | Redesign for one-to-one replication with shared disks in trusted domains |
| Partnership removal leaves orphaned resource | Leftover dependencies or metadata | Run `Clear-SRMetadata`, manually remove resources and dependencies |
| "Replication groups do not match" after upgrades | Disk or log size or layout mismatch, bugs | Update clusters, equalize layouts, apply patches, rebuild as needed |

## Data collection

Before you contact Microsoft Support, you can gather the following information about your issue.

**General cmdlets**

- `Get-SRGroup`, `Get-SRPartnership`, and `(Get-SRGroup).Replicas` for replication status
- `Get-ClusterLog -Destination <path> -TimeSpan 20` (the last 20 minutes)
- `Get-WinEvent -LogName Microsoft-Windows-StorageReplica/Admin`
- `Test-SRTopology -SourceComputerName <source> -SourceVolumeName <vol> -SourceLogVolumeName <logvol> -DestinationComputerName <dest> -DestinationVolumeName <vol> -DestinationLogVolumeName <logvol> -DurationInMinutes <minutes> -ResultPath <path>`
- `Clear-SRMetadata` (this cmdlet removes SR metadata rather than collecting data; run it only when directed)
- `fsutil fsinfo sectorinfo <DriveLetter>:`
- For system and hardware: `Get-Disk`, `Get-Volume`, `Get-PhysicalDisk`
- For network: `Test-NetConnection <dest> -Port 445`
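
Several of the cmdlets above can be run in one pass. The following sketch saves their output to a single folder on a cluster node. It assumes the StorageReplica and FailoverClusters modules are installed; `Save-SRDiagnostic`, the output file names, and the `Test-SRTopology` values in the comments are hypothetical placeholders.

```powershell
# Hypothetical helper: save SR, cluster, and disk diagnostics into one folder.
# Assumes the StorageReplica and FailoverClusters modules are available.
function Save-SRDiagnostic {
    param(
        [Parameter(Mandatory)] [string] $Path
    )

    New-Item -ItemType Directory -Path $Path -Force | Out-Null

    # Replication groups, partnerships, and per-replica status.
    Get-SRGroup            | Format-List * | Out-File -FilePath (Join-Path $Path 'SRGroup.txt')
    Get-SRPartnership      | Format-List * | Out-File -FilePath (Join-Path $Path 'SRPartnership.txt')
    (Get-SRGroup).Replicas | Format-List * | Out-File -FilePath (Join-Path $Path 'SRReplica.txt')

    # Storage Replica admin event log (suppresses the error when the log is empty).
    Get-WinEvent -LogName 'Microsoft-Windows-StorageReplica/Admin' -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, LevelDisplayName, Message |
        Export-Csv -Path (Join-Path $Path 'SR-Admin-Events.csv') -NoTypeInformation

    # Cluster logs for the last 20 minutes from every node, written to $Path.
    Get-ClusterLog -Destination $Path -TimeSpan 20 | Out-Null

    # Disk and volume inventory, including sector sizes.
    Get-Disk         | Format-List * | Out-File -FilePath (Join-Path $Path 'Disk.txt')
    Get-Volume       | Format-List * | Out-File -FilePath (Join-Path $Path 'Volume.txt')
    Get-PhysicalDisk | Format-List * | Out-File -FilePath (Join-Path $Path 'PhysicalDisk.txt')
}

# Example (placeholder path):
# Save-SRDiagnostic -Path 'C:\Temp\SRDiag'
#
# Test-SRTopology with every volume parameter spelled out (placeholder values):
# $topology = @{
#     SourceComputerName      = 'SR-SRV01'; SourceVolumeName      = 'F:'; SourceLogVolumeName      = 'G:'
#     DestinationComputerName = 'SR-SRV02'; DestinationVolumeName = 'F:'; DestinationLogVolumeName = 'G:'
#     DurationInMinutes       = 30;         ResultPath            = 'C:\Temp'
# }
# Test-SRTopology @topology
```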

**Other**

- Application and system event logs (`C:\Windows\System32\winevt\Logs`)
- Diagnostic tool outputs and Process Monitor traces, as directed
- Cluster validation and precopy reports

## References

- [Storage Replica overview and requirements](/windows-server/storage/storage-replica/storage-replica-overview)
- [Storage Replica FAQ](/windows-server/storage/storage-replica/storage-replica-frequently-asked-questions)
- [Cluster Shared Volume FAQ](/windows-server/storage/storage-replica/storage-replica-frequently-asked-questions)
- [Known issues and bug fixes](/windows-server/storage/storage-replica/storage-replica-known-issues)
