src/current/v25.4/cluster-setup-troubleshooting.md (7 additions, 0 deletions)
@@ -415,6 +415,7 @@ Symptoms of disk stalls include:

- Bad cluster write performance, usually in the form of a substantial drop in QPS for a given workload.
- [Node liveness issues](#node-liveness-issues).
- Messages like the following start appearing in the [`STORAGE` logging channel]({% link {{ page.version.version }}/logging.md %}#storage): `disk slowness detected: write to file {path/to/store/*.sst} has been ongoing for {duration}s`
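
To check whether a node is already reporting this symptom, you can search its log files for the message. A minimal sketch, assuming a default configuration in which logs are written to the `logs` subdirectory of the first store directory (adjust the path for your deployment):

```shell
# Search this node's log files for disk-slowness events.
# The path below is an assumption; by default, logs live under <store-dir>/logs.
grep -r "disk slowness detected" /path/to/cockroach-data/logs/
```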

Causes of disk stalls include:

@@ -432,6 +433,12 @@ CockroachDB's built-in disk stall detection works as follows:

- During [store liveness]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leases) heartbeats, the [storage engine]({% link {{ page.version.version }}/architecture/storage-layer.md %}) writes to disk.

If messages like the following appear in the [`STORAGE` logging channel]({% link {{ page.version.version }}/logging.md %}#storage), they are an early sign of severe I/O slowness and usually mean that a fatal stall is imminent:

- `disk slowness detected: write to file {path/to/store/*.sst} has been ongoing for {duration}s`

Repeated occurrences of this message usually mean the node is effectively degraded: it will struggle to hold [range leases]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-leaseholder) and serve requests, and can degrade performance across the entire cluster. Do not raise the stall detection thresholds to mask hardware issues. Instead, [drain and decommission the node]({% link {{ page.version.version }}/node-shutdown.md %}) and replace its [underlying storage]({% link {{ page.version.version }}/cockroach-start.md %}#storage). If you are considering tuning, refer to [`storage.max_sync_duration`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-storage-max-sync-duration) (or the corresponding environment variable `COCKROACH_ENGINE_MAX_SYNC_DURATION_DEFAULT`), but note that increasing these values generally prolongs unavailability rather than fixing the underlying problem.
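
As a concrete starting point, the following sketch shows how you might inspect the current stall-detection threshold and then remove a persistently stalling node. The certificate directory, host address, and node ID are placeholders, not values from this page:

```shell
# Inspect the current stall-detection threshold.
cockroach sql --certs-dir=certs --host=<address-of-any-live-node> \
  --execute="SHOW CLUSTER SETTING storage.max_sync_duration;"

# Drain, then decommission the affected node (node ID 4 is a placeholder).
cockroach node drain 4 --certs-dir=certs --host=<address-of-any-live-node>
cockroach node decommission 4 --certs-dir=certs --host=<address-of-any-live-node>
```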

{% include {{ page.version.version }}/leader-leases-node-heartbeat-use-cases.md %}

#### Disk utilization is different across nodes in the cluster