The shadow cluster operates in read-only mode while continuously receiving updates from the source cluster. During a disaster, you can failover individual topics or an entire shadow link to make resources fully writable for production traffic. See xref:deploy:redpanda/manual/disaster-recovery/shadowing/failover-runbook.adoc[] for emergency procedures.
+Shadowing includes comprehensive metrics for monitoring replication health. See xref:manage:disaster-recovery/shadowing/monitor.adoc[] and xref:reference:public-metrics-reference.adoc#shadow-link-metrics[Shadow Link metrics reference].
+
== Connected client monitoring
You can view details about Kafka client connections using `rpk` or the Admin API ListKafkaConnections endpoint. This allows you to view detailed information about active client connections on a cluster, and identify and troubleshoot problematic clients. For more information, see the xref:manage:cluster-maintenance/manage-throughput.adoc#view-connected-client-details[connected client details] example in the Manage Throughput guide.
@@ -51,6 +53,20 @@ You can now generate a security report for your Redpanda cluster using the link:
Redpanda v25.3 implements topic identifiers using 16 byte UUIDs as proposed in https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers[KIP-516^].
* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_client_errors[`redpanda_shadow_link_client_errors`] - Track Kafka client errors during shadow link operations
+* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`] - Monitor replication lag between source and shadow partitions
+* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_topic_state[`redpanda_shadow_link_shadow_topic_state`] - Track shadow topic state distribution across links
+* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_fetched[`redpanda_shadow_link_total_bytes_fetched`] - Monitor data transfer volume from source cluster
+* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_written[`redpanda_shadow_link_total_bytes_written`] - Track data written to shadow cluster
+* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_fetched[`redpanda_shadow_link_total_records_fetched`] - Monitor total records fetched from source cluster
+* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_written[`redpanda_shadow_link_total_records_written`] - Track total messages written to shadow cluster
+
+For monitoring guidance and alert recommendations, see xref:manage:disaster-recovery/shadowing/monitor.adoc[].
+
== New commands
Redpanda v25.3 introduces the following xref:reference:rpk/rpk-shadow/rpk-shadow.adoc[`rpk shadow`] commands for managing Redpanda shadow links:
|Total number of errors encountered by the Kafka client during shadow link operations. Monitor by `shadow_link_name` to identify connection issues, authentication failures, or other client-side problems.
|The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition.
|The total number of bytes fetched by a sharded replicator (bytes received by the client). Labeled by `shadow_link_name` and `shard` to track data transfer volume from the source cluster.
|The total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor data written to the shadow cluster.
-|`redpanda_shadow_link_client_errors`
-|Count
-|The number of errors seen by the client. Track by `shadow_link_name` and `shard` to identify connection or protocol issues between clusters.
|Number of shadow topics in the respective states. Labeled by `shadow_link_name` and `state` to monitor topic state distribution across your shadow links.
|The total number of records fetched by the sharded replicator (records received by the client). Monitor by `shadow_link_name` and `shard` to track message throughput from the source.
|The total number of records written by a sharded replicator (records written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor message throughput to the shadow cluster.
|===
-See also: xref:reference:public-metrics-reference.adoc[]
+For detailed descriptions of each metric, including usage examples and label definitions, see xref:reference:public-metrics-reference.adoc#shadow-link-metrics[Shadow Link metrics reference].
Configure monitoring alerts for the following conditions, which indicate problems with Shadowing:
-* **High replication lag**: When `redpanda_shadow_link_shadow_lag` exceeds your RPO requirements
-* **Connection errors**: When `redpanda_shadow_link_client_errors` increases rapidly
+* **High replication lag**: When xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`] exceeds your recovery point objective (RPO) requirements
* **Topic state changes**: When topics move to `FAULTED` state
* **Task failures**: When replication tasks enter `FAULTED` or `NOT_RUNNING` states
* **Throughput drops**: When bytes/records fetched drops significantly
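
The lag and throughput conditions in this list, plus a client error rate check based on `redpanda_shadow_link_client_errors`, could be evaluated roughly as follows. This is a minimal sketch and not part of the pull request: the `LinkSample` container, the function names, and every threshold are hypothetical, and only the metric names in the comments come from the documentation in this diff. Because the lag gauge is an offset difference, the sketch assumes the RPO has already been translated into a maximum acceptable lag.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    """One scrape of shadow link metrics (hypothetical container type)."""
    timestamp_s: float
    shadow_lag: int           # redpanda_shadow_link_shadow_lag, worst partition
    client_errors: int        # redpanda_shadow_link_client_errors (counter)
    total_bytes_fetched: int  # redpanda_shadow_link_total_bytes_fetched (counter)


def counter_rate(prev: int, curr: int, elapsed_s: float) -> float:
    """Per-second increase of a counter; a decrease is treated as a reset to zero."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta = curr - prev if curr >= prev else curr
    return delta / elapsed_s


def evaluate_alerts(prev, curr, baseline_bytes_per_s, max_lag, max_error_rate,
                    min_throughput_ratio=0.5):
    """Return the names of the alert conditions that fire between two scrapes."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    alerts = []
    # High replication lag: the gauge exceeds the lag budget derived from the RPO.
    if curr.shadow_lag > max_lag:
        alerts.append("high_replication_lag")
    # Client errors: the counter grows faster than the tolerated rate.
    if counter_rate(prev.client_errors, curr.client_errors, elapsed) > max_error_rate:
        alerts.append("connection_errors")
    # Throughput drop: the fetch rate falls below a fraction of the usual baseline.
    fetch_rate = counter_rate(prev.total_bytes_fetched, curr.total_bytes_fetched, elapsed)
    if fetch_rate < baseline_bytes_per_s * min_throughput_ratio:
        alerts.append("throughput_drop")
    return alerts
```

In practice these checks would more likely live in alerting rules of the monitoring system than in application code; the sketch only makes the comparisons explicit.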
modules/reference/pages/public-metrics-reference.adoc
100 additions & 0 deletions
@@ -3418,6 +3418,106 @@ ifdef::env-cloud[]
*Available in Serverless*: No
endif::[]
+---
+
+== Shadow link metrics
+
+[[redpanda_shadow_link_shadow_lag]]
+=== redpanda_shadow_link_shadow_lag
+
+The lag of the shadow partition against the source partition, calculated as source partition last stable offset (LSO) minus shadow partition high watermark (HWM). Monitor this metric to understand replication lag for each partition and ensure your recovery point objective (RPO) requirements are being met.
+
+*Type*: gauge
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `topic` - Topic name
+- `partition` - Partition identifier
+
+---
+
+[[redpanda_shadow_link_shadow_topic_state]]
+=== redpanda_shadow_link_shadow_topic_state
+
+Number of shadow topics in the respective states. Monitor this metric to track the health and status distribution of shadow topics across your shadow links.
+Total number of errors encountered by the Kafka client during shadow link operations. Monitor this metric to identify connection issues, authentication failures, or other client-side problems affecting shadow link replication.
+
+*Type*: counter
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+
+---
+
+[[redpanda_shadow_link_total_bytes_fetched]]
+=== redpanda_shadow_link_total_bytes_fetched
+
+Total number of bytes fetched by a sharded replicator (bytes received by the client). Use this metric to track data transfer volume from the source cluster.
+
+*Type*: counter
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `shard` - Shard identifier
+
+---
+
+[[redpanda_shadow_link_total_bytes_written]]
+=== redpanda_shadow_link_total_bytes_written
+
+Total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Use this metric to monitor data written to the shadow cluster.
+
+*Type*: counter
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `shard` - Shard identifier
+
+---
+
+[[redpanda_shadow_link_total_records_fetched]]
+=== redpanda_shadow_link_total_records_fetched
+
+Total number of records fetched by the sharded replicator (records received by the client). Monitor this metric to track message throughput from the source cluster.
+
+*Type*: counter
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `shard` - Shard identifier
+
+---
+
+[[redpanda_shadow_link_total_records_written]]
+=== redpanda_shadow_link_total_records_written
+
+Total number of records written by a sharded replicator (records written to the write_at_offset_stm). Use this metric to monitor message throughput to the shadow cluster.
+
+*Type*: counter
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `shard` - Shard identifier
+
== Related topics
* xref:manage:monitoring.adoc[Learn how to monitor Redpanda]
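
The reference entries added in this diff give each metric a name, a type, and a set of labels. As an illustration of how a consumer might read them, the sketch below parses Prometheus text-format samples and reports the worst `redpanda_shadow_link_shadow_lag` per shadow link. It is not part of the pull request. The `SAMPLE` scrape text, the link and topic names, and the helper functions are hypothetical, and the label names are copied from the documentation above, so the names a real cluster exposes may differ.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; all values and label values are made up.
SAMPLE = """
# TYPE redpanda_shadow_link_shadow_lag gauge
redpanda_shadow_link_shadow_lag{shadow_link_name="dr-link",topic="orders",partition="0"} 80
redpanda_shadow_link_shadow_lag{shadow_link_name="dr-link",topic="orders",partition="1"} 12
redpanda_shadow_link_shadow_lag{shadow_link_name="eu-link",topic="audit",partition="0"} 0
redpanda_shadow_link_client_errors{shadow_link_name="dr-link"} 3
"""

_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def shadow_lag(source_lso: int, shadow_hwm: int) -> int:
    """Lag as the documentation defines it: source LSO minus shadow HWM."""
    return source_lso - shadow_hwm


def parse_samples(text):
    """Yield (name, labels, value) for each sample line; comments are skipped.

    Escape sequences inside label values are left as-is in this sketch.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def max_lag_per_link(text):
    """Return the largest partition lag seen for each shadow link."""
    worst = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "redpanda_shadow_link_shadow_lag":
            link = labels.get("shadow_link_name", "unknown")
            worst[link] = max(worst[link], value)
    return dict(worst)
```

Calling `max_lag_per_link(SAMPLE)` groups the three lag samples by `shadow_link_name` and ignores the counter line. `shadow_lag(1500, 1420)` shows the subtraction behind a single gauge value.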