Merged
Changes from 28 commits
Commits
32 commits
0228148
DOC-1667 Document Shadow Link in Cloud
micheleRP Nov 25, 2025
bffbfea
single source rpk shadow
micheleRP Nov 25, 2025
d3db493
conditionalize parts for Cloud
micheleRP Nov 25, 2025
05bac83
add description for rpk shadow (for index page)
micheleRP Nov 25, 2025
d3e108c
minor edits
micheleRP Nov 25, 2025
9c5e166
fix headings
micheleRP Nov 25, 2025
7a6ffad
update conditionalizing
micheleRP Nov 25, 2025
781937a
move conceptual sections from setup into overview
micheleRP Nov 25, 2025
6e8ac75
rearranging setup
micheleRP Nov 26, 2025
582e395
byoc/dedicated only
micheleRP Dec 1, 2025
614e5c5
conditionalizing
micheleRP Dec 1, 2025
043e166
change Configure Failover to Failover
micheleRP Dec 1, 2025
763f4da
incorporate Trevor's review feedback
micheleRP Dec 2, 2025
6c97ab7
minor edit
micheleRP Dec 2, 2025
841a5a6
move tip next to sample config file
micheleRP Dec 2, 2025
d8e0bbc
remove disaster readiness checklist from overview
micheleRP Dec 2, 2025
f48eefd
remove duplicated content
micheleRP Dec 2, 2025
1d9c180
incorporate Simon's feedback
micheleRP Dec 2, 2025
aea6e82
Update modules/manage/pages/disaster-recovery/shadowing/setup.adoc
micheleRP Dec 2, 2025
1905b94
add shadowing metrics
paulohtb6 Dec 2, 2025
a90f4de
reintroduce shadowing metrics
paulohtb6 Dec 2, 2025
12b3ce6
Merge branch 'main' into add-metrics
paulohtb6 Dec 2, 2025
611025a
add whats new
paulohtb6 Dec 2, 2025
12f1ec5
code review
paulohtb6 Dec 2, 2025
10365a6
fix
paulohtb6 Dec 2, 2025
974cec9
Update local-antora-playbook.yml
paulohtb6 Dec 2, 2025
3bb252d
Apply suggestion from @micheleRP
micheleRP Dec 12, 2025
ccff0dc
Apply suggestion from @micheleRP
micheleRP Dec 12, 2025
c372064
Apply suggestion from @micheleRP
micheleRP Dec 12, 2025
dec1ea4
incorporate feedback from review
micheleRP Dec 12, 2025
bca7216
Merge branch 'main' into add-metrics
paulohtb6 Dec 12, 2025
d37426b
Fix
paulohtb6 Dec 12, 2025
16 changes: 16 additions & 0 deletions modules/get-started/pages/release-notes/redpanda.adoc
@@ -21,6 +21,8 @@ Redpanda v25.3 introduces xref:deploy:redpanda/manual/disaster-recovery/shadowin

The shadow cluster operates in read-only mode while continuously receiving updates from the source cluster. During a disaster, you can failover individual topics or an entire shadow link to make resources fully writable for production traffic. See xref:deploy:redpanda/manual/disaster-recovery/shadowing/failover-runbook.adoc[] for emergency procedures.

Shadowing includes comprehensive metrics for monitoring replication health. See xref:manage:disaster-recovery/shadowing/monitor.adoc[] and xref:reference:public-metrics-reference.adoc#shadow-link-metrics[Shadow Link metrics reference].

== Connected client monitoring

You can view details about Kafka client connections using `rpk` or the Admin API ListKafkaConnections endpoint. This allows you to view detailed information about active client connections on a cluster, and identify and troubleshoot problematic clients. For more information, see the xref:manage:cluster-maintenance/manage-throughput.adoc#view-connected-client-details[connected client details] example in the Manage Throughput guide.
@@ -51,6 +53,20 @@ You can now generate a security report for your Redpanda cluster using the link:

Redpanda v25.3 implements topic identifiers using 16 byte UUIDs as proposed in https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers[KIP-516^].

== Shadowing metrics

Redpanda v25.3 introduces comprehensive xref:reference:public-metrics-reference.adoc#shadow-link-metrics[Shadowing metrics] for monitoring disaster recovery replication:

* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_client_errors[`redpanda_shadow_link_client_errors`] - Track Kafka client errors during shadow link operations
* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`] - Monitor replication lag between source and shadow partitions
* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_topic_state[`redpanda_shadow_link_shadow_topic_state`] - Track shadow topic state distribution across links
* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_fetched[`redpanda_shadow_link_total_bytes_fetched`] - Monitor data transfer volume from source cluster
* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_written[`redpanda_shadow_link_total_bytes_written`] - Track data written to shadow cluster
* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_fetched[`redpanda_shadow_link_total_records_fetched`] - Monitor message throughput from source cluster
* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_written[`redpanda_shadow_link_total_records_written`] - Track message throughput to shadow cluster
Contributor

I am not sure throughput is a good word here. Throughput means rate, while this is the total records fetched/written.

> Total number of records fetched by the sharded replicator (records received by the client). Monitor this metric to track message throughput from the source cluster.

When it is next to the explanation, it's good. When it's alone, it might give the wrong impression of what this metric is.

Contributor

thank you, fixed!


For monitoring guidance and alert recommendations, see xref:manage:disaster-recovery/shadowing/monitor.adoc[].
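
The `total_*` metrics listed above are cumulative counters, so a throughput figure has to be derived from two samples rather than read off directly. The following is a minimal sketch of that calculation. The helper name `counter_rate` and the sample values are hypothetical and are not part of Redpanda or `rpk`.

```python
def counter_rate(prev_value: float, curr_value: float, interval_seconds: float) -> float:
    """Average per-second increase between two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: a decrease means the counter was reset (for example, after a
        # restart), so the current value is taken as the increase since the reset.
        delta = curr_value
    return delta / interval_seconds


# Hypothetical samples of redpanda_shadow_link_total_records_fetched, taken 30 seconds apart.
records_per_second = counter_rate(prev_value=120_000, curr_value=126_000, interval_seconds=30)
print(records_per_second)  # 200.0
```

A monitoring system would normally perform this step itself. The sketch only illustrates why a total and a rate are different quantities.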

== New commands

Redpanda v25.3 introduces the following xref:reference:rpk/rpk-shadow/rpk-shadow.adoc[`rpk shadow`] commands for managing Redpanda shadow links:
33 changes: 16 additions & 17 deletions modules/manage/pages/disaster-recovery/shadowing/monitor.adoc
@@ -56,36 +56,36 @@ Shadowing provides comprehensive metrics to track replication performance and he
 |===
 |Metric |Type |Description
 
-|`redpanda_shadow_link_shadow_lag`
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_client_errors[`redpanda_shadow_link_client_errors`]
+|Counter
+|Total number of errors encountered by the Kafka client during shadow link operations. Monitor by `shadow_link_name` to identify connection issues, authentication failures, or other client-side problems.
+
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`]
 |Gauge
 |The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition.
 
-|`redpanda_shadow_link_total_bytes_fetched`
-|Count
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_fetched[`redpanda_shadow_link_total_bytes_fetched`]
+|Counter
 |The total number of bytes fetched by a sharded replicator (bytes received by the client). Labeled by `shadow_link_name` and `shard` to track data transfer volume from the source cluster.
 
-|`redpanda_shadow_link_total_bytes_written`
-|Count
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_written[`redpanda_shadow_link_total_bytes_written`]
+|Counter
 |The total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor data written to the shadow cluster.
 
-|`redpanda_shadow_link_client_errors`
-|Count
-|The number of errors seen by the client. Track by `shadow_link_name` and `shard` to identify connection or protocol issues between clusters.
-
-|`redpanda_shadow_link_shadow_topic_state`
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_topic_state[`redpanda_shadow_link_shadow_topic_state`]
 |Gauge
 |Number of shadow topics in the respective states. Labeled by `shadow_link_name` and `state` to monitor topic state distribution across your shadow links.
 
-|`redpanda_shadow_link_total_records_fetched`
-|Count
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_fetched[`redpanda_shadow_link_total_records_fetched`]
+|Counter
 |The total number of records fetched by the sharded replicator (records received by the client). Monitor by `shadow_link_name` and `shard` to track message throughput from the source.
 
-|`redpanda_shadow_link_total_records_written`
-|Count
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_written[`redpanda_shadow_link_total_records_written`]
+|Counter
 |The total number of records written by a sharded replicator (records written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor message throughput to the shadow cluster.
 |===
 
-See also: xref:reference:public-metrics-reference.adoc[]
+For detailed descriptions of each metric, including usage examples and label definitions, see xref:reference:public-metrics-reference.adoc#shadow-link-metrics[Shadow Link metrics reference].
 
 == Monitoring best practices
 
@@ -106,8 +106,7 @@ rpk shadow status <shadow-link-name> | grep -E "LAG|Lag"
 
 Configure monitoring alerts for the following conditions, which indicate problems with Shadowing:
 
-* **High replication lag**: When `redpanda_shadow_link_shadow_lag` exceeds your RPO requirements
-* **Connection errors**: When `redpanda_shadow_link_client_errors` increases rapidly
+* **High replication lag**: When xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`] exceeds your recovery point objective (RPO) requirements
 * **Topic state changes**: When topics move to `FAULTED` state
 * **Task failures**: When replication tasks enter `FAULTED` or `NOT_RUNNING` states
 * **Throughput drops**: When bytes/records fetched drops significantly
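
As a rough illustration, the alert conditions in the list above could be evaluated along these lines. This is a sketch under stated assumptions, not a recommended alerting setup: the `LinkSnapshot` fields, the `max_allowed_lag` threshold (a record count you would derive from your own RPO), and the `drop_ratio` default are all hypothetical. Task state checks are left out of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkSnapshot:
    """Hypothetical, pre-aggregated view of one shadow link at a point in time."""
    max_lag: int                # highest redpanda_shadow_link_shadow_lag across partitions
    faulted_topics: int         # number of shadow topics reported in a faulted state
    bytes_fetched_rate: float   # bytes/s derived from redpanda_shadow_link_total_bytes_fetched
    baseline_bytes_rate: float  # typical bytes/s for this link, chosen by the operator


def evaluate_alerts(snapshot: LinkSnapshot, max_allowed_lag: int, drop_ratio: float = 0.5) -> List[str]:
    """Return the names of the alert conditions that the snapshot violates."""
    alerts = []
    if snapshot.max_lag > max_allowed_lag:
        alerts.append("high_replication_lag")
    if snapshot.faulted_topics > 0:
        alerts.append("topic_faulted")
    # A drop is only meaningful when a non-zero baseline exists.
    if snapshot.baseline_bytes_rate > 0 and \
            snapshot.bytes_fetched_rate < drop_ratio * snapshot.baseline_bytes_rate:
        alerts.append("throughput_drop")
    return alerts


snapshot = LinkSnapshot(max_lag=5_000, faulted_topics=0,
                        bytes_fetched_rate=1_000.0, baseline_bytes_rate=10_000.0)
print(evaluate_alerts(snapshot, max_allowed_lag=1_000))
# ['high_replication_lag', 'throughput_drop']
```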
100 changes: 100 additions & 0 deletions modules/reference/pages/public-metrics-reference.adoc
@@ -2343,6 +2343,106 @@ Total number of bytes uploaded for the topic to object storage.
- `redpanda_namespace`
- `redpanda_topic`

---

== Shadow Link metrics

[[redpanda_shadow_link_shadow_lag]]
=== redpanda_shadow_link_shadow_lag

The lag of the shadow partition against the source partition, calculated as source partition last stable offset (LSO) minus shadow partition high watermark (HWM). Monitor this metric to understand replication lag for each partition and ensure your recovery point objective (RPO) requirements are being met.

*Type*: gauge

*Labels*:

- `shadow_link_name` - Name of the shadow link
- `topic` - Topic name
- `partition` - Partition identifier
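
The lag formula above (source LSO minus shadow HWM) can be worked through with a small example. The offsets below are made-up values for illustration, and `shadow_lag` is a hypothetical helper, not a Redpanda API.

```python
def shadow_lag(source_lso: int, shadow_hwm: int) -> int:
    """Lag as described above: source last stable offset minus shadow high watermark."""
    return source_lso - shadow_hwm


# Hypothetical (source LSO, shadow HWM) pairs keyed by partition.
offsets = {0: (1500, 1500), 1: (2300, 2180), 2: (900, 640)}

lag_by_partition = {p: shadow_lag(lso, hwm) for p, (lso, hwm) in offsets.items()}
print(lag_by_partition)  # {0: 0, 1: 120, 2: 260}

# The partition with the largest lag is the one to compare against an RPO-derived limit.
worst_partition = max(lag_by_partition, key=lag_by_partition.get)
print(worst_partition)  # 2
```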

---

[[redpanda_shadow_link_shadow_topic_state]]
=== redpanda_shadow_link_shadow_topic_state

Number of shadow topics in the respective states. Monitor this metric to track the health and status distribution of shadow topics across your shadow links.

*Type*: gauge

*Labels*:

- `shadow_link_name` - Name of the shadow link
- `state` - Topic state (active, failed, paused, failing_over, failed_over, promoting, promoted)
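
To show how the `shadow_link_name` and `state` labels combine, here is a minimal sketch that groups this gauge by link and state from Prometheus-style text. The sample lines, link names, and values are invented for illustration, and the exact formatting of real metrics output may differ.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition style.
SAMPLE = '''\
# TYPE redpanda_shadow_link_shadow_topic_state gauge
redpanda_shadow_link_shadow_topic_state{shadow_link_name="dr-link",state="active"} 12
redpanda_shadow_link_shadow_topic_state{shadow_link_name="dr-link",state="paused"} 1
redpanda_shadow_link_shadow_topic_state{shadow_link_name="dr-link",state="failed"} 2
redpanda_shadow_link_shadow_topic_state{shadow_link_name="eu-link",state="active"} 5
'''

LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def topic_states(text: str) -> dict:
    """Return {shadow_link_name: {state: topic_count}} for the topic state gauge."""
    result = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        # Comment lines and other metrics are skipped.
        if not match or match.group("name") != "redpanda_shadow_link_shadow_topic_state":
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        result[labels["shadow_link_name"]][labels["state"]] = int(float(match.group("value")))
    return dict(result)


states = topic_states(SAMPLE)
print(states["dr-link"])  # {'active': 12, 'paused': 1, 'failed': 2}
```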

---

[[redpanda_shadow_link_client_errors]]
=== redpanda_shadow_link_client_errors

Total number of errors encountered by the Kafka client during shadow link operations. Monitor this metric to identify connection issues, authentication failures, or other client-side problems affecting shadow link replication.

*Type*: counter

*Labels*:

- `shadow_link_name` - Name of the shadow link

---

[[redpanda_shadow_link_total_bytes_fetched]]
=== redpanda_shadow_link_total_bytes_fetched

Total number of bytes fetched by a sharded replicator (bytes received by the client). Use this metric to track data transfer volume from the source cluster.

*Type*: counter

*Labels*:

- `shadow_link_name` - Name of the shadow link
- `shard` - Shard identifier

---

[[redpanda_shadow_link_total_bytes_written]]
=== redpanda_shadow_link_total_bytes_written

Total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Use this metric to monitor data written to the shadow cluster.

*Type*: counter

*Labels*:

- `shadow_link_name` - Name of the shadow link
- `shard` - Shard identifier

---

[[redpanda_shadow_link_total_records_fetched]]
=== redpanda_shadow_link_total_records_fetched

Total number of records fetched by the sharded replicator (records received by the client). Monitor this metric to track message throughput from the source cluster.

*Type*: counter

*Labels*:

- `shadow_link_name` - Name of the shadow link
- `shard` - Shard identifier

---

[[redpanda_shadow_link_total_records_written]]
=== redpanda_shadow_link_total_records_written

Total number of records written by a sharded replicator (records written to the write_at_offset_stm). Use this metric to monitor message throughput to the shadow cluster.

*Type*: counter

*Labels*:

- `shadow_link_name` - Name of the shadow link
- `shard` - Shard identifier
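
Because the byte and record counters carry a `shard` label, a per-link total requires summing across shards. A minimal sketch, assuming samples have already been parsed into label/value pairs. The link names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical parsed samples of redpanda_shadow_link_total_records_written.
samples = [
    ({"shadow_link_name": "dr-link", "shard": "0"}, 4000),
    ({"shadow_link_name": "dr-link", "shard": "1"}, 3500),
    ({"shadow_link_name": "eu-link", "shard": "0"}, 900),
]


def total_per_link(samples):
    """Sum a per-shard counter into one total per shadow link."""
    totals = defaultdict(int)
    for labels, value in samples:
        totals[labels["shadow_link_name"]] += value
    return dict(totals)


print(total_per_link(samples))  # {'dr-link': 7500, 'eu-link': 900}
```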

== Related topics

* xref:manage:monitoring.adoc[Learn how to monitor Redpanda]