Conversation

@kbatuigas
Contributor

@kbatuigas kbatuigas commented Dec 4, 2025

Description

Related PR adds Disaster Recovery / Shadowing docs in Cloud: redpanda-data/cloud-docs#462

Docs for UI were merged from this PR #1511

This pull request introduces extensive improvements to the disaster recovery documentation for Redpanda's shadowing feature, focusing on making procedures clearer and providing parallel instructions for both self-hosted (rpk CLI) and cloud environments (Cloud/Data Plane/Control Plane APIs). The changes add tabbed code blocks and environment-based conditionals to all major operational guides, ensuring users can easily follow the correct steps for their deployment type. Additionally, terminology and command references have been updated for accuracy and clarity.

Cloud vs. Self-hosted Operations Documentation:

  • Added tabbed sections throughout disaster recovery guides (failover-runbook.adoc, failover.adoc, monitor.adoc) to provide side-by-side instructions for rpk CLI and Cloud API/Control Plane API operations, including listing, describing, monitoring, failover, and deletion of shadow links and topics.

Failover and Monitoring Enhancements:

  • Clarified and expanded instructions for monitoring replication lag and failover progress, with examples for capturing status and interpreting output fields for both environments. Also added guidance for force-deleting shadow links in emergencies.

API Reference and Command Accuracy:

  • Updated references to Data Plane and Control Plane APIs, including example curl commands and links to official API documentation, ensuring users have correct endpoints and usage patterns.

Terminology and UX Improvements:

  • Improved output field descriptions and state explanations for shadow links and topics, making it easier to interpret monitoring results and understand failover states.

Configuration and Branch Updates:

  • Updated local-antora-playbook.yml to point to the correct documentation branch for cloud disaster recovery features.

These changes significantly enhance the usability and clarity of the disaster recovery documentation, making it easier for both cloud and self-hosted users to manage shadowing and respond to cluster failures.

Resolves https://redpandadata.atlassian.net/browse/
Review deadline:

Page previews

Manage > Disaster Recovery >
Configure Shadowing
Monitor Shadowing
Failover
Failover Runbook

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

@kbatuigas kbatuigas requested a review from a team as a code owner December 4, 2025 00:33
@netlify

netlify bot commented Dec 4, 2025

Deploy Preview for redpanda-docs-preview ready!

Name Link
🔨 Latest commit 8d293e5
🔍 Latest deploy log https://app.netlify.com/projects/redpanda-docs-preview/deploys/693c70ab982ff9000804cb68
😎 Deploy Preview https://deploy-preview-1498--redpanda-docs-preview.netlify.app

@coderabbitai
Contributor

coderabbitai bot commented Dec 4, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This pull request updates documentation to support Cloud API integration for Redpanda's disaster recovery shadowing feature. The changes include switching the Antora playbook to build from a feature branch (DOC-1621-Document-Cloud-Feature-Shadowing-Disaster-Recovery-Enterprise) and expanding four shadowing documentation files with cloud-specific workflows. The additions introduce Data Plane API and Control Plane API examples, authentication headers, conditional blocks gated by ifdef::env-cloud[], and API endpoint references alongside existing RPK-based commands. Documentation covers shadow link creation, failover procedures, monitoring, and cleanup operations in cloud environments.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Playbook branch change: Requires verification that the branch name is correct and exists
  • API endpoint accuracy: Cloud API and Data Plane API examples need validation for correctness (URLs, request/response formats, authentication headers)
  • Conditional block consistency: Ensure ifdef::env-cloud[] blocks are correctly positioned across all four documentation files
  • Environment variable usage: Verify patterns for retrieving and using dataplane API URLs are consistent
  • Documentation flow: Check that cloud-specific sections integrate smoothly with existing non-cloud content and maintain clarity

Possibly related PRs

Suggested reviewers

  • paulohtb6
  • micheleRP
  • Feediver1

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name | Status | Explanation
Linked Issues check | ✅ Passed | The PR successfully addresses DOC-1842 objectives by documenting Shadowing usage with Cloud API (Control Plane and Data Plane) with examples, operational guidance, and setup/failover procedures across setup, failover, failover-runbook, and monitor documentation files.
Out of Scope Changes check | ✅ Passed | All changes are directly related to the PR objectives: documenting Cloud API shadowing usage. The Antora playbook change enables cloud-docs preview, and all documentation updates add Cloud API guidance to existing disaster recovery shadowing documentation without introducing unrelated modifications.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check | ✅ Passed | The title clearly identifies the main change: adding Cloud API support to Shadowing documentation with a unified approach.
Description check | ✅ Passed | The PR description is comprehensive and well-structured, covering all major changes and providing detailed explanations with references and page previews.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
local-antora-playbook.yml (1)

20-20: Add inline documentation and plan reversion for temporary feature branch.

The branch change is intentional for cloud documentation preview/testing, but lacks context for future maintainers. Additionally, feature branches can be deleted, causing build failures after the cloud-docs PR merges.

Recommendation 1: Add a comment above line 20 documenting the temporary nature and reversion plan:

  - url: https://github.com/redpanda-data/cloud-docs
+   # Temporary: Using feature branch for cloud API shadowing docs preview.
+   # Revert to 'main' after cloud-docs PR #462 merges.
-   branches: main
+   branches: 'DOC-1621-Document-Cloud-Feature-Shadowing-Disaster-Recovery-Enterprise'

Recommendation 2: Verify that the branch name DOC-1621-Document-Cloud-Feature-Shadowing-Disaster-Recovery-Enterprise exists in the cloud-docs repository and matches the linked PR #462 feature branch. Consider tracking a follow-up issue to revert this change after the cloud-docs PR is merged to prevent build failures.
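The branch check in Recommendation 2 could be scripted. The following is a minimal sketch, assuming the output of `git ls-remote --heads https://github.com/redpanda-data/cloud-docs` has been captured as text; the helper name `branch_exists` and the placeholder commit hashes in the sample are hypothetical and used only for illustration.

```python
def branch_exists(ls_remote_output: str, branch: str) -> bool:
    """Return True if `branch` is listed as a head in `git ls-remote --heads` output.

    Each output line has the form "<commit-sha>\trefs/heads/<branch-name>".
    """
    wanted = f"refs/heads/{branch}"
    for line in ls_remote_output.splitlines():
        parts = line.split("\t")
        # Require an exact ref match so that a prefix such as "DOC-1621" does not pass.
        if len(parts) == 2 and parts[1].strip() == wanted:
            return True
    return False


# Sample output with placeholder hashes (not real commits).
sample = (
    "1111111111111111111111111111111111111111\trefs/heads/main\n"
    "2222222222222222222222222222222222222222\trefs/heads/"
    "DOC-1621-Document-Cloud-Feature-Shadowing-Disaster-Recovery-Enterprise\n"
)

feature_branch = "DOC-1621-Document-Cloud-Feature-Shadowing-Disaster-Recovery-Enterprise"
print(branch_exists(sample, feature_branch))
```

Running a check like this before a docs build would surface a deleted feature branch early, instead of as a failed Antora build.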

modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc (2)

17-17: Track TODO verification requirement.

Line 17 contains a TODO noting that command output examples need verification in a test environment. This is important for documentation accuracy, especially for failover procedures where users depend on expected output formats. Ensure this is tracked and completed before release.

Would you like help creating a tracking issue or validation script for verifying command outputs in the test environment?


102-105: Ensure consistent API terminology across sections.

The runbook mixes "Control Plane API" and "Cloud API" terminology. Lines 102-114 use "Control Plane API" for listing, but lines 199-202 retrieve a Data Plane URL without explicitly labeling the first curl as "Control Plane API". Similarly, lines 334-336 label as "DELETE" but the tab header says "Cloud API". Standardize terminology throughout: either "Control Plane API" or "Cloud API" consistently.

Also applies to: 107-110, 145-157, 334-336

modules/manage/pages/disaster-recovery/shadowing/setup.adoc (1)

648-648: Consider adding Cloud API reference at documentation conclusion.

Line 648 references the Admin API v2, which is appropriate for non-cloud environments. Consider adding a Cloud API equivalent reference for cloud-enabled users in a conditional block to maintain documentation completeness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3aa7217 and 8416408.

📒 Files selected for processing (5)
  • local-antora-playbook.yml (1 hunks)
  • modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc (5 hunks)
  • modules/manage/pages/disaster-recovery/shadowing/failover.adoc (2 hunks)
  • modules/manage/pages/disaster-recovery/shadowing/monitor.adoc (1 hunks)
  • modules/manage/pages/disaster-recovery/shadowing/setup.adoc (14 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T21:00:26.626Z
Learnt from: micheleRP
Repo: redpanda-data/docs PR: 1334
File: modules/manage/partials/rbac-dp.adoc:93-98
Timestamp: 2025-08-25T21:00:26.626Z
Learning: In cloud documentation (env-cloud), Security is at the top level navigation, so ACL references should use `security:authorization/rbac/acl.adoc`. In self-managed documentation, Security is nested under Manage, so ACL references use `manage:security/authorization/acl.adoc`. The different xref paths in conditional blocks reflect these different navigation structures.

Applied to files:

  • modules/manage/pages/disaster-recovery/shadowing/setup.adoc
🔇 Additional comments (10)
modules/manage/pages/disaster-recovery/shadowing/monitor.adoc (2)

39-41: Verify API versioning and clarify shadow-link identifier requirements.

The curl examples use /v1/shadow-links endpoints, but the PR objectives note that the Cloud API currently references v1beta2 with v1 expected on Dec 12. Verify whether these examples should reference v1beta2 or if the versioning is environment-specific. Additionally, clarify whether <shadow-link-id> and <shadow-link-name> are interchangeable or distinct identifiers, as the rpk examples use names while the API examples use IDs.

Also applies to: 68-71


78-81: Improved presentation of status command.

Good improvement wrapping the rpk shadow status command in a bash code block for consistency with the tab structure and better readability.

modules/manage/pages/disaster-recovery/shadowing/failover.adoc (2)

16-22: Cloud vs non-cloud messaging is clear and well-structured.

The conditional messaging appropriately distinguishes between cloud environments (Cloud UI, Data Plane API) and non-cloud environments (Console, Admin API), providing clear context for users.


80-91: Clarify Data Plane API path escaping and verify request structure.

Line 80 documents the endpoint path with \{shadow_link_name} (escaped braces). Verify this is the correct Antora escaping for rendering the unescaped path in documentation. Additionally, confirm the POST request body structure ("name" + optional "shadowTopicName") matches the Data Plane API specification.

modules/manage/pages/disaster-recovery/shadowing/setup.adoc (6)

75-76: Excellent alignment of xref paths for cloud vs non-cloud environments.

The conditional xref paths correctly use security:authorization/acl.adoc for cloud environments and manage:security/authorization/acl.adoc for non-cloud, matching the navigation structure differences noted in the learnings. This pattern is correctly applied throughout.

Also applies to: 80-81, 100-101


243-271: Clarify Cloud API secret reference syntax and verify configuration.

Line 271 uses ${secrets.<sasl-password-secret-name>} syntax for referencing secrets created in the source cluster. Verify this is the correct Cloud API syntax for secret interpolation and that it matches the Cloud Control Plane API specification. Additionally, ensure the secret creation requirement (line 243) is clearly discoverable in the referenced documentation.


255-300: Verify POST request body structure and API versioning.

The Control Plane API POST request to /v1/shadow-links uses snake_case field names with nested structure. Verify this structure matches the current Cloud Control Plane API specification. The PR objectives note that the API is expected to transition from v1beta2 to v1 on Dec 12; confirm whether these examples should reference v1beta2 or if the versioning is handled automatically.


152-226: Comprehensive filter documentation with clear examples.

The expanded filter section (lines 328-467) provides excellent clarity on pattern types, filter processing rules, and common use cases. The examples for topic, consumer group, and ACL filtering are well-structured and would help users configure shadow links effectively.


511-598: Well-structured networking and bootstrap configuration section.

The networking sections (lines 511-598) provide clear guidance on connection requirements, firewall configuration, bootstrap servers, and security settings. The detailed YAML examples with comments make this actionable for users.


272-273: Fix API reference links.

  • Line 313 and similar instances: Change xref:manage:api/cloud-byoc-controlplane-api.adoc#lro to xref:redpanda-cloud:manage:api/cloud-byoc-controlplane-api.adoc#lro — the API doc is in the redpanda-cloud module, not manage.
  • Line 315 and similar instances: Complete the incomplete link link:/api/doc/cloud-controlplane/v1/operation/operation-[Control Plane API reference] by adding the appropriate operation ID (e.g., operation-shadowlinkservice_createshadowlink or the correct endpoint).
⛔ Skipped due to learnings
Learnt from: micheleRP
Repo: redpanda-data/docs PR: 1349
File: modules/manage/pages/cluster-maintenance/manage-throughput.adoc:0-0
Timestamp: 2025-09-03T16:34:58.323Z
Learning: For Redpanda documentation, use absolute URLs (https://docs.redpanda.com/api/...) rather than relative URLs (/api/...) when linking to API documentation. Relative API links break in Netlify previews because Bump only serves from docs.redpanda.com, causing the relative URLs to be appended to the preview URL where Bump doesn't serve content.
Learnt from: micheleRP
Repo: redpanda-data/docs PR: 1334
File: modules/manage/partials/rbac-dp.adoc:93-98
Timestamp: 2025-08-25T21:00:26.626Z
Learning: In cloud documentation (env-cloud), Security is at the top level navigation, so ACL references should use `security:authorization/rbac/acl.adoc`. In self-managed documentation, Security is nested under Manage, so ACL references use `manage:security/authorization/acl.adoc`. The different xref paths in conditional blocks reflect these different navigation structures.
Learnt from: CR
Repo: redpanda-data/docs PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T09:42:15.235Z
Learning: Applies to docs-data/property-overrides.json : Always use full Antora resource IDs with module prefixes in xref links within property descriptions (e.g., `reference:properties/cluster-properties.adoc`, never `./cluster-properties.adoc`)
Learnt from: Feediver1
Repo: redpanda-data/docs PR: 1153
File: modules/reference/pages/properties/topic-properties.adoc:45-50
Timestamp: 2025-07-16T19:33:20.420Z
Learning: In the Redpanda documentation, topic property cross-references like <<max.compaction.lag.ms>> and <<min.compaction.lag.ms>> require corresponding property definition sections with anchors like [[maxcompactionlagms]] and [[mincompactionlagms]] to prevent broken links.
Learnt from: CR
Repo: redpanda-data/docs PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T09:42:15.235Z
Learning: Applies to docs-data/property-overrides.json : Normalize all xref links in property-overrides.json to use full Antora resource IDs after updating
Learnt from: CR
Repo: redpanda-data/docs PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T09:42:15.235Z
Learning: Applies to docs-data/property-overrides.json : Prefix self-managed-only links with `self-managed-only:` in related_topics to handle documentation pages that only exist in self-managed deployments
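The first learning above (absolute rather than relative API links) lends itself to a mechanical check. The following is a minimal sketch, assuming AsciiDoc `link:` macros of the form `link:/api/...[text]`; the function name and the regular expression are illustrative and not part of any Redpanda tooling, and the absolute URL in the sample only demonstrates the pattern.

```python
import re

# Matches AsciiDoc link macros whose target starts with a relative /api/ path.
RELATIVE_API_LINK = re.compile(r"link:(/api/[^\[\s]*)\[")


def find_relative_api_links(adoc_text: str) -> list[tuple[int, str]]:
    """Return (line number, target) pairs for every relative /api/ link target."""
    hits = []
    for lineno, line in enumerate(adoc_text.splitlines(), start=1):
        for match in RELATIVE_API_LINK.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Line 1 uses the relative form flagged in the review; line 2 uses an absolute URL
# (illustrative only) and is not reported.
sample = (
    "See the link:/api/doc/cloud-controlplane/v1/operation/operation-[Control Plane API reference].\n"
    "See the link:https://docs.redpanda.com/api/doc/cloud-controlplane/[Control Plane API reference].\n"
)

for lineno, target in find_relative_api_links(sample):
    print(f"line {lineno}: relative API link {target}")
```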

@kbatuigas kbatuigas force-pushed the DOC-1842-shadowing-in-cloud-api branch from 8416408 to d894b50 Compare December 4, 2025 01:02
@kbatuigas kbatuigas requested a review from simon0191 December 4, 2025 02:01
@simon0191
Member

Why doesn't this one use tabs?

image

@simon0191
Member

This is not true. What you can't do is write to a shadow topic.
You could have two clusters that each shadow the other for a different set of topics, with both clusters remaining writable. For example: https://redpandadata.slack.com/archives/C08KKE71798/p1763773380305019

image

@simon0191
Member

In the rpk tab of this one we should also mention the need to create a secret for the SASL password, and the TLS key if needed.

image

@simon0191
Member

In here, in the example in the comments, let's use ${secrets.<SOME DATAPLANE SECRET>}

image

@simon0191
Member

@kbatuigas kbatuigas force-pushed the DOC-1842-shadowing-in-cloud-api branch from 47b399c to be98351 Compare December 10, 2025 07:10
include::manage:disaster-recovery/shadowing/monitor.adoc[tag=rpk-tab-health-checks]
--
Cloud API::
Contributor Author

@c-julin It seems that rpk shadow status provides lag info grouped by partition whereas $DATAPLANE_API_URL/v1/shadowlinks/<shadow-link-name>/topic just has a total_lag field and that looks like it's aggregated for all shadow topics. Is that correct? What should the API commands look like if I want to do the equivalent of these rpk commands?

# Check all shadow links are active
rpk shadow list | grep -v "ACTIVE" || echo "All shadow links healthy"
# Monitor lag for critical topics
rpk shadow status <shadow-link-name> | grep -E "LAG|Lag"

Copy link

Total lag is the total lag per topic. We recently removed the per-partition lag info to mirror the Admin API. The only way to calculate lag per partition is to compute it from the high watermarks (HWM): the source's HWM minus the shadow's HWM.
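The calculation described in this reply can be sketched as follows, reading it as the source partition's high watermark minus the shadow partition's high watermark. How the watermarks are fetched is not covered in this thread, so the sketch takes them as plain dictionaries; the function name, the sample offsets, and the choice to treat a partition missing on the shadow side as having a watermark of 0 are all assumptions made for illustration.

```python
def partition_lag(source_hwm: dict[int, int], shadow_hwm: dict[int, int]) -> dict[int, int]:
    """Compute per-partition lag as source HWM minus shadow HWM.

    Both arguments map partition ID to high watermark offset.
    """
    lag = {}
    for partition, source_offset in source_hwm.items():
        # Assumption: a partition not yet present on the shadow has replicated nothing.
        shadow_offset = shadow_hwm.get(partition, 0)
        # Clamp at zero in case the two watermarks were sampled at slightly different times.
        lag[partition] = max(source_offset - shadow_offset, 0)
    return lag


# Hypothetical watermarks for a three-partition topic.
source = {0: 1500, 1: 980, 2: 400}
shadow = {0: 1450, 1: 980}

per_partition = partition_lag(source, shadow)
total = sum(per_partition.values())
print(per_partition, total)
```

Summing the per-partition values gives a per-topic figure that should be comparable to the `total_lag` field mentioned above, though the two are sampled at different moments and may not match exactly.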

@kbatuigas kbatuigas requested a review from c-julin December 11, 2025 19:01
@simon0191
Copy link
Member

We've removed resource group ID from ShadowLink. Let's remove it from the examples, and the docs

image

@kbatuigas kbatuigas requested a review from micheleRP December 12, 2025 00:11
* `<destination-redpanda-cluster-id>`: ID of the shadow (destination) cluster.
* `<shadow-link-name>`: Unique name for this shadow link, for example, `production-dr`.
* `<source-broker-1>:<port>`, `<source-broker-2>:<port> ...`: Source cluster brokers to connect to, for example, `prod-kafka-1.example.com:9092`, `prod-kafka-2.example.com:9092`.
* `<sasl-username>`: SASL/SCRAM username, for example, `shadow-replication-user`.
Contributor

Should this say SASL/SCRAM username from the source cluster...?

Contributor

@micheleRP micheleRP left a comment

see my comments, but lgtm!

@micheleRP micheleRP force-pushed the DOC-1842-shadowing-in-cloud-api branch from 507d494 to d8174ad Compare December 12, 2025 18:38
@kbatuigas kbatuigas changed the title Shadowing using Cloud API - single source Shadowing in Cloud - single source Dec 12, 2025
@kbatuigas kbatuigas merged commit bc2c791 into main Dec 12, 2025
7 checks passed
@kbatuigas kbatuigas deleted the DOC-1842-shadowing-in-cloud-api branch December 12, 2025 19:49
6 participants