Merged
2 changes: 1 addition & 1 deletion .github/workflows/update-property-docs.yaml
Original file line number Diff line number Diff line change
@@ -64,7 +64,7 @@ jobs:
run: |
set -euo pipefail
TAG="${{ steps.tag.outputs.tag }}"
CURRENT=$(grep 'latest-redpanda-tag:' antora.yml | awk '{print $2}' | tr -d '"')
CURRENT=$(grep 'latest-redpanda-tag:' antora.yml | awk '{print $2}' | tr -d "\"'")

echo "📄 Current latest-redpanda-tag in antora.yml: $CURRENT"
echo "🔖 Incoming tag: $TAG"
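The quoting change above is easy to sanity-check locally: `tr -d "\"'"` deletes both double and single quote characters, so either quoting style in `antora.yml` yields a bare tag (the sample tag value is a placeholder):

```shell
# Values in antora.yml may be wrapped in either " or '.
# tr -d "\"'" strips both quote characters in one pass.
printf '%s\n' '"v25.3.1"' | tr -d "\"'"   # -> v25.3.1
printf '%s\n' "'v25.3.1'" | tr -d "\"'"   # -> v25.3.1
```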
4 changes: 4 additions & 0 deletions modules/ROOT/nav.adoc
@@ -136,6 +136,10 @@
**** xref:manage:kubernetes/security/k-audit-logging.adoc[Audit Logging]
*** xref:manage:kubernetes/k-rack-awareness.adoc[Rack Awareness]
*** xref:manage:kubernetes/k-remote-read-replicas.adoc[Remote Read Replicas]
*** xref:manage:kubernetes/shadowing/index.adoc[Shadowing]
**** xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[Configure Shadowing]
**** xref:manage:kubernetes/monitoring/k-monitor-shadowing.adoc[Monitor]
**** xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[Failover Runbook]
*** xref:manage:kubernetes/k-manage-resources.adoc[Manage Pod Resources]
*** xref:manage:kubernetes/k-scale-redpanda.adoc[Scale]
*** xref:manage:kubernetes/k-nodewatcher.adoc[]
55 changes: 2 additions & 53 deletions modules/get-started/pages/release-notes/helm-charts.adoc
@@ -12,57 +12,6 @@ See also:
* xref:upgrade:k-compatibility.adoc[]
* xref:upgrade:k-rolling-upgrade.adoc[]

== Redpanda chart v25.2.1
== Redpanda chart v25.3.x

link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.2.x/charts/redpanda/CHANGELOG.md[Changelog^].

=== New chart-wide podTemplate field

You can now use the chart-wide `podTemplate` field to control Pod attributes across all components. This field has lower precedence than `statefulset.podTemplate` and `post_install_job.podTemplate` but will be merged with them.

Additionally, `podTemplate` fields now support template expressions within string fields, allowing you to use Helm templating for dynamic values:

[,yaml]
----
podTemplate:
annotations:
"release-name": '{{ .Release.Name }}'
----

This compensates for functionality lost with the removal of fields like `extraVolumes`, while being more maintainable and less error prone.

=== Improved config-watcher sidecar

The config-watcher sidecar is now a dedicated Go binary that handles user management and simplifies cluster health checks. Health checks no longer fail when the sole issue is that other nodes in the cluster are unavailable.

=== rpk debug bundle now works by default

The chart now creates `Roles` and `RoleBindings` that satisfy the requirements for running `rpk debug bundle --namespace` from any Redpanda Pod. These permissions may be disabled by setting `rbac.rpkDebugBundle=false`.

The Redpanda container now always has a Kubernetes ServiceAccount token mounted to ensure `rpk debug bundle` can be executed successfully.
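If you do not want these permissions, the paragraph above names the switch; as a minimal values-file sketch (assuming the standard chart values layout):

```yaml
# Disable the Roles/RoleBindings that allow `rpk debug bundle --namespace`
# to run from any Redpanda Pod (created by default in this release).
rbac:
  rpkDebugBundle: false
```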

=== ServiceAccount creation now enabled by default

The `serviceAccount.create` field now defaults to `true`. Previously, the chart used the `default` ServiceAccount and extended it with all bindings.

=== Stricter schema validation

Any unexpected values now result in a validation error. Previously, unexpected values were silently ignored.

Ensure your Helm values only include valid fields before upgrading.
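One way to surface schema errors before upgrading is to render the chart locally against your values; a sketch, assuming the chart reference is `redpanda/redpanda` and your overrides live in `values.yaml`:

```shell
# With strict schema validation, unexpected fields fail at render time
# instead of being silently ignored at install time.
helm template redpanda redpanda/redpanda -f values.yaml > /dev/null \
  && echo "values OK"
```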

=== Redpanda Console v3.1.0

The Console dependency has been updated to v3.1.0. The Console integration (`console.enabled=true`) now uses the chart-managed bootstrap user rather than the first user from `auth.sasl.users`.

=== Deprecated Helm values

The following Helm values are deprecated and will be removed in a future release:

* `statefulset.sidecars.controllers.image`: Use `statefulset.sidecars.image` instead
* `statefulset.sideCars.controllers.createRBAC`: Use `rbac.enabled` or per-controller settings instead
* `statefulset.sideCars.controllers.run`: Use individual controller enabled fields instead

=== Removed Helm values

Several fields have been removed in favor of using `podTemplate`. Before upgrading, review your configurations and migrate removed fields to their `podTemplate` equivalents. For the complete list of removed fields and their replacements, see the link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.2.x/charts/redpanda/CHANGELOG.md[changelog^].
link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.3.x/charts/redpanda/CHANGELOG.md[Changelog^].
56 changes: 8 additions & 48 deletions modules/get-started/pages/release-notes/operator.adoc
@@ -10,56 +10,16 @@ See also:
* xref:upgrade:k-rolling-upgrade.adoc[]


== Redpanda Operator v25.2.x
== Redpanda Operator v25.3.x

link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.2.x/operator/CHANGELOG.md[Changelog^]
link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.3.x/operator/CHANGELOG.md[Changelog^]

=== Cluster scope by default
=== ShadowLink resource for disaster recovery

Starting in v25.2, the Redpanda Operator defaults to cluster scope instead of namespace scope. This change provides several benefits:
Redpanda Operator v25.3.x introduces the ShadowLink custom resource for managing shadow links in Kubernetes. The ShadowLink resource allows you to declaratively configure and manage disaster recovery replication between Redpanda clusters.

* **Simplified management**: A single operator instance can manage multiple Redpanda clusters across different namespaces.
* **Reduced resource overhead**: No need to deploy separate operator instances for each namespace.
* **Centralized upgrades**: Upgrade the operator once to benefit all managed Redpanda clusters.
* **Cross-namespace management**: Deploy the operator in a dedicated namespace (such as `redpanda-system`) while managing clusters in application namespaces.
* **Simplified RBAC for debug bundles**: The Redpanda Operator now provides all required permissions for `rpk` debug bundle collection by default. The `rbac.createRPKBundleCRs` flag is no longer needed.
* **Declarative configuration**: Define shadow links as Kubernetes resources with full lifecycle management.
* **Status monitoring**: View shadow link health and replication status directly from Kubernetes.
* **Integrated failover**: Delete the ShadowLink resource to fail over all topics.
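The "integrated failover" bullet above can be sketched with `kubectl`; the resource name `my-link` and the lower-case kind name `shadowlink` are assumptions here, not verified CLI output:

```shell
# Inspect the replication status reported on the resource, then
# trigger failover of every topic on the link by deleting it.
kubectl get shadowlink my-link -o yaml
kubectl delete shadowlink my-link
```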

==== Migration considerations

If you're upgrading from a previous version that used namespace-scoped operators:

* **No manual steps required**: The Redpanda Operator automatically reconciles existing Redpanda clusters across namespaces.
* **New deployments default to cluster scope**: Regardless of which namespace you deploy the Redpanda Operator to (including `default`).
* **Delete extra Redpanda Operator deployments**: After upgrading, ensure only one Redpanda Operator remains in the cluster (the one running in cluster scope). Use `helm uninstall` to remove any other Redpanda Operator deployments from previous namespace-scoped installations.

To maintain namespace scope, use the `--set 'additionalCmdFlags=["--namespace=<namespace>"]'` flag when installing or upgrading the Redpanda Operator. The `--namespace` flag in the helm command only specifies which namespace to deploy the Redpanda Operator into and does not affect its operational scope.
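The flag described above might be applied like this; the chart reference `redpanda/operator` and the `redpanda-system` namespace are placeholders for your own values:

```shell
# --namespace only chooses where the operator Pod runs;
# additionalCmdFlags is what actually restricts its watch scope.
helm upgrade --install redpanda-operator redpanda/operator \
  --namespace redpanda-system \
  --set 'additionalCmdFlags=["--namespace=redpanda-system"]'
```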

WARNING: Do not run multiple Redpanda Operators in different scopes (cluster and namespace scope) in the same cluster as this can cause resource conflicts.

==== RBAC requirements

Important RBAC considerations for v25.2+:

* **ClusterRole permissions always required**: Regardless of whether you use cluster or namespace scope, the Redpanda Operator always needs ClusterRole permissions.
* **Automatic configuration**: These permissions are automatically configured when you install the Redpanda Operator.

=== Declarative role management

Redpanda Operator v25.2.x now includes a RedpandaRole custom resource. The RedpandaRole resource allows you to declaratively manage Redpanda roles and permissions in Kubernetes, making it easier to control access and automate security policies for your Redpanda clusters. See the xref:manage:kubernetes/security/authorization/k-role-controller.adoc[RedpandaRole documentation] for details.

=== Redpanda Console v3 support (Console CRD)

Redpanda Operator v25.2.x introduces support for Redpanda Console v3 through the new Console resource. This allows you to deploy and manage Redpanda Console v3 instances directly from the Redpanda Operator.

The `console` stanza in the Redpanda resource is deprecated and will be removed in a future release.

Existing deployments that use the `console` stanza in the Redpanda resource will be automatically migrated to the Console resource. The migration happens automatically when you upgrade to v25.2.x.

If you manage your resources in version control, you should:

. Fetch and commit the migrated Console CR after the migration completes.
. Remove the `console` stanza from your Redpanda resource after the Console CR is committed to avoid configuration conflicts. Removing the stanza will not affect the migrated Console CR.

The Redpanda Operator handles the migration process from version 2 of Redpanda Console to version 3. If any configurations cannot be migrated, the Redpanda Operator displays warnings in the `warnings` field of the Console resource. If you need to manually migrate any configurations, refer to the xref:migrate:console-v3.adoc[migration guide].

All configuration and management of Redpanda Console should be done through the Console CR. See xref:console:config/configure-console.adoc[].
See xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[Shadow Linking in Kubernetes] for setup and xref:manage:kubernetes/monitoring/k-monitor-shadowing.adoc[monitoring] documentation.
2 changes: 2 additions & 0 deletions modules/manage/examples/kubernetes/shadow-links.feature
@@ -29,6 +29,7 @@ Feature: ShadowLink CRDs
When I apply Kubernetes manifest:
"""
---
# tag::basic-shadowlink-example[]
apiVersion: cluster.redpanda.com/v1alpha2
kind: ShadowLink
metadata:
@@ -46,6 +47,7 @@
- name: topic1
filterType: include
patternType: literal
# end::basic-shadowlink-example[]
"""
And shadow link "link" is successfully synced
Then I should find topic "topic1" in cluster "sasl"
@@ -14,11 +14,16 @@ include::shared:partial$enterprise-license.adoc[]
endif::[]

This guide provides step-by-step procedures for emergency failover when your primary Redpanda cluster becomes unavailable. Follow these procedures only during active disasters when immediate failover is required.

ifndef::env-cloud[]
NOTE: If you're running Redpanda in Kubernetes, see xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[] for Kubernetes-specific emergency procedures.
endif::[]

// TODO: All command output examples in this guide need verification by running actual commands in test environment

[IMPORTANT]
====
This is an emergency procedure. For planned failover testing or day-to-day shadow link management, see xref:./failover.adoc[]. Ensure you have completed the xref:manage:disaster-recovery/shadowing/overview.adoc#disaster-readiness-checklist[disaster readiness checklist] before an emergency occurs.
This is an emergency procedure. For planned failover testing or day-to-day shadow link management, see xref:manage:disaster-recovery/shadowing/failover.adoc[]. Ensure you have completed the xref:manage:disaster-recovery/shadowing/overview.adoc#disaster-readiness-checklist[disaster readiness checklist] before an emergency occurs.
====

ifdef::env-cloud[]
@@ -54,19 +59,7 @@ rpk cluster info --brokers shadow-cluster-1.example.com:9092,shadow-cluster-2.ex

**Decision point**: If the primary cluster is responsive, consider whether failover is actually needed. Partial outages may not require full disaster recovery.

**Examples that require full failover:**

* Primary cluster is completely unreachable (network partition, regional outage)
* Multiple broker failures preventing writes to critical topics
* Data center failure affecting majority of brokers
* Persistent authentication or authorization failures across the cluster

**Examples that may NOT require failover:**

* Single broker failure with sufficient replicas remaining
* Temporary network connectivity issues affecting some clients
* High latency or performance degradation (but cluster still functional)
* Non-critical topic or partition unavailability
include::manage:partial$shadowing/failover-decision-examples.adoc[]

[[verify-shadow-status]]
=== Verify shadow cluster status
Expand Down Expand Up @@ -144,9 +137,7 @@ Verify that the following conditions exist before proceeding with failover:

Use xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] or the link:/api/doc/cloud-dataplane/operation/operation-shadowlinkservice_listshadowlinktopics[Data Plane API] to check lag, which shows the message count difference between source and shadow partitions:

* **Acceptable lag examples**: 0-1000 messages for low-throughput topics, 0-10000 messages for high-throughput topics
* **Concerning lag examples**: Growing lag over 50,000 messages, or lag that continuously increases without recovering
* **Critical lag examples**: Lag exceeding your data loss tolerance (for example, if you can only afford to lose 1 minute of data, lag should represent less than 1 minute of typical message volume)
include::manage:partial$shadowing/replication-lag-guidelines.adoc[]
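A hedged sketch of the lag check described above; the positional link-name argument is an assumption, so check `rpk shadow status --help` for the exact invocation:

```shell
# Compare source and shadow message positions per partition, and judge
# the reported lag against your data-loss tolerance before failing over.
rpk shadow status <link-name>
```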

[[document-state]]
=== Document current state
@@ -241,7 +232,7 @@ ifdef::env-cloud[high_watermark]

[IMPORTANT]
====
Note the replication lag to estimate potential data loss during failover. The `Tasks` section shows the health of shadow link replication tasks. For details about what each task does, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
Note the replication lag to estimate potential data loss during failover. The `Tasks` section shows the health of shadow link replication tasks. For details about what each task does, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks].
====

[[initiate-failover]]
Expand Down Expand Up @@ -574,22 +565,6 @@ Force deleting a shadow link immediately fails over all topics in the link. This

**Solution**: Verify consumer group offsets were replicated (check your filters) and use `rpk group describe <group-name>` to check offset positions. If necessary, manually reset offsets to appropriate positions. See link:https://support.redpanda.com/hc/en-us/articles/23499121317399-How-to-manage-consumer-group-offsets-in-Redpanda[How to manage consumer group offsets in Redpanda^] for detailed reset procedures.

== Next steps

After successful failover, focus on recovery planning and process improvement. Begin by assessing the source cluster failure and determining whether to restore the original cluster or permanently promote the shadow cluster as your new primary.

**Immediate recovery planning:**

1. **Assess source cluster**: Determine root cause of the outage
2. **Plan recovery**: Decide whether to restore source cluster or promote shadow cluster permanently
3. **Data synchronization**: Plan how to synchronize any data produced during failover
4. **Fail forward**: Create a new shadow link with the failed-over shadow cluster as source to maintain a DR cluster

**Process improvement:**

1. **Document the incident**: Record timeline, impact, and lessons learned
2. **Update runbooks**: Improve procedures based on what you learned
3. **Test regularly**: Schedule regular disaster recovery drills
4. **Review monitoring**: Ensure monitoring caught the issue appropriately
include::manage:partial$shadowing/failover-next-steps.adoc[]

// end::single-source[]
45 changes: 9 additions & 36 deletions modules/manage/pages/disaster-recovery/shadowing/failover.adoc
@@ -18,26 +18,18 @@ You can failover a shadow link using the Redpanda Cloud UI, `rpk`, or the Data P
endif::[]

ifndef::env-cloud[]
You can failover a shadow link using Redpanda Console, `rpk`, or the Admin API.
You can failover a shadow link using Redpanda Console, `rpk`, or the Admin API.

NOTE: If you are using Kubernetes, you can also use the Redpanda Operator's `ShadowLink` resource to manage failover. See xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[Kubernetes Shadow Link Failover] for details.
endif::[]

include::shared:partial$emergency-shadowing-callout.adoc[]
include::manage:partial$shadowing/emergency-shadowing-callout.adoc[]

ifdef::env-cloud[]
NOTE: Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later.
NOTE: Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later.
endif::[]

== Failover behavior

When you initiate failover, Redpanda performs the following operations:

1. **Stops replication**: Halts all data fetching from the source cluster for the specified topics or entire shadow link
2. **Fails over topics**: Converts read-only shadow topics into regular, writable topics
3. **Updates topic state**: Changes topic status from `ACTIVE` to `FAILING_OVER`, then `FAILED_OVER`

Topic failover is irreversible. Once failed over, topics cannot return to shadow mode, and automatic fallback to the original source cluster is not supported.

NOTE: To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity.
include::manage:partial$shadowing/failover-behavior.adoc[]

== Failover commands

@@ -206,26 +198,7 @@ endif::[]
Force deleting a shadow link is irreversible and immediately fails over all topics in the link, bypassing the normal failover state transitions. This action should only be used as a last resort when topics are stuck in transitional states and you need immediate access to all replicated data.
====

== Failover states

=== Shadow link states

The shadow link itself has a simple state model:

* **`ACTIVE`**: Shadow link is operating normally, replicating data
* **`PAUSED`**: Shadow link replication is temporarily halted by user action

Shadow links do not have dedicated failover states. Instead, the link's operational status is determined by the collective state of its shadow topics.

=== Shadow topic states

Individual shadow topics progress through specific states during failover:

* **`ACTIVE`**: Normal replication state before failover
* **`FAULTED`**: Shadow topic has encountered an error and is not replicating
* **`FAILING_OVER`**: Failover initiated, replication stopping
* **`FAILED_OVER`**: Failover completed successfully, topic fully writable
* **`PAUSED`**: Replication temporarily halted by user action
include::manage:partial$shadowing/failover-states.adoc[]

== Monitor failover progress

@@ -277,7 +250,7 @@ Task states during monitoring:
* **`NOT_RUNNING`**: Task is not currently executing
* **`LINK_UNAVAILABLE`**: Task cannot communicate with the source cluster

For detailed information about shadow link tasks and their roles, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
For detailed information about shadow link tasks and their roles, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks].


== Post-failover cluster behavior
@@ -333,6 +306,6 @@ After completing failover:
* Verify that applications can produce and consume messages normally
* Consider deleting the shadow link if failover was successful and permanent

For emergency situations, see xref:./failover-runbook.adoc[Failover Runbook].
For emergency situations, see xref:manage:disaster-recovery/shadowing/failover-runbook.adoc[Failover Runbook].

// end::single-source[]