You can now use the chart-wide `podTemplate` field to control Pod attributes across all components. This field has lower precedence than `statefulset.podTemplate` and `post_install_job.podTemplate` but will be merged with them.
-
-Additionally, `podTemplate` fields now support template expressions within string fields, allowing you to use Helm templating for dynamic values:
-
-[,yaml]
-----
-podTemplate:
-  annotations:
-    "release-name": '{{ .Release.Name }}'
-----
-
-This compensates for functionality lost with the removal of fields like `extraVolumes`, while being more maintainable and less error prone.
-
-=== Improved config-watcher sidecar
-
-The config-watcher sidecar is now a dedicated Go binary that handles user management and simplifies cluster health checks. Health checks no longer fail when the sole issue is that other nodes in the cluster are unavailable.
-
-=== rpk debug bundle now works by default
-
-The chart now creates `Roles` and `RoleBindings` that satisfy the requirements for running `rpk debug bundle --namespace` from any Redpanda Pod. These permissions may be disabled by setting `rbac.rpkDebugBundle=false`.
-
-The Redpanda container now always has a Kubernetes ServiceAccount token mounted to ensure `rpk debug bundle` can be executed successfully.
-
-=== ServiceAccount creation now enabled by default
-
-The `serviceAccount.create` field now defaults to `true`. Previously, the chart used the `default` ServiceAccount and extended it with all bindings.
-
-=== Stricter schema validation
-
-Any unexpected values now result in a validation error. Previously, unexpected values would have been silently ignored.
-
-Ensure your Helm values only include valid fields before upgrading.
-
-=== Redpanda Console v3.1.0
-
-The Console dependency has been updated to v3.1.0. The Console integration (`console.enabled=true`) now uses the chart-managed bootstrap user rather than the first user from `auth.sasl.users`.
-
-=== Deprecated Helm values
-
-The following Helm values are deprecated and will be removed in a future release:
-
-* `statefulset.sidecars.controllers.image`: Use `statefulset.sidecars.image` instead
-* `statefulset.sideCars.controllers.createRBAC`: Use `rbac.enabled` or per-controller settings instead
-* `statefulset.sideCars.controllers.run`: Use individual controller enabled fields instead
-
-=== Removed Helm values
-
-Several fields have been removed in favor of using `podTemplate`. Before upgrading, review your configurations and migrate removed fields to their `podTemplate` equivalents. For the complete list of removed fields and their replacements, see the link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.2.x/charts/redpanda/CHANGELOG.md[changelog^].
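
The migration to `podTemplate` described in the removed release notes above could look roughly like the following values fragment. This is a minimal sketch, not the chart's documented mapping: it assumes `podTemplate` accepts a `spec` stanza shaped like a Kubernetes PodSpec, and the volume and ConfigMap names are hypothetical. The changelog linked above remains the authority for the actual replacements.

[,yaml]
----
# Hypothetical values.yaml fragment; all names are illustrative only.
podTemplate:
  annotations:
    # Template expression in a string field, as in the example above.
    "release-name": '{{ .Release.Name }}'
  spec:                        # assumption: PodSpec-shaped stanza
    volumes:
      - name: extra-config     # hypothetical volume, formerly set via a removed field
        configMap:
          name: my-extra-config   # hypothetical ConfigMap name
----

Because the notes above say unexpected values now fail schema validation, a fragment like this is best checked against the target chart version before upgrading.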
Starting in v25.2, the Redpanda Operator defaults to cluster scope instead of namespace scope. This change provides several benefits:
+Redpanda Operator v25.3.x introduces the ShadowLink custom resource for managing shadow links in Kubernetes. The ShadowLink resource allows you to declaratively configure and manage disaster recovery replication between Redpanda clusters.
 
-* **Simplified management**: A single operator instance can manage multiple Redpanda clusters across different namespaces.
-* **Reduced resource overhead**: No need to deploy separate operator instances for each namespace.
-* **Centralized upgrades**: Upgrade the operator once to benefit all managed Redpanda clusters.
-* **Cross-namespace management**: Deploy the operator in a dedicated namespace (such as `redpanda-system`) while managing clusters in application namespaces.
-* **Simplified RBAC for debug bundles**: The Redpanda Operator now provides all required permissions for `rpk` debug bundle collection by default. The `rbac.createRPKBundleCRs` flag is no longer needed.
+* **Declarative configuration**: Define shadow links as Kubernetes resources with full lifecycle management.
+* **Status monitoring**: View shadow link health and replication status directly from Kubernetes.
+* **Integrated failover**: Delete the ShadowLink resource to fail over all topics.
 
-==== Migration considerations
-
-If you're upgrading from a previous version that used namespace-scoped operators:
-
-* **No manual steps required**: The Redpanda Operator automatically reconciles existing Redpanda clusters across namespaces.
-* **New deployments default to cluster scope**: Regardless of which namespace you deploy the Redpanda Operator to (including `default`).
-* **Delete extra Redpanda Operator deployments**: After upgrading, ensure only one Redpanda Operator remains in the cluster (the one running in cluster scope). Use `helm uninstall` to remove any other Redpanda Operator deployments from previous namespace-scoped installations.
-
-To maintain namespace scope, use the `--set 'additionalCmdFlags=["--namespace=<namespace>"]'` flag when installing or upgrading the Redpanda Operator. The `--namespace` flag in the helm command only specifies which namespace to deploy the Redpanda Operator into and does not affect its operational scope.
-
-WARNING: Do not run multiple Redpanda Operators in different scopes (cluster and namespace scope) in the same cluster as this can cause resource conflicts.
-
-==== RBAC requirements
-
-Important RBAC considerations for v25.2+:
-
-* **ClusterRole permissions always required**: Regardless of whether you use cluster or namespace scope, the Redpanda Operator always needs ClusterRole permissions.
-* **Automatic configuration**: These permissions are automatically configured when you install the Redpanda Operator.
-
-=== Declarative role management
-
-Redpanda Operator v25.2.x now includes a RedpandaRole custom resource. The RedpandaRole resource allows you to declaratively manage Redpanda roles and permissions in Kubernetes, making it easier to control access and automate security policies for your Redpanda clusters. See the xref:manage:kubernetes/security/authorization/k-role-controller.adoc[RedpandaRole documentation] for details.
-
-=== Redpanda Console v3 support (Console CRD)
-
-Redpanda Operator v25.2.x introduces support for Redpanda Console v3 through the new Console resource. This allows you to deploy and manage Redpanda Console v3 instances directly from the Redpanda Operator.
-
-The `console` stanza in the Redpanda resource is deprecated and will be removed in a future release.
-
-Existing deployments that use the `console` stanza in the Redpanda resource will be automatically migrated to the Console resource. The migration happens automatically when you upgrade to v25.2.x.
-
-If you manage your resources in version control, you should:
-
-. Fetch and commit the migrated Console CR after the migration completes.
-. Remove the `console` stanza from your Redpanda resource after the Console CR is committed to avoid configuration conflicts. Removing the stanza will not affect the migrated Console CR.
-
-The Redpanda Operator handles the migration process from version 2 of Redpanda Console to version 3. If any configurations cannot be migrated, the Redpanda Operator displays warnings in the `warnings` field of the Console resource. If you need to manually migrate any configurations, refer to the xref:migrate:console-v3.adoc[migration guide].
-
-All configuration and management of Redpanda Console should be done through the Console CR. See xref:console:config/configure-console.adoc[].
+See xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[Shadow Linking in Kubernetes] for setup and xref:manage:kubernetes/monitoring/k-monitor-shadowing.adoc[monitoring] documentation.
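
The removed migration notes in this hunk describe keeping the Redpanda Operator namespace-scoped by passing `--set 'additionalCmdFlags=["--namespace=<namespace>"]'`. The same setting could be kept in a values file instead of on the command line. This is a minimal sketch, assuming `additionalCmdFlags` is a list of strings as that flag implies; the comments are illustrative and `<namespace>` is the placeholder from the notes, left for you to fill in.

[,yaml]
----
# Hypothetical values file for the Redpanda Operator Helm chart.
# Assumption: additionalCmdFlags is a list of extra flags passed to the operator.
additionalCmdFlags:
  - "--namespace=<namespace>"   # namespace the operator should manage
----

As the notes point out, this is separate from Helm's own `--namespace` flag, which only controls where the operator is deployed.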
This guide provides step-by-step procedures for emergency failover when your primary Redpanda cluster becomes unavailable. Follow these procedures only during active disasters when immediate failover is required.
+
+ifndef::env-cloud[]
+NOTE: If you're running Redpanda in Kubernetes, see xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[] for Kubernetes-specific emergency procedures.
+endif::[]
+
 // TODO: All command output examples in this guide need verification by running actual commands in test environment
 
 [IMPORTANT]
 ====
-This is an emergency procedure. For planned failover testing or day-to-day shadow link management, see xref:./failover.adoc[]. Ensure you have completed the xref:manage:disaster-recovery/shadowing/overview.adoc#disaster-readiness-checklist[disaster readiness checklist] before an emergency occurs.
+This is an emergency procedure. For planned failover testing or day-to-day shadow link management, see xref:manage:disaster-recovery/shadowing/failover.adoc[]. Ensure you have completed the xref:manage:disaster-recovery/shadowing/overview.adoc#disaster-readiness-checklist[disaster readiness checklist] before an emergency occurs.
 ====
 
 ifdef::env-cloud[]
@@ -54,19 +59,7 @@ rpk cluster info --brokers shadow-cluster-1.example.com:9092,shadow-cluster-2.ex
 
 **Decision point**: If the primary cluster is responsive, consider whether failover is actually needed. Partial outages may not require full disaster recovery.
 
-**Examples that require full failover:**
-
-* Primary cluster is completely unreachable (network partition, regional outage)
-* Multiple broker failures preventing writes to critical topics
-* Data center failure affecting majority of brokers
-* Persistent authentication or authorization failures across the cluster
-
-**Examples that may NOT require failover:**
-
-* Single broker failure with sufficient replicas remaining
-* Temporary network connectivity issues affecting some clients
-* High latency or performance degradation (but cluster still functional)
@@ -144,9 +137,7 @@ Verify that the following conditions exist before proceeding with failover:
 
 Use xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] or the link:/api/doc/cloud-dataplane/operation/operation-shadowlinkservice_listshadowlinktopics[Data Plane API] to check lag, which shows the message count difference between source and shadow partitions:
 
-* **Acceptable lag examples**: 0-1000 messages for low-throughput topics, 0-10000 messages for high-throughput topics
-* **Concerning lag examples**: Growing lag over 50,000 messages, or lag that continuously increases without recovering
-* **Critical lag examples**: Lag exceeding your data loss tolerance (for example, if you can only afford to lose 1 minute of data, lag should represent less than 1 minute of typical message volume)
Note the replication lag to estimate potential data loss during failover. The `Tasks` section shows the health of shadow link replication tasks. For details about what each task does, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
+Note the replication lag to estimate potential data loss during failover. The `Tasks` section shows the health of shadow link replication tasks. For details about what each task does, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks].
 ====
 
 [[initiate-failover]]
@@ -574,22 +565,6 @@ Force deleting a shadow link immediately fails over all topics in the link. This
 
 **Solution**: Verify consumer group offsets were replicated (check your filters) and use `rpk group describe <group-name>` to check offset positions. If necessary, manually reset offsets to appropriate positions. See link:https://support.redpanda.com/hc/en-us/articles/23499121317399-How-to-manage-consumer-group-offsets-in-Redpanda[How to manage consumer group offsets in Redpanda^] for detailed reset procedures.
 
-== Next steps
-
-After successful failover, focus on recovery planning and process improvement. Begin by assessing the source cluster failure and determining whether to restore the original cluster or permanently promote the shadow cluster as your new primary.
-
-**Immediate recovery planning:**
-
-1. **Assess source cluster**: Determine root cause of the outage
-2. **Plan recovery**: Decide whether to restore source cluster or promote shadow cluster permanently
-3. **Data synchronization**: Plan how to synchronize any data produced during failover
-4. **Fail forward**: Create a new shadow link with the failed over shadow cluster as source to maintain a DR cluster
-
-**Process improvement:**
-
-1. **Document the incident**: Record timeline, impact, and lessons learned
-2. **Update runbooks**: Improve procedures based on what you learned
modules/manage/pages/disaster-recovery/shadowing/failover.adoc
+9 -36 lines changed: 9 additions & 36 deletions
@@ -18,26 +18,18 @@ You can failover a shadow link using the Redpanda Cloud UI, `rpk`, or the Data P
 endif::[]
 
 ifndef::env-cloud[]
-You can failover a shadow link using Redpanda Console, `rpk`, or the Admin API.
+You can failover a shadow link using Redpanda Console, `rpk`, or the Admin API.
+
+NOTE: If you are using Kubernetes, you can also use the Redpanda Operator's `ShadowLink` resource to manage failover. See xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[Kubernetes Shadow Link Failover] for details.
3. **Updates topic state**: Changes topic status from `ACTIVE` to `FAILING_OVER`, then `FAILED_OVER`
-
-Topic failover is irreversible. Once failed over, topics cannot return to shadow mode, and automatic fallback to the original source cluster is not supported.
-
-NOTE: To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity.
Force deleting a shadow link is irreversible and immediately fails over all topics in the link, bypassing the normal failover state transitions. This action should only be used as a last resort when topics are stuck in transitional states and you need immediate access to all replicated data.
 ====
 
-== Failover states
-
-=== Shadow link states
-
-The shadow link itself has a simple state model:
-
-* **`ACTIVE`**: Shadow link is operating normally, replicating data
-* **`PAUSED`**: Shadow link replication is temporarily halted by user action
-
-Shadow links do not have dedicated failover states. Instead, the link's operational status is determined by the collective state of its shadow topics.
-
-=== Shadow topic states
-
-Individual shadow topics progress through specific states during failover:
-
-* **`ACTIVE`**: Normal replication state before failover
-* **`FAULTED`**: Shadow topic has encountered an error and is not replicating
@@ -277,7 +250,7 @@ Task states during monitoring:
 * **`NOT_RUNNING`**: Task is not currently executing
 * **`LINK_UNAVAILABLE`**: Task cannot communicate with the source cluster
 
-For detailed information about shadow link tasks and their roles, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
+For detailed information about shadow link tasks and their roles, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks].
 
 
 == Post-failover cluster behavior
@@ -333,6 +306,6 @@ After completing failover:
 * Verify that applications can produce and consume messages normally
 * Consider deleting the shadow link if failover was successful and permanent
 
-For emergency situations, see xref:./failover-runbook.adoc[Failover Runbook].
+For emergency situations, see xref:manage:disaster-recovery/shadowing/failover-runbook.adoc[Failover Runbook].