diff --git a/.github/workflows/update-property-docs.yaml b/.github/workflows/update-property-docs.yaml
index 557ba8de27..82cc434515 100644
--- a/.github/workflows/update-property-docs.yaml
+++ b/.github/workflows/update-property-docs.yaml
@@ -64,7 +64,7 @@ jobs:
 run: |
 set -euo pipefail
 TAG="${{ steps.tag.outputs.tag }}"
-CURRENT=$(grep 'latest-redpanda-tag:' antora.yml | awk '{print $2}' | tr -d '"')
+CURRENT=$(grep 'latest-redpanda-tag:' antora.yml | awk '{print $2}' | tr -d "\"'")
 echo "📄 Current latest-redpanda-tag in antora.yml: $CURRENT"
 echo "🔖 Incoming tag: $TAG"
diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc
index 2d13f4533c..e0a2f718ce 100644
--- a/modules/ROOT/nav.adoc
+++ b/modules/ROOT/nav.adoc
@@ -136,6 +136,10 @@
 **** xref:manage:kubernetes/security/k-audit-logging.adoc[Audit Logging]
 *** xref:manage:kubernetes/k-rack-awareness.adoc[Rack Awareness]
 *** xref:manage:kubernetes/k-remote-read-replicas.adoc[Remote Read Replicas]
+*** xref:manage:kubernetes/shadowing/index.adoc[Shadowing]
+**** xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[Configure Shadowing]
+**** xref:manage:kubernetes/monitoring/k-monitor-shadowing.adoc[Monitor]
+**** xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[Failover Runbook]
 *** xref:manage:kubernetes/k-manage-resources.adoc[Manage Pod Resources]
 *** xref:manage:kubernetes/k-scale-redpanda.adoc[Scale]
 *** xref:manage:kubernetes/k-nodewatcher.adoc[]
diff --git a/modules/get-started/pages/release-notes/helm-charts.adoc b/modules/get-started/pages/release-notes/helm-charts.adoc
index 99900fdc5a..9aa030bc00 100644
--- a/modules/get-started/pages/release-notes/helm-charts.adoc
+++ b/modules/get-started/pages/release-notes/helm-charts.adoc
@@ -12,57 +12,6 @@ See also:
 * xref:upgrade:k-compatibility.adoc[]
 * xref:upgrade:k-rolling-upgrade.adoc[]
-== Redpanda chart v25.2.1
+== Redpanda chart v25.3.x
-link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.2.x/charts/redpanda/CHANGELOG.md[Changelog^].
-
-=== New chart-wide podTemplate field
-
-You can now use the chart-wide `podTemplate` field to control Pod attributes across all components. This field has lower precedence than `statefulset.podTemplate` and `post_install_job.podTemplate` but will be merged with them.
-
-Additionally, `podTemplate` fields now support template expressions within string fields, allowing you to use Helm templating for dynamic values:
-
-[,yaml]
-----
-podTemplate:
- annotations:
- "release-name": '{{ .Release.Name }}'
-----
-
-This compensates for functionality lost with the removal of fields like `extraVolumes`, while being more maintainable and less error prone.
-
-=== Improved config-watcher sidecar
-
-The config-watcher sidecar is now a dedicated Go binary that handles user management and simplifies cluster health checks. Health checks no longer fail when the sole issue is that other nodes in the cluster are unavailable.
-
-=== rpk debug bundle now works by default
-
-The chart now creates `Roles` and `RoleBindings` that satisfy the requirements for running `rpk debug bundle --namespace` from any Redpanda Pod. These permissions may be disabled by setting `rbac.rpkDebugBundle=false`.
-
-The Redpanda container now always has a Kubernetes ServiceAccount token mounted to ensure `rpk debug bundle` can be executed successfully.
-
-=== ServiceAccount creation now enabled by default
-
-The `serviceAccount.create` field now defaults to `true`. Previously, the chart used the `default` ServiceAccount and extended it with all bindings.
-
-=== Stricter schema validation
-
-Any unexpected values now result in a validation error. Previously, unexpected values would have been silently ignored.
-
-Ensure your Helm values only include valid fields before upgrading.
-
-=== Redpanda Console v3.1.0
-
-The Console dependency has been updated to v3.1.0.
The Console integration (`console.enabled=true`) now uses the chart-managed bootstrap user rather than the first user from `auth.sasl.users`.
-
-=== Deprecated Helm values
-
-The following Helm values are deprecated and will be removed in a future release:
-
-* `statefulset.sidecars.controllers.image`: Use `statefulset.sidecars.image` instead
-* `statefulset.sideCars.controllers.createRBAC`: Use `rbac.enabled` or per-controller settings instead
-* `statefulset.sideCars.controllers.run`: Use individual controller enabled fields instead
-
-=== Removed Helm values
-
-Several fields have been removed in favor of using `podTemplate`. Before upgrading, review your configurations and migrate removed fields to their `podTemplate` equivalents. For the complete list of removed fields and their replacements, see the link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.2.x/charts/redpanda/CHANGELOG.md[changelog^].
+link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.3.x/charts/redpanda/CHANGELOG.md[Changelog^].
diff --git a/modules/get-started/pages/release-notes/operator.adoc b/modules/get-started/pages/release-notes/operator.adoc
index 5c1c659b98..be119af9dc 100644
--- a/modules/get-started/pages/release-notes/operator.adoc
+++ b/modules/get-started/pages/release-notes/operator.adoc
@@ -10,56 +10,16 @@ See also:
 * xref:upgrade:k-rolling-upgrade.adoc[]
-== Redpanda Operator v25.2.x
+== Redpanda Operator v25.3.x
-link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.2.x/operator/CHANGELOG.md[Changelog^]
+link:https://github.com/redpanda-data/redpanda-operator/blob/release/v25.3.x/operator/CHANGELOG.md[Changelog^]
-=== Cluster scope by default
+=== ShadowLink resource for disaster recovery
-Starting in v25.2, the Redpanda Operator defaults to cluster scope instead of namespace scope.
This change provides several benefits:
+Redpanda Operator v25.3.x introduces the ShadowLink custom resource for managing shadow links in Kubernetes. The ShadowLink resource allows you to declaratively configure and manage disaster recovery replication between Redpanda clusters.
-* **Simplified management**: A single operator instance can manage multiple Redpanda clusters across different namespaces.
-* **Reduced resource overhead**: No need to deploy separate operator instances for each namespace.
-* **Centralized upgrades**: Upgrade the operator once to benefit all managed Redpanda clusters.
-* **Cross-namespace management**: Deploy the operator in a dedicated namespace (such as `redpanda-system`) while managing clusters in application namespaces.
-* **Simplified RBAC for debug bundles**: The Redpanda Operator now provides all required permissions for `rpk` debug bundle collection by default. The `rbac.createRPKBundleCRs` flag is no longer needed.
+* **Declarative configuration**: Define shadow links as Kubernetes resources with full lifecycle management.
+* **Status monitoring**: View shadow link health and replication status directly from Kubernetes.
+* **Integrated failover**: Delete the ShadowLink resource to fail over all topics.
-==== Migration considerations
-
-If you're upgrading from a previous version that used namespace-scoped operators:
-
-* **No manual steps required**: The Redpanda Operator automatically reconciles existing Redpanda clusters across namespaces.
-* **New deployments default to cluster scope**: Regardless of which namespace you deploy the Redpanda Operator to (including `default`).
-* **Delete extra Redpanda Operator deployments**: After upgrading, ensure only one Redpanda Operator remains in the cluster (the one running in cluster scope). Use `helm uninstall` to remove any other Redpanda Operator deployments from previous namespace-scoped installations.
-
-To maintain namespace scope, use the `--set 'additionalCmdFlags=["--namespace="]'` flag when installing or upgrading the Redpanda Operator. The `--namespace` flag in the helm command only specifies which namespace to deploy the Redpanda Operator into and does not affect its operational scope.
-
-WARNING: Do not run multiple Redpanda Operators in different scopes (cluster and namespace scope) in the same cluster as this can cause resource conflicts.
-
-==== RBAC requirements
-
-Important RBAC considerations for v25.2+:
-
-* **ClusterRole permissions always required**: Regardless of whether you use cluster or namespace scope, the Redpanda Operator always needs ClusterRole permissions.
-* **Automatic configuration**: These permissions are automatically configured when you install the Redpanda Operator.
-
-=== Declarative role management
-
-Redpanda Operator v25.2.x now includes a RedpandaRole custom resource. The RedpandaRole resource allows you to declaratively manage Redpanda roles and permissions in Kubernetes, making it easier to control access and automate security policies for your Redpanda clusters. See the xref:manage:kubernetes/security/authorization/k-role-controller.adoc[RedpandaRole documentation] for details.
-
-=== Redpanda Console v3 support (Console CRD)
-
-Redpanda Operator v25.2.x introduces support for Redpanda Console v3 through the new Console resource. This allows you to deploy and manage Redpanda Console v3 instances directly from the Redpanda Operator.
-
-The `console` stanza in the Redpanda resource is deprecated and will be removed in a future release.
-
-Existing deployments that use the `console` stanza in the Redpanda resource will be automatically migrated to the Console resource. The migration happens automatically when you upgrade to v25.2.x.
-
-If you manage your resources in version control, you should:
-
-. Fetch and commit the migrated Console CR after the migration completes.
-.
Remove the `console` stanza from your Redpanda resource after the Console CR is committed to avoid configuration conflicts. Removing the stanza will not affect the migrated Console CR.
-
-The Redpanda Operator handles the migration process from version 2 of Redpanda Console to version 3. If any configurations cannot be migrated, the Redpanda Operator displays warnings in the `warnings` field of the Console resource. If you need to manually migrate any configurations, refer to the xref:migrate:console-v3.adoc[migration guide].
-
-All configuration and management of Redpanda Console should be done through the Console CR. See xref:console:config/configure-console.adoc[].
\ No newline at end of file
+See xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[Shadow Linking in Kubernetes] for setup and xref:manage:kubernetes/monitoring/k-monitor-shadowing.adoc[monitoring] documentation.
\ No newline at end of file
diff --git a/modules/manage/examples/kubernetes/shadow-links.feature b/modules/manage/examples/kubernetes/shadow-links.feature
index 3051c8bbc7..af6f00a304 100644
--- a/modules/manage/examples/kubernetes/shadow-links.feature
+++ b/modules/manage/examples/kubernetes/shadow-links.feature
@@ -29,6 +29,7 @@ Feature: ShadowLink CRDs
 When I apply Kubernetes manifest:
 """
 ---
+# tag::basic-shadowlink-example[]
 apiVersion: cluster.redpanda.com/v1alpha2
 kind: ShadowLink
 metadata:
@@ -46,6 +47,7 @@ Feature: ShadowLink CRDs
 - name: topic1
 filterType: include
 patternType: literal
+# end::basic-shadowlink-example[]
 """
 And shadow link "link" is successfully synced
 Then I should find topic "topic1" in cluster "sasl"
diff --git a/modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc b/modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc
index a9453d846b..b4ec886bc1 100644
--- a/modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc
+++ b/modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc
@@ -14,11 +14,16
@@ include::shared:partial$enterprise-license.adoc[]
 endif::[]
 This guide provides step-by-step procedures for emergency failover when your primary Redpanda cluster becomes unavailable. Follow these procedures only during active disasters when immediate failover is required.
+
+ifndef::env-cloud[]
+NOTE: If you're running Redpanda in Kubernetes, see xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[] for Kubernetes-specific emergency procedures.
+endif::[]
+
 // TODO: All command output examples in this guide need verification by running actual commands in test environment
 [IMPORTANT]
 ====
-This is an emergency procedure. For planned failover testing or day-to-day shadow link management, see xref:./failover.adoc[]. Ensure you have completed the xref:manage:disaster-recovery/shadowing/overview.adoc#disaster-readiness-checklist[disaster readiness checklist] before an emergency occurs.
+This is an emergency procedure. For planned failover testing or day-to-day shadow link management, see xref:manage:disaster-recovery/shadowing/failover.adoc[]. Ensure you have completed the xref:manage:disaster-recovery/shadowing/overview.adoc#disaster-readiness-checklist[disaster readiness checklist] before an emergency occurs.
 ====
 ifdef::env-cloud[]
@@ -54,19 +59,7 @@ rpk cluster info --brokers shadow-cluster-1.example.com:9092,shadow-cluster-2.ex
 **Decision point**: If the primary cluster is responsive, consider whether failover is actually needed. Partial outages may not require full disaster recovery.
-**Examples that require full failover:**
-
-* Primary cluster is completely unreachable (network partition, regional outage)
-* Multiple broker failures preventing writes to critical topics
-* Data center failure affecting majority of brokers
-* Persistent authentication or authorization failures across the cluster
-
-**Examples that may NOT require failover:**
-
-* Single broker failure with sufficient replicas remaining
-* Temporary network connectivity issues affecting some clients
-* High latency or performance degradation (but cluster still functional)
-* Non-critical topic or partition unavailability
+include::manage:partial$shadowing/failover-decision-examples.adoc[]
 [[verify-shadow-status]]
 === Verify shadow cluster status
@@ -144,9 +137,7 @@ Verify that the following conditions exist before proceeding with failover:
 Use xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] or the link:/api/doc/cloud-dataplane/operation/operation-shadowlinkservice_listshadowlinktopics[Data Plane API] to check lag, which shows the message count difference between source and shadow partitions:
-* **Acceptable lag examples**: 0-1000 messages for low-throughput topics, 0-10000 messages for high-throughput topics
-* **Concerning lag examples**: Growing lag over 50,000 messages, or lag that continuously increases without recovering
-* **Critical lag examples**: Lag exceeding your data loss tolerance (for example, if you can only afford to lose 1 minute of data, lag should represent less than 1 minute of typical message volume)
+include::manage:partial$shadowing/replication-lag-guidelines.adoc[]
 [[document-state]]
 === Document current state
@@ -241,7 +232,7 @@ ifdef::env-cloud[high_watermark]
 [IMPORTANT]
 ====
-Note the replication lag to estimate potential data loss during failover. The `Tasks` section shows the health of shadow link replication tasks.
For details about what each task does, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
+Note the replication lag to estimate potential data loss during failover. The `Tasks` section shows the health of shadow link replication tasks. For details about what each task does, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks].
 ====
 [[initiate-failover]]
@@ -574,22 +565,6 @@ Force deleting a shadow link immediately fails over all topics in the link. This
 **Solution**: Verify consumer group offsets were replicated (check your filters) and use `rpk group describe ` to check offset positions. If necessary, manually reset offsets to appropriate positions. See link:https://support.redpanda.com/hc/en-us/articles/23499121317399-How-to-manage-consumer-group-offsets-in-Redpanda[How to manage consumer group offsets in Redpanda^] for detailed reset procedures.
-== Next steps
-
-After successful failover, focus on recovery planning and process improvement. Begin by assessing the source cluster failure and determining whether to restore the original cluster or permanently promote the shadow cluster as your new primary.
-
-**Immediate recovery planning:**
-
-1. **Assess source cluster**: Determine root cause of the outage
-2. **Plan recovery**: Decide whether to restore source cluster or promote shadow cluster permanently
-3. **Data synchronization**: Plan how to synchronize any data produced during failover
-4. **Fail forward**: Create a new shadow link with the failed over shadow cluster as source to maintain a DR cluster
-
-**Process improvement:**
-
-1. **Document the incident**: Record timeline, impact, and lessons learned
-2. **Update runbooks**: Improve procedures based on what you learned
-3. **Test regularly**: Schedule regular disaster recovery drills
-4.
**Review monitoring**: Ensure monitoring caught the issue appropriately
+include::manage:partial$shadowing/failover-next-steps.adoc[]
 // end::single-source[]
\ No newline at end of file
diff --git a/modules/manage/pages/disaster-recovery/shadowing/failover.adoc b/modules/manage/pages/disaster-recovery/shadowing/failover.adoc
index 4b87d8913b..804987e471 100644
--- a/modules/manage/pages/disaster-recovery/shadowing/failover.adoc
+++ b/modules/manage/pages/disaster-recovery/shadowing/failover.adoc
@@ -18,26 +18,18 @@ You can failover a shadow link using the Redpanda Cloud UI, `rpk`, or the Data P
 endif::[]
 ifndef::env-cloud[]
-You can failover a shadow link using Redpanda Console, `rpk`, or the Admin API.
+You can failover a shadow link using Redpanda Console, `rpk`, or the Admin API.
+
+NOTE: If you are using Kubernetes, you can also use the Redpanda Operator's `ShadowLink` resource to manage failover. See xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[Kubernetes Shadow Link Failover] for details.
 endif::[]
-include::shared:partial$emergency-shadowing-callout.adoc[]
+include::manage:partial$shadowing/emergency-shadowing-callout.adoc[]
 ifdef::env-cloud[]
-NOTE: Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later.
+NOTE: Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later.
 endif::[]
-== Failover behavior
-
-When you initiate failover, Redpanda performs the following operations:
-
-1. **Stops replication**: Halts all data fetching from the source cluster for the specified topics or entire shadow link
-2. **Failover topics**: Converts read-only shadow topics into regular, writable topics
-3. **Updates topic state**: Changes topic status from `ACTIVE` to `FAILING_OVER`, then `FAILED_OVER`
-
-Topic failover is irreversible. Once failed over, topics cannot return to shadow mode, and automatic fallback to the original source cluster is not supported.
-
-NOTE: To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity.
+include::manage:partial$shadowing/failover-behavior.adoc[]
 == Failover commands
@@ -206,26 +198,7 @@ endif::[]
 Force deleting a shadow link is irreversible and immediately fails over all topics in the link, bypassing the normal failover state transitions. This action should only be used as a last resort when topics are stuck in transitional states and you need immediate access to all replicated data.
 ====
-== Failover states
-
-=== Shadow link states
-
-The shadow link itself has a simple state model:
-
-* **`ACTIVE`**: Shadow link is operating normally, replicating data
-* **`PAUSED`**: Shadow link replication is temporarily halted by user action
-
-Shadow links do not have dedicated failover states. Instead, the link's operational status is determined by the collective state of its shadow topics.
-
-=== Shadow topic states
-
-Individual shadow topics progress through specific states during failover:
-
-* **`ACTIVE`**: Normal replication state before failover
-* **`FAULTED`**: Shadow topic has encountered an error and is not replicating
-* **`FAILING_OVER`**: Failover initiated, replication stopping
-* **`FAILED_OVER`**: Failover completed successfully, topic fully writable
-* **`PAUSED`**: Replication temporarily halted by user action
+include::manage:partial$shadowing/failover-states.adoc[]
 == Monitor failover progress
@@ -277,7 +250,7 @@ Task states during monitoring:
 * **`NOT_RUNNING`**: Task is not currently executing
 * **`LINK_UNAVAILABLE`**: Task cannot communicate with the source cluster
-For detailed information about shadow link tasks and their roles, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
+For detailed information about shadow link tasks and their roles, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks].
 == Post-failover cluster behavior
@@ -333,6 +306,6 @@ After completing failover:
 * Verify that applications can produce and consume messages normally
 * Consider deleting the shadow link if failover was successful and permanent
-For emergency situations, see xref:./failover-runbook.adoc[Failover Runbook].
+For emergency situations, see xref:manage:disaster-recovery/shadowing/failover-runbook.adoc[Failover Runbook].
 // end::single-source[]
\ No newline at end of file
diff --git a/modules/manage/pages/disaster-recovery/shadowing/monitor.adoc b/modules/manage/pages/disaster-recovery/shadowing/monitor.adoc
index d6feb11092..4d809eb57b 100644
--- a/modules/manage/pages/disaster-recovery/shadowing/monitor.adoc
+++ b/modules/manage/pages/disaster-recovery/shadowing/monitor.adoc
@@ -11,9 +11,13 @@ include::shared:partial$enterprise-license.adoc[]
 ====
 endif::[]
-Monitor your shadow links to ensure proper replication performance and understand your disaster recovery readiness. Use `rpk` commands, metrics, and status information to track shadow link health and troubleshoot issues.
+Monitor your xref:manage:disaster-recovery/shadowing/setup.adoc[shadow links] to ensure proper replication performance and understand your disaster recovery readiness. Use `rpk` commands, metrics, and status information to track shadow link health and troubleshoot issues.
-include::shared:partial$emergency-shadowing-callout.adoc[]
+ifndef::env-cloud[]
+NOTE: If you're running Redpanda in Kubernetes, see xref:manage:kubernetes/monitoring/k-monitor-shadowing.adoc[] for Kubernetes-specific monitoring procedures.
+endif::[]
+
+include::manage:partial$shadowing/emergency-shadowing-callout.adoc[]
 == Status commands
@@ -152,48 +156,10 @@ endif::[]
 * **Shadow link state**: Overall operational state (`ACTIVE`, `PAUSED`).
 * **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`, `PAUSED`).
-* **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`). For details about shadow link tasks, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
+* **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`). For details about shadow link tasks, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks].
 * **Lag information**: Replication lag per partition showing source vs shadow high watermarks (HWM).
-[[shadow-link-metrics]]
-== Metrics
-
-Shadowing provides comprehensive metrics to track replication performance and health with the xref:reference:public-metrics-reference.adoc[`public_metrics`] endpoint.
-
-[cols="1,1,2"]
-|===
-|Metric |Type |Description
-
-|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_client_errors[`redpanda_shadow_link_client_errors`]
-|Counter
-|Total number of errors encountered by the Kafka client during shadow link operations. Monitor by `shadow_link_name` to identify connection issues, authentication failures, or other client-side problems.
-
-|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`]
-|Gauge
-|The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition.
-
-|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_fetched[`redpanda_shadow_link_total_bytes_fetched`]
-|Counter
-|The total number of bytes fetched by a sharded replicator (bytes received by the client).
Labeled by `shadow_link_name` and `shard` to track data transfer volume from the source cluster.
-
-|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_written[`redpanda_shadow_link_total_bytes_written`]
-|Counter
-|The total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor data written to the shadow cluster.
-
-|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_topic_state[`redpanda_shadow_link_shadow_topic_state`]
-|Gauge
-|Number of shadow topics in the respective states. Labeled by `shadow_link_name` and `state` to monitor topic state distribution across your shadow links.
-
-|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_fetched[`redpanda_shadow_link_total_records_fetched`]
-|Counter
-|The total number of records fetched by the sharded replicator (records received by the client). Monitor by `shadow_link_name` and `shard` to track message throughput from the source.
-
-|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_written[`redpanda_shadow_link_total_records_written`]
-|Counter
-|The total number of records written by a sharded replicator (records written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor message throughput to the shadow cluster.
-|===
-
-For detailed descriptions of each metric, including usage examples and label definitions, see xref:reference:public-metrics-reference.adoc#shadow-link-metrics[Shadow Link metrics reference].
+include::manage:partial$shadowing/shadow-link-metrics.adoc[]
 == Monitoring best practices
@@ -250,16 +216,6 @@ curl "https://$DATAPLANE_API_URL/v1/shadowlinks//topic" \
 ======
 endif::[]
-=== Alert conditions
-
-Configure monitoring alerts for the following conditions, which indicate problems with Shadowing:
-
-* **High replication lag**: When xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`] exceeds your recovery point objective (RPO) requirements
-* **Topic state changes**: When topics move to `FAULTED` state
-* **Task failures**: When replication tasks enter `FAULTED` or `NOT_RUNNING` states
-* **Throughput drops**: When bytes/records fetched drops significantly
-* **Link unavailability**: When tasks show `LINK_UNAVAILABLE` indicating source cluster connectivity issues
-+
-For more information about shadow link tasks and their states, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
+include::manage:partial$shadowing/shadow-link-alerts.adoc[]
 // end::single-source[]
diff --git a/modules/manage/pages/disaster-recovery/shadowing/overview.adoc b/modules/manage/pages/disaster-recovery/shadowing/overview.adoc
index 2493678756..54f1d4c6b3 100644
--- a/modules/manage/pages/disaster-recovery/shadowing/overview.adoc
+++ b/modules/manage/pages/disaster-recovery/shadowing/overview.adoc
@@ -18,7 +18,7 @@ endif::[]
 Shadowing is Redpanda's enterprise-grade disaster recovery solution that establishes asynchronous, offset-preserving replication between two distinct Redpanda clusters. A cluster is able to create a dedicated client that continuously replicates source cluster data, including offsets, timestamps, and cluster metadata. This creates a read-only shadow cluster that you can quickly failover to handle production traffic during a disaster. Shadowing keeps data flowing, even during regional outages.
-include::shared:partial$emergency-shadowing-callout.adoc[]
+include::manage:partial$shadowing/emergency-shadowing-callout.adoc[]
 Unlike traditional replication tools that re-produce messages, Shadowing copies data at the byte level, ensuring shadow topics contain identical copies of source topics with preserved offsets and timestamps.
diff --git a/modules/manage/pages/disaster-recovery/shadowing/setup.adoc b/modules/manage/pages/disaster-recovery/shadowing/setup.adoc
index 9f316377fc..fdb5c46532 100644
--- a/modules/manage/pages/disaster-recovery/shadowing/setup.adoc
+++ b/modules/manage/pages/disaster-recovery/shadowing/setup.adoc
@@ -18,6 +18,10 @@ endif::[]
 Deploy clusters in different geographic regions to protect against regional disasters.
 ====
+ifndef::env-cloud[]
+If you're using Kubernetes, see xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[] for Kubernetes-specific shadow link configuration.
+endif::[]
+
 == Prerequisites
 ifndef::env-cloud[]
diff --git a/modules/manage/pages/kubernetes/monitoring/k-monitor-redpanda.adoc b/modules/manage/pages/kubernetes/monitoring/k-monitor-redpanda.adoc
index 5641a72fa4..e8c0c35d10 100644
--- a/modules/manage/pages/kubernetes/monitoring/k-monitor-redpanda.adoc
+++ b/modules/manage/pages/kubernetes/monitoring/k-monitor-redpanda.adoc
@@ -17,4 +17,6 @@ include::manage:partial$monitor-health.adoc[]
 include::shared:partial$suggested-reading.adoc[]
+* xref:manage:kubernetes/monitoring/k-monitor-shadowing.adoc[Monitor Shadow Links]
+
 * https://killercoda.com/redpanda/scenario/redpanda-k8s-day2[Monitoring Redpanda in Kubernetes(Day 2 Ops)^]
\ No newline at end of file
diff --git a/modules/manage/pages/kubernetes/monitoring/k-monitor-shadowing.adoc b/modules/manage/pages/kubernetes/monitoring/k-monitor-shadowing.adoc
new file mode 100644
index 0000000000..761c8ff19e
--- /dev/null
+++ b/modules/manage/pages/kubernetes/monitoring/k-monitor-shadowing.adoc
@@ -0,0 +1,130 @@
+= Monitor Kubernetes Shadow Links
+:description:
Monitor shadow link health in Kubernetes using ShadowLink resources, status commands, metrics, and best practices.
+:page-categories: Management, Monitoring, Disaster Recovery
+:env-kubernetes: true
+
+[NOTE]
+====
+include::shared:partial$enterprise-license.adoc[]
+====
+
+Monitor your xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[shadow links] to ensure proper replication performance and understand your disaster recovery readiness. For Kubernetes deployments, you can monitor shadow links using the Redpanda Operator's `ShadowLink` resource status or by using `rpk` commands directly.
+
+include::manage:partial$shadowing/emergency-shadowing-callout.adoc[]
+
+== Status commands
+
+[tabs]
+======
+Operator::
++
+--
+To list existing shadow links:
+
+[,bash]
+----
+kubectl get shadowlink --namespace
+----
+
+To view detailed shadow link status and configuration:
+
+[,bash]
+----
+kubectl describe shadowlink --namespace
+----
+
+The `kubectl describe` output shows:
+
+* **Shadow link state**: Overall operational state in the `Status` section
+* **Individual topic states**: Current state of each replicated topic under `Shadow Topics`
+* **Task status**: Health of replication tasks under `Tasks`
+* **Sync status**: Whether the resource is properly synced (`Synced: True` in conditions)
+* **Configuration**: Complete shadow link configuration including connection settings and filters
+
+For more detailed monitoring or troubleshooting, you can also use `rpk` commands as shown in the Helm tab.
+--
+
+Helm::
++
+--
+To list existing shadow links:
+
+[,bash]
+----
+kubectl exec --namespace --container redpanda -- \
+ rpk shadow list
+----
+
+To view shadow link configuration details:
+
+[,bash]
+----
+kubectl exec --namespace --container redpanda -- \
+ rpk shadow describe
+----
+
+For detailed command options, see xref:reference:rpk/rpk-shadow/rpk-shadow-list.adoc[`rpk shadow list`] and xref:reference:rpk/rpk-shadow/rpk-shadow-describe.adoc[`rpk shadow describe`].
The `rpk shadow describe` command shows the complete configuration of the shadow link, including connection settings, filters, and synchronization options. + +To check your shadow link status and ensure proper operation: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow status +---- + +For troubleshooting specific issues, you can use command options to show individual status sections. See xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] for available status options. +-- +====== + +The status output includes the following: + +* **Shadow link state**: Overall operational state (`ACTIVE`, `PAUSED`). +* **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`, `PAUSED`). +* **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`). For details about shadow link tasks, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks]. +* **Lag information**: Replication lag per partition showing source vs shadow high watermarks (HWM). 
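
The per-partition lag can be reduced to a single number for quick checks. The following is a minimal sketch, not an official tool: it assumes that the `TOPICS` section of `rpk shadow status` prints each partition as a row of five numeric columns (`PARTITION SRC_LSO SRC_HWM DST_HWM LAG`), and the `max_shadow_lag` function name is hypothetical. Verify the column layout against your `rpk` version before relying on it.

```shell
# Hypothetical helper: print the largest per-partition LAG found in
# `rpk shadow status` output read from stdin.
# Assumption: partition rows are five numeric columns with LAG last.
max_shadow_lag() {
  awk '
    # Only rows with exactly five fields whose first and last are integers
    NF == 5 && $1 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
      if ($5 + 0 > max) max = $5 + 0
    }
    # Prints 0 when no partition rows were seen
    END { print max + 0 }
  '
}
```

Pipe the output of the `rpk shadow status` command into `max_shadow_lag` and compare the result with the lag threshold implied by your RPO.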
+ +include::manage:partial$shadowing/shadow-link-metrics.adoc[] + +== Monitoring best practices + +=== Health check procedures + +Establish regular monitoring workflows to ensure shadow link health: + +[tabs] +====== +Operator:: ++ +-- +[,bash] +---- +# Check all shadow links are synced and healthy +kubectl get shadowlink --namespace + +# View detailed status for a specific shadow link +kubectl describe shadowlink --namespace + +# Check for any shadow links with issues (not synced) +kubectl get shadowlink --namespace -o json | \ + jq '.items[] | select(.status.conditions[] | select(.type=="Synced" and .status!="True")) | .metadata.name' +---- +-- + +Helm:: ++ +-- +[,bash] +---- +# List shadow links that are not ACTIVE (assumes the header row starts with NAME) +kubectl exec --namespace --container redpanda -- \ + rpk shadow list | grep -v -e "ACTIVE" -e "^NAME" || echo "All shadow links healthy" + +# Monitor lag for critical topics +kubectl exec --namespace --container redpanda -- \ + rpk shadow status | grep -E "LAG|Lag" +---- +-- +====== + +include::manage:partial$shadowing/shadow-link-alerts.adoc[] diff --git a/modules/manage/pages/kubernetes/shadowing/index.adoc b/modules/manage/pages/kubernetes/shadowing/index.adoc new file mode 100644 index 0000000000..a30807d404 --- /dev/null +++ b/modules/manage/pages/kubernetes/shadowing/index.adoc @@ -0,0 +1,16 @@ += Shadowing in Kubernetes +:description: Configure and manage shadow links for disaster recovery in Kubernetes deployments. +:page-layout: index +:page-categories: Management, High Availability, Disaster Recovery +:env-kubernetes: true + +[NOTE] +==== +include::shared:partial$enterprise-license.adoc[] +==== + +Shadow linking enables asynchronous, unidirectional replication between Redpanda clusters for disaster recovery. Configure shadow links using the Redpanda Operator's `ShadowLink` custom resource or using `rpk` commands. 
+ +include::manage:partial$shadowing/emergency-shadowing-callout.adoc[] + +For general shadowing concepts and architecture, see xref:manage:disaster-recovery/shadowing/overview.adoc[]. diff --git a/modules/manage/pages/kubernetes/shadowing/k-failover-runbook.adoc b/modules/manage/pages/kubernetes/shadowing/k-failover-runbook.adoc new file mode 100644 index 0000000000..215f60887a --- /dev/null +++ b/modules/manage/pages/kubernetes/shadowing/k-failover-runbook.adoc @@ -0,0 +1,381 @@ += Kubernetes Failover Runbook +:description: Step-by-step emergency guide for failing over Redpanda shadow links in Kubernetes during disasters. +:page-categories: Management, High Availability, Disaster Recovery, Emergency Response +:env-kubernetes: true + +[NOTE] +==== +include::shared:partial$enterprise-license.adoc[] +==== + +This guide provides step-by-step procedures for emergency failover when your primary Redpanda cluster becomes unavailable. Follow these procedures only during active disasters when immediate failover is required. + +[IMPORTANT] +==== +This is an emergency procedure. For planned failover testing or day-to-day shadow link management, see xref:manage:disaster-recovery/shadowing/failover.adoc[]. Ensure you have completed the xref:manage:disaster-recovery/shadowing/overview.adoc#disaster-readiness-checklist[disaster readiness checklist] before an emergency occurs. +==== + +== Emergency failover procedure + +Follow these steps during an active disaster: + +1. <> +2. <> +3. <> +4. <> +5. <> +6. <> +7. <> +8. 
<> + +[[assess-situation]] +=== Assess the situation + +Confirm that failover is necessary: + +[tabs] +====== +Operator:: ++ +-- +[,bash] +---- +# Check if source cluster is responding +kubectl exec --namespace --container redpanda -- \ + rpk cluster info + +# If source cluster is down, check shadow cluster health +kubectl exec --namespace --container redpanda -- \ + rpk cluster info +---- +-- + +Helm:: ++ +-- +[,bash] +---- +# Check if source cluster is responding +kubectl exec --namespace --container redpanda -- \ + rpk cluster info + +# If source cluster is down, check shadow cluster health +kubectl exec --namespace --container redpanda -- \ + rpk cluster info +---- +-- +====== + +**Decision point**: If the primary cluster is responsive, consider whether failover is actually needed. Partial outages may not require full disaster recovery. + +include::manage:partial$shadowing/failover-decision-examples.adoc[] + +[[verify-shadow-status]] +=== Verify shadow cluster status + +Check the health of your shadow links: + +[tabs] +====== +Operator:: ++ +-- +[,bash] +---- +# List all shadow links +kubectl get shadowlink --namespace + +# Check the ShadowLink resource details +kubectl describe shadowlink --namespace +---- + +Verify that the following conditions exist before proceeding with failover: + +* ShadowLink resource shows `Synced: True` in conditions +* Shadow topic statuses show `state: active` (not `faulted`) +* Task statuses show `state: active` +-- + +Helm:: ++ +-- +[,bash] +---- +# List all shadow links +kubectl exec --namespace --container redpanda -- \ + rpk shadow list + +# Check the configuration of your shadow link +kubectl exec --namespace --container redpanda -- \ + rpk shadow describe + +# Check the status of your disaster recovery link +kubectl exec --namespace --container redpanda -- \ + rpk shadow status +---- + +For detailed command options, see xref:reference:rpk/rpk-shadow/rpk-shadow-list.adoc[`rpk shadow list`], 
xref:reference:rpk/rpk-shadow/rpk-shadow-describe.adoc[`rpk shadow describe`], and xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`]. + +Verify that the following conditions exist before proceeding with failover: + +* Shadow link state should be `ACTIVE` +* Topics should be in `ACTIVE` state (not `FAULTED`) +* Replication lag should be reasonable for your RPO requirements +-- +====== + +==== Understanding replication lag + +Use status commands to check lag, which shows the message count difference between source and shadow partitions: + +include::manage:partial$shadowing/replication-lag-guidelines.adoc[] + +[[document-state]] +=== Document current state + +Record the current lag and status before proceeding: + +[tabs] +====== +Operator:: ++ +-- +[,bash] +---- +# Capture current status for post-mortem analysis +kubectl describe shadowlink --namespace > failover-status-$(date +%Y%m%d-%H%M%S).log +---- +-- + +Helm:: ++ +-- +[,bash] +---- +# Capture current status for post-mortem analysis +kubectl exec --namespace --container redpanda -- \ + rpk shadow status > failover-status-$(date +%Y%m%d-%H%M%S).log +---- +-- +====== + +[IMPORTANT] +==== +Note the replication lag to estimate potential data loss during failover. For details about shadow link replication tasks, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks]. +==== + +[[initiate-failover]] +=== Initiate failover + +A complete cluster failover is appropriate if you observe that the source cluster is no longer reachable: + +[tabs] +====== +Operator:: ++ +-- +Delete the `ShadowLink` resource to fail over all topics: + +[,bash] +---- +kubectl delete shadowlink --namespace +---- + +.Expected output +[.no-copy] +---- +shadowlink.cluster.redpanda.com "" deleted +---- + +This immediately converts all shadow topics to regular writable topics and stops replication. + +[NOTE] +==== +The Redpanda Operator does not support selective topic failover. 
For selective failover, use the `rpk` commands shown in the Helm tab. +==== +-- + +Helm:: ++ +-- +For complete cluster failover (all topics): + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow failover --all +---- + +**Expected output**: +[.no-copy] +---- +Successfully initiated the Fail Over for Shadow Link "". To check the status, run: + rpk shadow status +---- + +For selective topic failover (when only specific services are affected): + +[,bash] +---- +# Fail over individual topics +kubectl exec --namespace --container redpanda -- \ + rpk shadow failover --topic +---- + +For detailed command options, see xref:reference:rpk/rpk-shadow/rpk-shadow-failover.adoc[`rpk shadow failover`]. +-- +====== + +[[monitor-progress]] +=== Monitor failover progress + +Track the failover process: + +[tabs] +====== +Operator:: ++ +-- +After deleting the `ShadowLink` resource, verify topics are now writable: + +[,bash] +---- +# Check that shadow link is gone +kubectl get shadowlink --namespace + +# List topics on shadow cluster +kubectl exec --namespace --container redpanda -- \ + rpk topic list + +# Test write to a previously shadow topic +echo "test message" | kubectl exec --namespace --container redpanda -i -- \ + rpk topic produce +---- + +.Expected output for kubectl get +[.no-copy] +---- +No resources found in namespace. +---- + +.Expected output for rpk topic produce +[.no-copy] +---- +Produced to partition 0 at offset 123 with timestamp 1734567890123. +---- + +If the shadow link is deleted and you can successfully produce to topics, failover is complete. 
+-- + +Helm:: ++ +-- +Monitor status until all topics show `FAILED_OVER`: + +[,bash] +---- +# Monitor status during failover +watch -n 5 "kubectl exec --namespace --container redpanda -- rpk shadow status " + +# Check detailed topic status +kubectl exec --namespace --container redpanda -- \ + rpk shadow status --print-topic +---- + +.Expected output during failover +[.no-copy] +---- +OVERVIEW +=== +NAME disaster-recovery-link +STATE ACTIVE + +TOPICS +=== +Name: orders, State: FAILED_OVER +Name: inventory, State: FAILED_OVER +Name: transactions, State: FAILING_OVER +---- + +Wait for all critical topics to reach `FAILED_OVER` state before proceeding. +-- +====== + +[[update-applications]] +=== Update application configuration + +Redirect your applications to the shadow cluster by updating their connection strings to point to the shadow cluster brokers. If you use DNS-based service discovery, update DNS records accordingly. Restart applications to pick up the new connection settings, then verify connectivity from application hosts to the shadow cluster. + +[[verify-functionality]] +=== Verify application functionality + +Test critical application workflows: + +[,bash] +---- +# Verify applications can produce messages +echo "failover test" | kubectl exec --namespace --container redpanda -i -- \ + rpk topic produce + +# Verify applications can consume messages +kubectl exec --namespace --container redpanda -- \ + rpk topic consume --num 1 +---- + +.Expected output for produce +[.no-copy] +---- +Produced to partition 0 at offset 456 with timestamp 1734567890456. +---- + +.Expected output for consume +[.no-copy] +---- +{ + "topic": "", + "value": "failover test", + "timestamp": 1734567890456, + "partition": 0, + "offset": 456 +} +---- + +Test message production and consumption, consumer group functionality, and critical business workflows to ensure everything is working properly. 
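
Before moving on to cleanup, it can help to confirm that no topic is still mid-failover. The following is a minimal sketch under stated assumptions: it expects topic lines in the `Name: <topic>, State: <STATE>` form shown in the expected `rpk shadow status` output in this runbook, and the `all_failed_over` function name is hypothetical. It reads the status output from stdin.

```shell
# Hypothetical helper: succeed only if every topic line read from stdin
# reports State: FAILED_OVER.
# Assumption: topic lines look like "Name: <topic>, State: <STATE>".
all_failed_over() {
  awk '
    /^[[:space:]]*Name: .*, State: / {
      topics++
      if ($NF != "FAILED_OVER") pending++
    }
    # Exit 0 only when at least one topic was seen and none is pending
    END { exit !(topics > 0 && pending == 0) }
  '
}
```

The helper deliberately fails when it sees no topic lines at all, so an empty or failed status call is not mistaken for a completed failover.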
+ +[[cleanup-stabilize]] +=== Clean up and stabilize + +After all applications are running normally: + +[tabs] +====== +Operator:: ++ +-- +The `ShadowLink` resource has already been deleted during failover. No additional cleanup is needed. +-- + +Helm:: ++ +-- +Optionally delete the shadow link (no longer needed): + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow delete +---- + +For detailed command options, see xref:reference:rpk/rpk-shadow/rpk-shadow-delete.adoc[`rpk shadow delete`]. +-- +====== + +Document the time of failover initiation and completion, applications affected and recovery times, data loss estimates based on replication lag, and issues encountered during failover. + +== Troubleshoot + +include::troubleshoot:partial$errors-and-solutions.adoc[tags=shadow-link-failover] + +include::manage:partial$shadowing/failover-next-steps.adoc[] + +For general failover concepts and procedures, see xref:manage:disaster-recovery/shadowing/failover.adoc[]. diff --git a/modules/manage/pages/kubernetes/shadowing/k-monitor-shadowing.adoc b/modules/manage/pages/kubernetes/shadowing/k-monitor-shadowing.adoc new file mode 100644 index 0000000000..c10e90e5f9 --- /dev/null +++ b/modules/manage/pages/kubernetes/shadowing/k-monitor-shadowing.adoc @@ -0,0 +1,212 @@ += Monitor Kubernetes Shadow Links +:description: Monitor shadow link health in Kubernetes using ShadowLink resources, status commands, metrics, and best practices. +:page-categories: Management, Monitoring, Disaster Recovery +:env-kubernetes: true + +[NOTE] +==== +include::shared:partial$enterprise-license.adoc[] +==== + +Monitor your xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[shadow links] to ensure proper replication performance and understand your disaster recovery readiness. For Kubernetes deployments, you can monitor shadow links using the Redpanda Operator's `ShadowLink` resource status or by using `rpk` commands directly. 
+ +include::manage:partial$shadowing/emergency-shadowing-callout.adoc[] + +== Status commands + +[tabs] +====== +Operator:: ++ +-- +To list existing shadow links: + +[,bash] +---- +kubectl get shadowlink --namespace +---- + +.Example output +[.no-copy] +---- +NAME SYNCED +link True +---- + +A healthy shadow link shows `True` for SYNCED. If SYNCED is `False`, use `kubectl describe` to investigate the issue. + +To view detailed shadow link status and configuration: + +[,bash] +---- +kubectl describe shadowlink --namespace +---- + +.Example output +[.no-copy] +---- +Name: link +Namespace: redpanda-system +API Version: cluster.redpanda.com/v1alpha2 +Kind: ShadowLink +Status: + Conditions: + Status: True + Type: Synced + Message: Shadow link is synced + Shadow Topics: + Name: orders + State: active + Name: inventory + State: active + Tasks: + Name: Source Topic Sync + State: active + Name: Consumer Group Shadowing + State: active + Name: Security Migrator + State: active +---- + +The `kubectl describe` output shows: + +* **Shadow link state**: Overall operational state in the `Status` section +* **Individual topic states**: Current state of each replicated topic under `Shadow Topics` +* **Task status**: Health of replication tasks under `Tasks` +* **Sync status**: Whether the resource is properly synced (`Synced: True` in conditions) +* **Configuration**: Complete shadow link configuration including connection settings and filters + +Look for `Synced: True` in Conditions and `active` state for topics and tasks. + +For more detailed monitoring or troubleshooting, you can also use `rpk` commands as shown in the Helm tab. 
+-- + +Helm:: ++ +-- +To list existing shadow links: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow list +---- + +.Example output +[.no-copy] +---- +NAME UID STATE +disaster-recovery-link 70f25b41-9bad-4e31-9f81-d302c8676397 ACTIVE +---- + +To view shadow link configuration details: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow describe +---- + +For detailed command options, see xref:reference:rpk/rpk-shadow/rpk-shadow-list.adoc[`rpk shadow list`] and xref:reference:rpk/rpk-shadow/rpk-shadow-describe.adoc[`rpk shadow describe`]. This command shows the complete configuration of the shadow link, including connection settings, filters, and synchronization options. + +To check your shadow link status and ensure proper operation: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow status +---- + +.Example output +[.no-copy] +---- +OVERVIEW +=== +NAME disaster-recovery-link +UID 70f25b41-9bad-4e31-9f81-d302c8676397 +STATE ACTIVE + +TASKS +=== +NAME BROKER_ID SHARD STATE REASON +Source Topic Sync 0 0 ACTIVE Source Topic Sync has started +Consumer Group Shadowing 0 0 ACTIVE Group mirroring task finished successfully +Security Migrator Task 0 0 ACTIVE Security Migrator Task has started + +TOPICS +=== +Name: orders, State: ACTIVE + PARTITION SRC_LSO SRC_HWM DST_HWM LAG + 0 1000 1234 1230 4 + 1 2000 2456 2450 6 + +Name: inventory, State: ACTIVE + PARTITION SRC_LSO SRC_HWM DST_HWM LAG + 0 500 789 789 0 +---- + +Key indicators: + +* **STATE: ACTIVE**: Shadow link is replicating +* **Tasks: ACTIVE**: All replication tasks are running +* **LAG**: Message count difference between source and shadow (lower is better) + +For troubleshooting specific issues, you can use command options to show individual status sections. See xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] for available status options. 
+-- +====== + +The status output includes the following: + +* **Shadow link state**: Overall operational state (`ACTIVE`, `PAUSED`). +* **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`, `PAUSED`). +* **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`). For details about shadow link tasks, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks]. +* **Lag information**: Replication lag per partition showing source vs shadow high watermarks (HWM). + +== Troubleshoot + +include::troubleshoot:partial$errors-and-solutions.adoc[tags=shadow-link-monitoring] + +include::manage:partial$shadowing/shadow-link-metrics.adoc[] + +== Monitoring best practices + +=== Health check procedures + +Establish regular monitoring workflows to ensure shadow link health: + +[tabs] +====== +Operator:: ++ +-- +[,bash] +---- +# Check all shadow links are synced and healthy +kubectl get shadowlink --namespace + +# View detailed status for a specific shadow link +kubectl describe shadowlink --namespace + +# Check for any shadow links with issues (not synced) +kubectl get shadowlink --namespace -o json | \ + jq '.items[] | select(.status.conditions[] | select(.type=="Synced" and .status!="True")) | .metadata.name' +---- +-- + +Helm:: ++ +-- +[,bash] +---- +# List shadow links that are not ACTIVE (the NAME header row is filtered out) +kubectl exec --namespace --container redpanda -- \ + rpk shadow list | grep -v -e "ACTIVE" -e "^NAME" || echo "All shadow links healthy" + +# Monitor lag for critical topics +kubectl exec --namespace --container redpanda -- \ + rpk shadow status | grep -E "LAG|Lag" +---- +-- +====== + +include::manage:partial$shadowing/shadow-link-alerts.adoc[] diff --git a/modules/manage/pages/kubernetes/shadowing/k-shadow-linking.adoc b/modules/manage/pages/kubernetes/shadowing/k-shadow-linking.adoc new file mode 100644 index 0000000000..8ec3437e7f --- /dev/null +++ 
b/modules/manage/pages/kubernetes/shadowing/k-shadow-linking.adoc @@ -0,0 +1,901 @@ += Configure Shadowing in Kubernetes +:description: Set up disaster recovery with Shadowing using the Redpanda Operator or Helm chart. +:page-categories: Management, High Availability, Disaster Recovery +:env-kubernetes: true + +[NOTE] +==== +include::shared:partial$enterprise-license.adoc[] +==== + +Shadowing provides disaster recovery for Redpanda clusters through asynchronous, offset-preserving replication. To set up Shadowing, you create a shadow link and configure filters to select which topics, consumer groups, and ACLs to replicate. + +Redpanda offers two Kubernetes deployment methods with different shadow link management workflows. + +For conceptual information about Shadowing, see xref:manage:disaster-recovery/shadowing/overview.adoc[]. + +== Prerequisites + +=== License and version requirements + +* Both clusters must be running Redpanda v25.3 or later. +* xref:deploy:deployment-option/self-hosted/kubernetes/k-deployment-overview.adoc[Redpanda Operator version 25.3.1 or later] (for Operator deployments). +* xref:deploy:deployment-option/self-hosted/kubernetes/k-deployment-overview.adoc[Redpanda Helm chart version 25.3.1 or later] (for Helm deployments). +* You must have xref:get-started:licensing/overview.adoc[Enterprise Edition] licenses on both clusters. +* If using Redpanda Console, ensure it is running v3.30 or later for managing Shadowing. + +=== Cluster configuration + +include::manage:partial$shadowing/shadow-link-prerequisites.adoc[tag=cluster-property] + +=== Replication service account + +include::manage:partial$shadowing/shadow-link-prerequisites.adoc[tag=replication-permissions] + +=== Network connectivity + +include::manage:partial$shadowing/shadow-link-prerequisites.adoc[tag=network-requirements] + +For Kubernetes-specific networking configuration, see xref:manage:kubernetes/networking/index.adoc[]. 
+ +== Deploy Redpanda clusters + +Deploy both your source and shadow Redpanda clusters with Shadowing enabled. See xref:deploy:deployment-option/self-hosted/kubernetes/k-deployment-overview.adoc[Deploy Redpanda in Kubernetes] for full deployment instructions. + +IMPORTANT: Both clusters must have `enable_shadow_linking: true` in their cluster configuration to support Shadowing. + +To enable Shadowing, set the `enable_shadow_linking` cluster property in your cluster configuration: + +[tabs] +====== +Operator:: ++ +-- +In your `Redpanda` CRD, set: + +[,yaml] +---- +spec: + clusterSpec: + config: + cluster: + enable_shadow_linking: true +---- +-- + +Helm:: ++ +-- +In your Helm values file, set: + +[,yaml] +---- +config: + cluster: + enable_shadow_linking: true +---- +-- +====== + +== Create a shadow link + +In the examples, `` represents the namespace where your shadow Redpanda cluster is deployed, and `` represents the namespace where your source Redpanda cluster is deployed. The shadow cluster and its associated resources (shadow links, secrets) should be deployed in the same namespace as the shadow cluster. + +[tabs] +====== +Operator:: ++ +-- + +. Create a shadow link using the `ShadowLink` CRD. The CRD supports two connection methods. ++ +- For clusters managed by the same operator, use `clusterRef`: ++ +Create a shadow link that references both clusters by name: ++ +.`shadowlink.yaml` +[,yaml,indent=0] +---- +include::manage:example$kubernetes/shadow-links.feature[tags=basic-shadowlink-example] +---- ++ +This example uses example resource names. Replace `redpanda-shadow` and `redpanda-source` with your actual cluster names. ++ +The operator automatically resolves cluster connection details from the referenced `Redpanda` resources. ++ +[NOTE] +==== +When using `clusterRef`, the operator handles authentication automatically using the cluster's internal credentials. For cross-namespace or external clusters, use `staticConfiguration` instead. 
+==== + +- For external or cross-namespace clusters, use `staticConfiguration`: ++ +Create a shadow link with explicit connection details: ++ +.`shadowlink.yaml` +[,yaml] +---- +apiVersion: cluster.redpanda.com/v1alpha2 +kind: ShadowLink +metadata: + name: disaster-recovery-link +spec: + shadowCluster: + clusterRef: + name: redpanda-shadow + sourceCluster: + staticConfiguration: + kafka: + brokers: + - redpanda-source-0.redpanda-source.source.svc.cluster.local.:9093 + - redpanda-source-1.redpanda-source.source.svc.cluster.local.:9093 + - redpanda-source-2.redpanda-source.source.svc.cluster.local.:9093 + tls: + enabled: true + caCertSecretRef: + name: redpanda-source-default-cert + key: ca.crt + sasl: + mechanism: SCRAM-SHA-512 + username: replication-user + passwordSecretRef: + name: source-cluster-credentials + key: password + topicMetadataSyncOptions: + autoCreateShadowTopicFilters: + - name: '*' + filterType: include + patternType: literal +---- ++ +This example uses example values. Replace the resource names, broker addresses, and credentials with your actual configuration. ++ +With `staticConfiguration`, you must explicitly provide: ++ +* Bootstrap broker addresses +* TLS configuration (if enabled) +* SASL authentication credentials (only when SASL is enabled on the source cluster) +* CA certificates for TLS verification (when TLS is enabled) + +. Apply the ShadowLink resource in the same namespace as your shadow cluster: ++ +[,bash] +---- +kubectl apply --namespace -f shadowlink.yaml +---- +-- + +Helm:: ++ +-- +Create a shadow link using `rpk` commands. This is consistent with how topics, schemas, and users are managed in Helm-based deployments. + +. 
Create a YAML configuration file for your shadow link: ++ +[,yaml] +---- +# shadow-config.yaml +name: "disaster-recovery-link" +client_options: + bootstrap_servers: + - "redpanda-source-0.redpanda-source.source.svc.cluster.local:9093" + - "redpanda-source-1.redpanda-source.source.svc.cluster.local:9093" + - "redpanda-source-2.redpanda-source.source.svc.cluster.local:9093" + tls_settings: + enabled: true + tls_file_settings: + ca_path: "/etc/tls/certs/default/ca.crt" + authentication_configuration: + scram_configuration: + username: "replication-user" + password: "" + scram_mechanism: SCRAM-SHA-512 + +topic_metadata_sync_options: + interval: "30s" + auto_create_shadow_topic_filters: + - pattern_type: "LITERAL" + filter_type: "INCLUDE" + name: "*" +---- ++ +This example uses example resource names and service addresses. Replace the bootstrap servers, username, and password with your actual source cluster configuration. Replace the `` placeholder with the actual password. ++ +When using TLS with self-signed certificates (the default with `tls.certs.default.caEnabled=true`), the `ca_path` must point to the source cluster's CA certificate. Extract and copy it to the shadow cluster: ++ +[,bash] +---- +# Extract source cluster's CA +kubectl exec --namespace --container redpanda -- \ + cat /etc/tls/certs/default/ca.crt > source-ca.crt + +# Copy to shadow cluster +kubectl cp source-ca.crt /:/tmp/source-ca.crt + +# Reference in config: ca_path: "/tmp/source-ca.crt" +---- ++ +To generate a configuration template with the correct format, use: ++ +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow config generate > shadow-config.yaml +---- ++ +Then edit the generated file with your source cluster details before creating the shadow link. + +. 
Copy the configuration into a shadow cluster Pod and create the shadow link: ++ +[,bash] +---- +# Copy configuration file into pod +kubectl cp --namespace shadow-config.yaml :/tmp/shadow-config.yaml + +# Create shadow link +kubectl exec --namespace --container redpanda -- \ + rpk shadow create -c /tmp/shadow-config.yaml --no-confirm +---- ++ +For minimal configuration without TLS or authentication (testing only): ++ +[,yaml] +---- +name: "test-link" +client_options: + bootstrap_servers: + - "source-pod.source-namespace.svc.cluster.local:9092" +topic_metadata_sync_options: + interval: "30s" + auto_create_shadow_topic_filters: + - pattern_type: "LITERAL" + filter_type: "INCLUDE" + name: "*" +---- + +Limitations of the Helm approach: + +* Changes require manual `kubectl exec` commands. +* Configuration exists as files copied into Pods. +* Shadow link not visible to `kubectl get`. +* No automatic reconciliation or recovery. +* Cannot be managed by ArgoCD/Flux. +* Must delete and recreate to modify configuration. + +For production deployments requiring declarative configuration and GitOps workflows, consider using the Redpanda Operator. +-- +====== + +== Configure topic filters + +Topic filters determine which source topics are replicated to the shadow cluster. + +[tabs] +====== +Operator:: ++ +-- +Configure filters in the `ShadowLink` resource: + +[,yaml] +---- +spec: + topicMetadataSyncOptions: + autoCreateShadowTopicFilters: + # Include all topics by default + - name: '*' + filterType: include + patternType: literal + + # Exclude temporary or test topics + - name: temp- + filterType: exclude + patternType: prefixed + + - name: test- + filterType: exclude + patternType: prefixed + + # Include specific critical topics + - name: orders + filterType: include + patternType: literal +---- + +Filter evaluation rules: + +- Filters are evaluated in order. +- The first matching filter determines the result. +- If no filters match, the topic is excluded. 
+- The wildcard `*` matches all topics. + +Pattern types: + +* `literal`: Exact topic name match +* `prefixed`: Matches topics starting with the specified name +-- + +Helm:: ++ +-- +Configure filters in your shadow link configuration file: + +[,yaml] +---- +# shadow-config.yaml +topic_metadata_sync_options: + interval: "30s" + auto_create_shadow_topic_filters: + # Include all topics by default + - pattern_type: "LITERAL" + filter_type: "INCLUDE" + name: "*" + + # Exclude temporary or test topics + - pattern_type: "PREFIX" + filter_type: "EXCLUDE" + name: "temp-" + + - pattern_type: "PREFIX" + filter_type: "EXCLUDE" + name: "test-" + + # Include specific critical topics + - pattern_type: "LITERAL" + filter_type: "INCLUDE" + name: "orders" +---- + +Filter evaluation rules: + +- Filters are evaluated in order. +- The first matching filter determines the result. +- If no filters match, the topic is excluded. +- The wildcard `*` matches all topics. + +Pattern types: + +* `LITERAL`: Exact topic name match +* `PREFIX`: Matches topics starting with the specified name +-- +====== + +Configure starting offset to control where new shadow topics begin replication: + +[tabs] +====== +Operator:: ++ +-- +[,yaml] +---- +spec: + topicMetadataSyncOptions: + # Start from the earliest available offset (default) + startAtEarliest: {} + + # Or start from the latest offset + # startAtLatest: {} + + # Or start from a specific timestamp + # startAtTimestamp: + # timestamp: "2024-12-01T00:00:00Z" +---- +-- + +Helm:: ++ +-- +[,yaml] +---- +# shadow-config.yaml +topic_metadata_sync_options: + # Start from the earliest available offset (default) + start_at_earliest: {} + + # Or start from the latest offset + # start_at_latest: {} + + # Or start from a specific timestamp + # start_at_timestamp: + # timestamp: "2024-12-01T00:00:00Z" +---- +-- +====== + +== Configure consumer offset synchronization + +Enable consumer offset replication so consumers can resume from the same position after 
failover: + +[tabs] +====== +Operator:: ++ +-- +[,yaml] +---- +spec: + consumerOffsetSyncOptions: + enabled: true + interval: 30s + groupFilters: + - name: '*' + filterType: include + patternType: literal + - name: debug-consumer + filterType: exclude + patternType: literal +---- +-- + +Helm:: ++ +-- +[,yaml] +---- +# shadow-config.yaml +consumer_offset_sync_options: + enabled: true + interval: "30s" + group_filters: + - pattern_type: "LITERAL" + filter_type: "INCLUDE" + name: "*" + - pattern_type: "LITERAL" + filter_type: "EXCLUDE" + name: "debug-consumer" +---- +-- +====== + +== Configure ACL synchronization + +Replicate access control lists to maintain security policies on the shadow cluster: + +[tabs] +====== +Operator:: ++ +-- +[,yaml] +---- +spec: + aclSyncOptions: + enabled: true + interval: 60s + aclFilters: + - resourceType: TOPIC + resourcePatternType: LITERAL + operation: ALL + permissionType: ALLOW +---- +-- + +Helm:: ++ +-- +[,yaml] +---- +# shadow-config.yaml +security_sync_options: + enabled: true + interval: "60s" + acl_filters: + - resource_filter: + resource_type: "TOPIC" + pattern_type: "LITERAL" + access_filter: + operation: "ALL" + permission_type: "ALLOW" +---- +-- +====== + +== Verify shadow link + +[tabs] +====== +Operator:: ++ +-- +Check the status of your shadow link: + +[,bash] +---- +kubectl get shadowlink --namespace -o yaml +---- + +The status section shows replication details: + +[,yaml] +---- +status: + conditions: + - type: Ready + status: "True" + lastTransitionTime: "2024-12-10T10:00:00Z" + reason: ReconciliationSucceeded + message: Shadow link is active and replicating + observedGeneration: 1 +---- + +Verify replication: + +[,bash] +---- +# List topics on shadow cluster +kubectl exec --namespace --container redpanda -- \ + rpk topic list + +# Check shadow link status +kubectl exec --namespace --container redpanda -- \ + rpk shadow status +---- +-- + +Helm:: ++ +-- +List shadow links: + +[,bash] +---- +kubectl exec --namespace 
--container redpanda -- \ + rpk shadow list +---- + +Check shadow link status: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow status + +kubectl exec --namespace --container redpanda -- \ + rpk shadow describe +---- + +List replicated topics: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk topic list +---- +-- +====== + +=== Test replication + +Produce data to the source cluster and verify it appears on the shadow cluster: + +[tabs] +====== +Operator:: ++ +-- +[,bash] +---- +# Produce to source cluster +kubectl exec --namespace --container redpanda -- \ + rpk topic produce --key + +# Consume from shadow cluster +kubectl exec --namespace --container redpanda -- \ + rpk topic consume +---- +-- + +Helm:: ++ +-- +[,bash] +---- +# Produce to source cluster +kubectl exec --namespace --container redpanda -- \ + rpk topic produce --key + +# Consume from shadow cluster +kubectl exec --namespace --container redpanda -- \ + rpk topic consume +---- +-- +====== + +[NOTE] +==== +Shadow topics are read-only. Attempting to produce or delete topics on the shadow cluster will fail while the shadow link is active. +==== + +== Update a shadow link + +[tabs] +====== +Operator:: ++ +-- +Update the `ShadowLink` resource: + +[,bash] +---- +kubectl edit shadowlink --namespace +---- + +Or apply an updated manifest: + +[,bash] +---- +kubectl apply --namespace -f shadowlink-updated.yaml +---- + +The operator automatically reconciles the changes. 
Common updates include: + +* Adding or removing topic filters +* Adjusting synchronization intervals +* Enabling or disabling consumer offset/ACL sync +* Updating authentication credentials +-- + +Helm:: ++ +-- +To update a shadow link configuration: + +[,bash] +---- +# Delete existing shadow link +kubectl exec --namespace --container redpanda -- \ + rpk shadow delete + +# Copy updated configuration +kubectl cp --namespace shadow-config-updated.yaml :/tmp/shadow-config.yaml + +# Recreate shadow link +kubectl exec --namespace --container redpanda -- \ + rpk shadow create -c /tmp/shadow-config.yaml +---- + +[WARNING] +==== +Deleting and recreating a shadow link causes a brief interruption in replication. Plan updates during maintenance windows. +==== +-- +====== + +== Delete a shadow link + +[tabs] +====== +Operator:: ++ +-- +Delete the `ShadowLink` resource: + +[,bash] +---- +kubectl delete shadowlink --namespace +---- + +-- + +Helm:: ++ +-- +Delete the shadow link using `rpk`: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow delete +---- + +-- +====== + +[IMPORTANT] +==== +After deleting a shadow link: + +* Shadow topics remain on the cluster as regular topics. +* The topics are no longer read-only and can be written to. +* Replication from the source cluster stops immediately. +* Consumer offset and ACL synchronization stops. + +This is the first step in a disaster recovery failover scenario. +==== + +== Failover procedure + +In a disaster scenario, follow these steps to fail over to the shadow cluster: + +[tabs] +====== +Operator:: ++ +-- +. **Delete the shadow link**: ++ +[,bash] +---- +kubectl delete shadowlink --namespace +---- + +. **Verify topics are writable**: ++ +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk topic produce --key test +---- + +. **Update client configurations** to point to the shadow cluster endpoints + +.
**Verify consumer groups** can resume from their last committed offsets: ++ +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk group describe +---- +-- + +Helm:: ++ +-- +. **Delete the shadow link**: ++ +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow delete +---- + +. **Verify topics are writable**: ++ +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk topic produce --key test +---- + +. **Update client configurations** to point to the shadow cluster endpoints + +. **Verify consumer groups** can resume from their last committed offsets: ++ +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk group describe +---- +-- +====== + +For detailed failover procedures and best practices, see xref:manage:disaster-recovery/shadowing/failover.adoc[]. For emergency failover procedures, see xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[]. + +== Troubleshoot + +=== Shadowing not working + +[tabs] +====== +Operator:: ++ +-- +Check the operator logs: + +[,bash] +---- +kubectl logs --namespace -l app.kubernetes.io/name=operator --tail=100 +---- + +Verify the operator has shadow link support enabled: + +[,bash] +---- +kubectl get deployment --namespace -o yaml | grep enable-shadowlinks +---- + +Check the `ShadowLink` resource status: + +[,bash] +---- +kubectl describe shadowlink --namespace +---- +-- + +Helm:: ++ +-- +Verify shadow linking is enabled on both clusters: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk cluster config get enable_shadow_linking + +kubectl exec --namespace --container redpanda -- \ + rpk cluster config get enable_shadow_linking +---- + +Check shadow link status for errors: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow describe +---- +-- +====== + +=== Connection errors + +Verify network connectivity: + +[,bash] +---- +# Test from shadow cluster pod +kubectl exec --namespace --container 
redpanda -- \ + rpk cluster info -X brokers=:9093 \ + -X tls.enabled=true \ + -X sasl.mechanism=SCRAM-SHA-512 \ + -X user= \ + -X pass= +---- + +Check that secrets exist and contain correct values: + +[,bash] +---- +kubectl get secret --namespace +kubectl get secret --namespace -o jsonpath='{.data.password}' | base64 -d +---- + +=== Replication lag + +Monitor replication lag: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk shadow status --detailed +---- + +High replication lag can be caused by: + +* Network bandwidth limitations between clusters +* High write throughput on the source cluster +* Resource constraints on the shadow cluster + +Consider adjusting: + +* Shadow cluster resources (CPU, memory) +* Network bandwidth between regions +* Replication batch sizes and intervals + +=== Authentication failures + +Verify SASL credentials: + +[,bash] +---- +# Check if user exists on source cluster +kubectl exec -it -n source redpanda-source-0 -c redpanda -- \ + rpk acl user list +---- + +Ensure the replication user has required ACL permissions: + +[,bash] +---- +kubectl exec -it -n source redpanda-source-0 -c redpanda -- \ + rpk acl list --principal User:replication-user +---- + +== Related topics + +* xref:manage:disaster-recovery/shadowing/overview.adoc[Shadow Linking Overview] +* xref:manage:disaster-recovery/shadowing/setup.adoc[Configure Shadow Linking with rpk] +* xref:manage:disaster-recovery/shadowing/monitor.adoc[Monitor Shadow Links] +* xref:manage:disaster-recovery/shadowing/failover.adoc[Shadow Link Failover] +* xref:reference:k-operator-helm-spec.adoc[Redpanda Operator Helm Configuration] +* xref:manage:kubernetes/security/authentication/k-user-controller.adoc[Manage Users with Kubernetes] diff --git a/modules/shared/partials/emergency-shadowing-callout.adoc b/modules/manage/partials/shadowing/emergency-shadowing-callout.adoc similarity index 55% rename from modules/shared/partials/emergency-shadowing-callout.adoc rename to 
modules/manage/partials/shadowing/emergency-shadowing-callout.adoc index c853f01070..3320e7e70c 100644 --- a/modules/shared/partials/emergency-shadowing-callout.adoc +++ b/modules/manage/partials/shadowing/emergency-shadowing-callout.adoc @@ -1,6 +1,11 @@ :important-caption: Experiencing an active disaster? [IMPORTANT] ==== +ifdef::env-kubernetes[] +See xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[] for immediate step-by-step disaster procedures. +endif::[] +ifndef::env-kubernetes[] See xref:manage:disaster-recovery/shadowing/failover-runbook.adoc[] for immediate step-by-step disaster procedures. +endif::[] ==== :important-caption: Important \ No newline at end of file diff --git a/modules/manage/partials/shadowing/failover-behavior.adoc b/modules/manage/partials/shadowing/failover-behavior.adoc new file mode 100644 index 0000000000..d49981b23f --- /dev/null +++ b/modules/manage/partials/shadowing/failover-behavior.adoc @@ -0,0 +1,11 @@ +== Failover behavior + +When you initiate failover, Redpanda performs the following operations: + +1. **Stops replication**: Halts all data fetching from the source cluster for the specified topics or entire shadow link +2. **Failover topics**: Converts read-only shadow topics into regular, writable topics +3. **Updates topic state**: Changes topic status from `ACTIVE` to `FAILING_OVER`, then `FAILED_OVER` + +Topic failover is irreversible. Once failed over, topics cannot return to shadow mode, and automatic fallback to the original source cluster is not supported. + +NOTE: To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity. 
diff --git a/modules/manage/partials/shadowing/failover-decision-examples.adoc b/modules/manage/partials/shadowing/failover-decision-examples.adoc new file mode 100644 index 0000000000..2f2ecc5b83 --- /dev/null +++ b/modules/manage/partials/shadowing/failover-decision-examples.adoc @@ -0,0 +1,13 @@ +**Examples that require full failover:** + +* Primary cluster is completely unreachable (network partition, regional outage) +* Multiple broker failures preventing writes to critical topics +* Data center failure affecting majority of brokers +* Persistent authentication or authorization failures across the cluster + +**Examples that may NOT require failover:** + +* Single broker failure with sufficient replicas remaining +* Temporary network connectivity issues affecting some clients +* High latency or performance degradation (but cluster still functional) +* Non-critical topic or partition unavailability diff --git a/modules/manage/partials/shadowing/failover-next-steps.adoc b/modules/manage/partials/shadowing/failover-next-steps.adoc new file mode 100644 index 0000000000..3180b2330f --- /dev/null +++ b/modules/manage/partials/shadowing/failover-next-steps.adoc @@ -0,0 +1,17 @@ +== Next steps + +After successful failover, focus on recovery planning and process improvement. Begin by assessing the source cluster failure and determining whether to restore the original cluster or permanently promote the shadow cluster as your new primary. + +**Immediate recovery planning:** + +1. **Assess source cluster**: Determine root cause of the outage +2. **Plan recovery**: Decide whether to restore source cluster or promote shadow cluster permanently +3. **Data synchronization**: Plan how to synchronize any data produced during failover +4. **Fail forward**: Create a new shadow link with the failed over shadow cluster as source to maintain a DR cluster + +**Process improvement:** + +1. **Document the incident**: Record timeline, impact, and lessons learned +2. 
**Update runbooks**: Improve procedures based on what you learned +3. **Test regularly**: Schedule regular disaster recovery drills +4. **Review monitoring**: Ensure monitoring caught the issue appropriately diff --git a/modules/manage/partials/shadowing/failover-states.adoc b/modules/manage/partials/shadowing/failover-states.adoc new file mode 100644 index 0000000000..0b28e4ce58 --- /dev/null +++ b/modules/manage/partials/shadowing/failover-states.adoc @@ -0,0 +1,20 @@ +== Failover states + +=== Shadow link states + +The shadow link itself has a simple state model: + +* **`ACTIVE`**: Shadow link is operating normally, replicating data +* **`PAUSED`**: Shadow link replication is temporarily halted by user action + +Shadow links do not have dedicated failover states. Instead, the link's operational status is determined by the collective state of its shadow topics. + +=== Shadow topic states + +Individual shadow topics progress through specific states during failover: + +* **`ACTIVE`**: Normal replication state before failover +* **`FAULTED`**: Shadow topic has encountered an error and is not replicating +* **`FAILING_OVER`**: Failover initiated, replication stopping +* **`FAILED_OVER`**: Failover completed successfully, topic fully writable +* **`PAUSED`**: Replication temporarily halted by user action diff --git a/modules/manage/partials/shadowing/replication-lag-guidelines.adoc b/modules/manage/partials/shadowing/replication-lag-guidelines.adoc new file mode 100644 index 0000000000..5c0f2a0545 --- /dev/null +++ b/modules/manage/partials/shadowing/replication-lag-guidelines.adoc @@ -0,0 +1,3 @@ +* **Acceptable lag examples**: 0-1,000 messages for low-throughput topics, 0-10,000 messages for high-throughput topics +* **Concerning lag examples**: Growing lag over 50,000 messages, or lag that continuously increases without recovering +* **Critical lag examples**: Lag exceeding your data loss tolerance (for example, if you can only afford to lose 1 minute of data, lag
should represent less than 1 minute of typical message volume) diff --git a/modules/manage/partials/shadowing/shadow-link-alerts.adoc b/modules/manage/partials/shadowing/shadow-link-alerts.adoc new file mode 100644 index 0000000000..20922285e4 --- /dev/null +++ b/modules/manage/partials/shadowing/shadow-link-alerts.adoc @@ -0,0 +1,12 @@ +=== Alert conditions + +Configure monitoring alerts for the following conditions, which indicate problems with Shadowing: + +* **High replication lag**: When `redpanda_shadow_link_shadow_lag` exceeds your RPO requirements +* **Connection errors**: When `redpanda_shadow_link_client_errors` increases rapidly +* **Topic state changes**: When topics move to `FAULTED` state +* **Task failures**: When replication tasks enter `FAULTED` or `NOT_RUNNING` states +* **Throughput drops**: When bytes/records fetched drops significantly +* **Link unavailability**: When tasks show `LINK_UNAVAILABLE` indicating source cluster connectivity issues + +For more information about shadow link tasks and their states, see xref:manage:disaster-recovery/shadowing/overview.adoc#shadow-link-tasks[Shadow link tasks]. diff --git a/modules/manage/partials/shadowing/shadow-link-metrics.adoc b/modules/manage/partials/shadowing/shadow-link-metrics.adoc new file mode 100644 index 0000000000..11f651edd5 --- /dev/null +++ b/modules/manage/partials/shadowing/shadow-link-metrics.adoc @@ -0,0 +1,39 @@ +[[shadow-link-metrics]] +== Metrics + +Shadowing provides comprehensive metrics to track replication performance and health with the xref:reference:public-metrics-reference.adoc[`public_metrics`] endpoint. + +[cols="1,1,2"] +|=== +|Metric |Type |Description + +|`redpanda_shadow_link_shadow_lag` +|Gauge +|The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition. 
+ +|`redpanda_shadow_link_total_bytes_fetched` +|Count +|The total number of bytes fetched by a sharded replicator (bytes received by the client). Labeled by `shadow_link_name` and `shard` to track data transfer volume from the source cluster. + +|`redpanda_shadow_link_total_bytes_written` +|Count +|The total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor data written to the shadow cluster. + +|`redpanda_shadow_link_client_errors` +|Count +|The number of errors seen by the client. Track by `shadow_link_name` and `shard` to identify connection or protocol issues between clusters. + +|`redpanda_shadow_link_shadow_topic_state` +|Gauge +|Number of shadow topics in the respective states. Labeled by `shadow_link_name` and `state` to monitor topic state distribution across your shadow links. + +|`redpanda_shadow_link_total_records_fetched` +|Count +|The total number of records fetched by the sharded replicator (records received by the client). Monitor by `shadow_link_name` and `shard` to track message throughput from the source. + +|`redpanda_shadow_link_total_records_written` +|Count +|The total number of records written by a sharded replicator (records written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor message throughput to the shadow cluster. +|=== + +See also: xref:reference:public-metrics-reference.adoc[] diff --git a/modules/manage/partials/shadowing/shadow-link-prerequisites.adoc b/modules/manage/partials/shadowing/shadow-link-prerequisites.adoc new file mode 100644 index 0000000000..64bb7c8c03 --- /dev/null +++ b/modules/manage/partials/shadowing/shadow-link-prerequisites.adoc @@ -0,0 +1,128 @@ +// tag::cluster-property[] +Both source and shadow clusters must have the xref:reference:properties/cluster-properties.adoc#enable_shadow_linking[`enable_shadow_linking`] cluster property set to `true`. 
+ +ifdef::env-kubernetes[] +[tabs] +====== +Operator:: ++ +-- +Set this property in your Redpanda custom resource: + +[,yaml] +---- +apiVersion: cluster.redpanda.com/v1alpha2 +kind: Redpanda +metadata: + name: redpanda +spec: + clusterSpec: + config: + cluster: + enable_shadow_linking: true +---- +-- + +Helm:: ++ +-- +Set this property in your Helm values file: + +[,yaml] +---- +config: + cluster: + enable_shadow_linking: true +---- +-- +====== +endif::[] + +ifndef::env-kubernetes[] +To enable this property, run: +[,bash] +---- +rpk cluster config set enable_shadow_linking true +---- + +[NOTE] +==== +This cluster property must be configured using `rpk` or the Admin API v1 before you can create shadow links through any interface. +==== + +To learn more about configuring cluster properties, see xref:manage:cluster-maintenance/cluster-property-configuration.adoc[]. +endif::[] +// end::cluster-property[] + +// tag::replication-permissions[] +A service account (SASL user) on the source cluster is required for shadow link replication only when SASL authentication is enabled on the source cluster. + +When SASL authentication is disabled on the source cluster, no service account credentials are required for shadow link setup. + +When SASL authentication is enabled, the service account must have the following xref:manage:security/authorization/acl.adoc[ACL] permissions: + +[[acl]] +* **Topics**: `read` permission on all topics you want to replicate +* **Topic configurations**: `describe_configs` permission on topics for configuration synchronization +* **Consumer groups**: `describe` and `read` permission on consumer groups for offset replication +* **ACLs**: `describe` permission on ACL resources to replicate security policies +* **Cluster**: `describe` permission on the cluster resource to access ACLs + +This service account authenticates from the shadow cluster to the source cluster and performs the data replication. 
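As a rough sketch, the permissions listed above could be granted on the source cluster with `rpk security acl create`. The principal `User:replication-user`, the wildcard resource names, and the `DRY_RUN` switch are assumptions for illustration, not part of the documented procedure. By default the script only prints the commands so that you can review them before running anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical principal; replace with your replication service account.
PRINCIPAL="${PRINCIPAL:-User:replication-user}"
# Print commands by default; set DRY_RUN=0 to run them against the source cluster.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

grant_replication_acls() {
  # Topics: read data and describe configurations.
  run rpk security acl create --allow-principal "$PRINCIPAL" \
    --operation read --operation describe_configs --topic '*'
  # Consumer groups: describe and read, for offset replication.
  run rpk security acl create --allow-principal "$PRINCIPAL" \
    --operation describe --operation read --group '*'
  # Cluster: describe, needed to read ACLs.
  run rpk security acl create --allow-principal "$PRINCIPAL" \
    --operation describe --cluster
}

grant_replication_acls
```

The wildcard `'*'` grants access to every topic and consumer group. If you replicate only a subset, narrowing `--topic` and `--group` to those names keeps the service account closer to least privilege.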
+ +ifdef::env-kubernetes[] +[tabs] +====== +Operator:: ++ +-- +When using `clusterRef` to connect to a source cluster managed by the same operator, authentication is handled automatically. The operator creates a `kubernetes-controller` user on both clusters when SASL is enabled. + +When using `staticConfiguration` to connect to an external source cluster with SASL enabled, you must provide credentials for a service account that exists on the source cluster. Create the service account using the `User` CRD (see xref:manage:kubernetes/security/authentication/k-user-controller.adoc[]). +-- + +Helm:: ++ +-- +Create the replication service account on the source cluster using one of these methods: + +In your Helm values file: + +[,yaml] +---- +auth: + sasl: + enabled: true + users: + - name: replication-user + password: + mechanism: SCRAM-SHA-512 +---- + +Or using `rpk` after deployment: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk security user create replication-user \ + --password \ + --mechanism SCRAM-SHA-512 +---- + +Then configure the <>. +-- +====== +endif::[] +// end::replication-permissions[] + +// tag::network-requirements[] +You must configure network connectivity between clusters with appropriate firewall rules to allow the shadow cluster to connect to the source cluster for data replication. Shadowing uses a pull-based architecture where the shadow cluster fetches data from the source cluster. + +ifdef::env-kubernetes[] +In Kubernetes, ensure: + +* The shadow cluster can reach the source cluster's Kafka API endpoints. This may involve configuring Kubernetes NetworkPolicies, Services, or Ingress resources. +* If using TLS, the shadow cluster has access to the source cluster's CA certificate. +* Network policies allow egress from the shadow cluster to the source cluster. 
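
One way to check these requirements before creating a shadow link is to run `rpk cluster info` from a shadow cluster Pod against the source cluster's Kafka API. This is a minimal sketch: the namespace, Pod name, and broker address are hypothetical defaults, and SASL credentials are left out. By default it prints the command instead of running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names; replace with values from your environment.
SHADOW_NAMESPACE="${SHADOW_NAMESPACE:-shadow}"
SHADOW_POD="${SHADOW_POD:-redpanda-shadow-0}"
SOURCE_BROKERS="${SOURCE_BROKERS:-redpanda-source-0.example.internal:9093}"
# Print the command by default; set DRY_RUN=0 to run it.
DRY_RUN="${DRY_RUN:-1}"

check_source_reachable() {
  # Drop the tls.enabled option if the source listener does not use TLS.
  local cmd=(kubectl exec --namespace "$SHADOW_NAMESPACE" "$SHADOW_POD"
    --container redpanda --
    rpk cluster info -X "brokers=$SOURCE_BROKERS" -X tls.enabled=true)
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

check_source_reachable
```

If the command times out when run for real, a blocked egress rule or an unreachable Service is a likely cause. A TLS error more likely points to a missing or mismatched CA certificate.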
+endif::[] +// end::network-requirements[] diff --git a/modules/reference/pages/k-crd.adoc b/modules/reference/pages/k-crd.adoc index 542d49b0d3..4e93111203 100644 --- a/modules/reference/pages/k-crd.adoc +++ b/modules/reference/pages/k-crd.adoc @@ -11,6 +11,7 @@ - xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-redpanda[$$Redpanda$$] - xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-redpandarole[$$RedpandaRole$$] - xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-schema[$$Schema$$] +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlink[$$ShadowLink$$] - xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-topic[$$Topic$$] - xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-user[$$User$$] @@ -19,7 +20,7 @@ [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-aclaccessfilter"] == ACLAccessFilter -Filter an ACL based on its access type, operation, principal, and host. 
+Filter an ACL based on its access @@ -39,6 +40,22 @@ all principals with the specified `operation` and `permissionType` + |=== +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-aclfilter"] +== ACLFilter + +A filter for ACLs + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinksecuritysettingssyncoptions[$$ShadowLinkSecuritySettingsSyncOptions$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`accessFilter`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-aclaccessfilter[$$ACLAccessFilter$$]__ | The access filter + +| *`resourceFilter`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-aclresourcefilter[$$ACLResourceFilter$$]__ | The resource filter + +|=== [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-acloperation"] @@ -255,7 +272,7 @@ Redpanda will use the `internal_topic_replication_factor` cluster config value. [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-auth"] == Auth -Auth configures authentication in the Helm values. See https://docs.redpanda.com/current/manage/kubernetes/security/authentication/sasl-kubernetes/. +Auth configures authentication in the Helm values. See xref:manage:kubernetes/security/authentication/sasl-kubernetes.adoc[] @@ -364,7 +381,7 @@ Budget configures the management of disruptions affecting the Pods in the Statef [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-cpu"] == CPU -CPU configures CPU resources for containers. See https://docs.redpanda.com/current/manage/kubernetes/manage-resources/. +CPU configures CPU resources for containers. See xref:manage:kubernetes/manage-resources.adoc[] @@ -374,7 +391,7 @@ CPU configures CPU resources for containers. 
See https://docs.redpanda.com/curre [cols="25a,75a", options="header"] |=== | Field | Description -| *`cores`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#quantity-resource-api[$$Quantity$$]__ | Specifies the number of CPU cores available to the application. Redpanda makes use of a thread per core model. For details, see https://docs.redpanda.com/current/get-started/architecture/#thread-per-core-model. For this reason, Redpanda should only be given full cores. Note: You can increase cores, but decreasing cores is not currently supported. See the GitHub issue:https://github.com/redpanda-data/redpanda/issues/350. This setting is equivalent to `--smp`, `resources.requests.cpu`, and `resources.limits.cpu`. For production, use `4` or greater. + +| *`cores`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#quantity-resource-api[$$Quantity$$]__ | Specifies the number of CPU cores available to the application. Redpanda makes use of a thread per core model. For details, see xref:get-started:architecture.adoc[]. For this reason, Redpanda should only be given full cores. Note: You can increase cores, but decreasing cores is not currently supported. See the GitHub issue:https://github.com/redpanda-data/redpanda/issues/350. This setting is equivalent to `--smp`, `resources.requests.cpu`, and `resources.limits.cpu`. For production, use `4` or greater. + | *`overprovisioned`* __boolean__ | Specifies whether Redpanda assumes it has all of the provisioned CPU. This should be `true` unless the container has CPU affinity. Equivalent to: `--idle-poll-time-us 0`, `--thread-affinity 0`, and `--poll-aio 0`. If the value of full cores in `resources.cpu.cores` is less than `1`, this setting is set to `true`. + |=== @@ -490,6 +507,7 @@ ClusterSource defines how to connect to a particular Redpanda cluster.
- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-consolespec[$$ConsoleSpec$$] - xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-rolespec[$$RoleSpec$$] - xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-schemaspec[$$SchemaSpec$$] +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkspec[$$ShadowLinkSpec$$] - xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-topicspec[$$TopicSpec$$] - xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-userspec[$$UserSpec$$] @@ -560,15 +578,15 @@ Config configures Redpanda config properties supported by Redpanda that may not [cols="25a,75a", options="header"] |=== | Field | Description -| *`rpk`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies cluster configuration properties. See https://docs.redpanda.com/current/reference/cluster-properties/. + -| *`cluster`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies cluster configuration properties. See https://docs.redpanda.com/current/reference/cluster-properties/. + +| *`rpk`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies cluster configuration properties. See xref:reference:cluster-properties.adoc[] + +| *`cluster`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies cluster configuration properties. 
See xref:reference:cluster-properties.adoc[] + | *`extraClusterConfiguration`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-clusterconfiguration[$$ClusterConfiguration$$]__ | Holds values (or references to values) that should be used to configure the cluster; these + are resolved late in order to avoid embedding secrets directly into bootstrap configurations + exposed as Kubernetes configmaps. + -| *`node`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies broker configuration properties. See https://docs.redpanda.com/current/reference/node-properties/. + -| *`tunable`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies tunable configuration properties. See https://docs.redpanda.com/current/reference/tunable-properties/. + -| *`schema_registry_client`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies tunable configuration properties. See https://docs.redpanda.com/current/reference/tunable-properties/. + -| *`pandaproxy_client`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies tunable configuration properties. See https://docs.redpanda.com/current/reference/tunable-properties/. + +| *`node`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies broker configuration properties. See xref:reference:node-properties.adoc[] + +| *`tunable`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies tunable configuration properties. 
See xref:reference:tunable-properties.adoc[] + +| *`schema_registry_client`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies tunable configuration properties. See xref:reference:tunable-properties.adoc[] + +| *`pandaproxy_client`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies tunable configuration properties. See xref:reference:tunable-properties.adoc[] + |=== @@ -676,7 +694,7 @@ the dynamic configuration, while its "synonym" will be the default. + [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-connectormonitoring"] == ConnectorMonitoring -ConnectorMonitoring configures monitoring resources for Connectors. See https://docs.redpanda.com/current/manage/kubernetes/monitoring/monitor-redpanda/. +ConnectorMonitoring configures monitoring resources for Connectors. See xref:manage:kubernetes/monitoring/monitor-redpanda.adoc[] @@ -957,28 +975,10 @@ CredentialSecretRef can be used to set cloud_storage_secret_key from referenced |=== -[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-enablable"] -== Enablable - - - - - -.Appears in: -- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-redpandaclusterspec[$$RedpandaClusterSpec$$] -- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-redpandaconsole[$$RedpandaConsole$$] - -[cols="25a,75a", options="header"] -|=== -| Field | Description -| *`enabled`* __boolean__ | -|=== - - [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-enterprise"] == Enterprise -Enterprise configures an Enterprise license key to enable Redpanda Enterprise features. Requires the post-install job to be enabled (default). 
See https://docs.redpanda.com/current/get-started/licenses/. +Enterprise configures an Enterprise license key to enable Redpanda Enterprise features. Requires the post-install job to be enabled (default). See xref:get-started:licenses.adoc[] @@ -1579,7 +1579,7 @@ specified, this field takes precedence over [Certificate.CAEnabled]. + [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-listeners"] == Listeners -Listeners configures settings for listeners, including HTTP Proxy, Schema Registry, the Admin API and the Kafka API. See https://docs.redpanda.com/current/manage/kubernetes/networking/configure-listeners/. +Listeners configures settings for listeners, including HTTP Proxy, Schema Registry, the Admin API and the Kafka API. See xref:manage:kubernetes/networking/configure-listeners.adoc[] @@ -1622,7 +1622,7 @@ LivenessProbe configures liveness probes to monitor the health of the Pods and r [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-logging"] == Logging -Logging configures logging settings in the Helm values. See https://docs.redpanda.com/current/manage/kubernetes/troubleshooting/troubleshoot/. +Logging configures logging settings in the Helm values. See xref:manage:kubernetes/troubleshooting/troubleshoot.adoc[] @@ -1677,7 +1677,7 @@ MetadataTemplate defines additional metadata to associate with a resource. [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-monitoring"] == Monitoring -Monitoring configures monitoring resources for Redpanda. See https://docs.redpanda.com/current/manage/kubernetes/monitoring/monitor-redpanda/. +Monitoring configures monitoring resources for Redpanda. See xref:manage:kubernetes/monitoring/monitor-redpanda.adoc[] @@ -1695,6 +1695,30 @@ Monitoring configures monitoring resources for Redpanda. 
See https://docs.redpan |=== +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-namefilter"] +== NameFilter + +A filter based on the name of a resource + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkconsumeroffsetsyncoptions[$$ShadowLinkConsumerOffsetSyncOptions$$] +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinktopicmetadatasyncoptions[$$ShadowLinkTopicMetadataSyncOptions$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`name`* __string__ | The resource name, or "*" + +Note: if the wildcard "*" is used, it must be the _only_ character + +and `patternType` must be `literal` + +| *`filterType`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-filtertype[$$FilterType$$]__ | Valid values: + +- include + +- exclude + +| *`patternType`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-patterntype[$$PatternType$$]__ | Default value is literal. Valid values: + +- literal + +- prefixed + +|=== [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-oidcloginsecrets"] @@ -2035,7 +2059,7 @@ DEPRECATED: Use sideCars.securityContext + [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-rackawareness"] == RackAwareness -RackAwareness configures rack awareness in the Helm values. See https://docs.redpanda.com/current/manage/kubernetes/kubernetes-rack-awareness/. +RackAwareness configures rack awareness in the Helm values. 
See xref:manage:kubernetes/kubernetes-rack-awareness.adoc[] @@ -2125,7 +2149,7 @@ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api- [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-redpandaclusterspec"] == RedpandaClusterSpec -RedpandaClusterSpec defines the desired state of a Redpanda cluster. These settings are the same as those defined in the Redpanda Helm chart. The values in these settings are passed to the Redpanda Helm chart through Flux. For all default values and links to more documentation, see https://docs.redpanda.com/current/reference/redpanda-helm-spec/. +RedpandaClusterSpec defines the desired state of a Redpanda cluster. These settings are the same as those defined in the Redpanda Helm chart. The values in these settings are passed to the Redpanda Helm chart through Flux. For all default values and links to more documentation, see xref:reference:redpanda-helm-spec.adoc[] For descriptions and default values, see xref:k-redpanda-helm-spec.adoc[]. @@ -2146,8 +2170,6 @@ For descriptions and default values, see xref:k-redpanda-helm-spec.adoc[]. | *`tolerations`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#toleration-v1-core[$$Toleration$$] array__ | Specifies tolerations to allow Pods to be scheduled onto nodes where they otherwise wouldn’t. + | *`image`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-redpandaimage[$$RedpandaImage$$]__ | Defines the container image settings to use for the Redpanda cluster. + | *`imagePullSecrets`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#localobjectreference-v1-core[$$LocalObjectReference$$] array__ | Specifies credentials for a private image repository. For details, see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/. + -| *`license_key`* __string__ | Deprecated: Use `Enterprise` instead. 
+ -| *`license_secret_ref`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-licensesecretref[$$LicenseSecretRef$$]__ | Deprecated: Use `EnterpriseLicenseSecretRef` instead. + | *`enterprise`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-enterprise[$$Enterprise$$]__ | Defines an Enterprise license. + | *`rackAwareness`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-rackawareness[$$RackAwareness$$]__ | Defines rack awareness settings. + | *`console`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-redpandaconsole[$$RedpandaConsole$$]__ | Defines Redpanda Console settings. + @@ -2176,14 +2198,15 @@ Setting `force` to `true` will result in a short period of downtime. + | *`affinity`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#affinity-v1-core[$$Affinity$$]__ | Affinity constraints for scheduling Pods, can override this for + StatefulSets and Jobs. For details, see the [Kubernetes + documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity). + -| *`tests`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-enablable[$$Enablable$$]__ | +| *`license_key`* __string__ | Deprecated: Use `enterprise.license` instead. + +| *`license_secret_ref`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-licensesecretref[$$LicenseSecretRef$$]__ | Deprecated: Use `enterprise.licenseSecretRef` instead. + |=== [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-redpandaconnectors"] == RedpandaConnectors -RedpandaConnectors configures Redpanda Connectors. Redpanda Connectors is a package that includes Kafka Connect and built-in connectors, sometimes known as plugins. 
See https://docs.redpanda.com/current/deploy/deployment-option/self-hosted/kubernetes/k-deploy-connectors/. +RedpandaConnectors configures Redpanda Connectors. Redpanda Connectors is a package that includes Kafka Connect and built-in connectors, sometimes known as plugins. See xref:deploy:deployment-option/self-hosted/kubernetes/k-deploy-connectors.adoc[] @@ -2263,20 +2286,19 @@ never used. Prefer ConfigMap (configmap). + | *`configMap`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-consolecreateobj[$$ConsoleCreateObj$$]__ | Specifies whether a ConfigMap should be created for Redpanda Console. + | *`secret`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies whether a Secret should be created for Redpanda Console. + | *`deployment`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Specifies whether a Deployment should be created for Redpanda Console. + -| *`console`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Deprecated: Use `config` instead + -`console` is available in Console chart version earlier or equal to v0.7.31 + | *`config`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Configures custom settings for Redpanda Console. + `config` is available in Console chart version after v0.7.31 semver + | *`strategy`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Configures console's Deployment's update strategy. + -| *`enterprise`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Deprecated: Use `licenseSecretRef` instead. 
+ -`enterprise` is available in Console chart version earlier or equal to v0.7.31 + | *`licenseSecretRef`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#secretkeyselector-v1-core[$$SecretKeySelector$$]__ | Defines a reference to Kubernetes Secret that points to a Redpanda Enterprise license. + Please consider use Enterprise in RedpandaClusterSpec type. + `licenseSecretRef` is available in Console chart version after v0.7.31 semver + | *`automountServiceAccountToken`* __boolean__ | Automount API credentials for the Service Account into the pod. + | *`readinessProbe`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-readinessprobe[$$ReadinessProbe$$]__ | Settings for console's Deployment's readiness probe. + | *`livenessProbe`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-livenessprobe[$$LivenessProbe$$]__ | Settings for console's Deployment's liveness probe. + -| *`tests`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-enablable[$$Enablable$$]__ | Controls the creation of helm tests for console. + +| *`console`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Deprecated: Use `config` instead + +`console` is available in Console chart version earlier or equal to v0.7.31 + +| *`enterprise`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg[$$RawExtension$$]__ | Deprecated: Use `licenseSecretRef` instead. + +`enterprise` is available in Console chart version earlier or equal to v0.7.31 + |=== @@ -2436,28 +2458,20 @@ RedpandaStatus defines the observed state of Redpanda installed license in the Redpanda cluster. + | *`configVersion`* __string__ | ConfigVersion contains the configuration version written in + Redpanda used for restarting broker nodes as necessary. 
+ -| *`observedGeneration`* __integer__ | Specifies the last observed generation. + -deprecated + -| *`lastHandledReconcileAt`* __string__ | LastHandledReconcileAt holds the value of the most recent + -reconcile request value, so a change of the annotation value + -can be detected. + -deprecated + -| *`lastAppliedRevision`* __string__ | LastAppliedRevision is the revision of the last successfully applied source. + -deprecated + -| *`lastAttemptedRevision`* __string__ | LastAttemptedRevision is the revision of the last reconciliation attempt. + -deprecated + -| *`helmRelease`* __string__ | deprecated + -| *`helmReleaseReady`* __boolean__ | deprecated + -| *`helmRepository`* __string__ | deprecated + -| *`helmRepositoryReady`* __boolean__ | deprecated + -| *`upgradeFailures`* __integer__ | deprecated + +| *`observedGeneration`* __integer__ | Deprecated + +| *`lastHandledReconcileAt`* __string__ | Deprecated + +| *`lastAppliedRevision`* __string__ | Deprecated + +| *`lastAttemptedRevision`* __string__ | Deprecated + +| *`helmRelease`* __string__ | Deprecated + +| *`helmReleaseReady`* __boolean__ | Deprecated + +| *`helmRepository`* __string__ | Deprecated + +| *`helmRepositoryReady`* __boolean__ | Deprecated + +| *`upgradeFailures`* __integer__ | Deprecated + | *`failures`* __integer__ | Failures is the reconciliation failure count against the latest desired + state. It is reset after a successful reconciliation. + deprecated + -| *`installFailures`* __integer__ | deprecated + -| *`decommissioningNode`* __integer__ | ManagedDecommissioningNode indicates that a node is currently being + -decommissioned from the cluster and provides its ordinal number. 
+ -deprecated + +| *`installFailures`* __integer__ | Deprecated + +| *`decommissioningNode`* __integer__ | Deprecated + |=== @@ -3095,6 +3109,286 @@ SetTieredStorageCacheDirOwnership configures the settings related to ownership o |=== +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlink"] +== ShadowLink + +ShadowLink defines the CRD for ShadowLink cluster configuration. + + + + + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`apiVersion`* __string__ | `cluster.redpanda.com/v1alpha2` +| *`kind`* __string__ | `ShadowLink` +| *`kind`* __string__ | Kind is a string value representing the REST resource this object represents. + +Servers may infer this from the endpoint the client submits requests to. + +Cannot be updated. + +In CamelCase. + +More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + +| *`apiVersion`* __string__ | APIVersion defines the versioned schema of this representation of an object. + +Servers should convert recognized schemas to the latest internal value, and + +may reject unrecognized values. + +More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + +| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta[$$ObjectMeta$$]__ | Refer to the Kubernetes API documentation for fields of `metadata`. 
+ +| *`spec`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkspec[$$ShadowLinkSpec$$]__ | +| *`status`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkstatus[$$ShadowLinkStatus$$]__ | +|=== + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkconsumeroffsetsyncoptions"] +== ShadowLinkConsumerOffsetSyncOptions + +Options for syncing consumer offsets + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkspec[$$ShadowLinkSpec$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`interval`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#duration-v1-meta[$$Duration$$]__ | Sync interval + +If 0 provided, defaults to 30 seconds + +| *`paused`* __boolean__ | Allows user to pause the consumer offset sync task. 
If paused, then + +the task will enter the 'paused' state and not sync consumer offsets from + +the source cluster + +| *`groupFilters`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-namefilter[$$NameFilter$$] array__ | The filters + +|=== + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkschemaregistrysyncoptions"] +== ShadowLinkSchemaRegistrySyncOptions + +Options for syncing schema registry settings + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkspec[$$ShadowLinkSpec$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`schema_registry_shadowing_mode`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkschemaregistrysyncoptionsmode[$$ShadowLinkSchemaRegistrySyncOptionsMode$$]__ | +|=== + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkschemaregistrysyncoptionsmode"] +== ShadowLinkSchemaRegistrySyncOptionsMode (string) + + + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkschemaregistrysyncoptions[$$ShadowLinkSchemaRegistrySyncOptions$$] + + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinksecuritysettingssyncoptions"] +== ShadowLinkSecuritySettingsSyncOptions + +Options for syncing security settings + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkspec[$$ShadowLinkSpec$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`interval`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#duration-v1-meta[$$Duration$$]__ | Sync interval + +If 0 provided, defaults to 30 seconds + +| 
*`paused`* __boolean__ | Allows user to pause the security settings sync task. If paused, + +then the task will enter the 'paused' state and will not sync security + +settings from the source cluster + +| *`aclFilters`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-aclfilter[$$ACLFilter$$] array__ | ACL filters + +|=== + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkspec"] +== ShadowLinkSpec + + + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlink[$$ShadowLink$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`shadowCluster`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-clustersource[$$ClusterSource$$]__ | +| *`sourceCluster`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-clustersource[$$ClusterSource$$]__ | +| *`topicMetadataSyncOptions`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinktopicmetadatasyncoptions[$$ShadowLinkTopicMetadataSyncOptions$$]__ | Topic metadata sync options + +| *`consumerOffsetSyncOptions`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkconsumeroffsetsyncoptions[$$ShadowLinkConsumerOffsetSyncOptions$$]__ | Consumer offset sync options + +| *`securitySyncOptions`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinksecuritysettingssyncoptions[$$ShadowLinkSecuritySettingsSyncOptions$$]__ | Security settings sync options + +| *`schemaRegistrySyncOptions`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkschemaregistrysyncoptions[$$ShadowLinkSchemaRegistrySyncOptions$$]__ | options for schema registry + 
+|=== + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkstate"] +== ShadowLinkState (string) + +State of the shadow link + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkstatus[$$ShadowLinkStatus$$] + + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkstatus"] +== ShadowLinkStatus + +ShadowLinkStatus defines the observed state of any node pools tied to this cluster + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlink[$$ShadowLink$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`state`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkstate[$$ShadowLinkState$$]__ | State of the shadow link + +| *`taskStatuses`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinktaskstatus[$$ShadowLinkTaskStatus$$] array__ | Statuses of the running tasks + +| *`shadowTopicStatuses`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowtopicstatus[$$ShadowTopicStatus$$] array__ | Status of shadow topics + +| *`conditions`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta[$$Condition$$] array__ | Conditions holds the conditions for the ShadowLink. 
+ +|=== + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinktaskstatus"] +== ShadowLinkTaskStatus + + + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkstatus[$$ShadowLinkStatus$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`lastTransitionTime`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#time-v1-meta[$$Time$$]__ | +| *`name`* __string__ | Name of the task + +| *`state`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-taskstate[$$TaskState$$]__ | State of the task + +| *`reason`* __string__ | Reason for task being in state + +| *`brokerId`* __integer__ | The broker the task is running on + +|=== + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinktopicmetadatasyncoptions"] +== ShadowLinkTopicMetadataSyncOptions + +Options for syncing topic metadata + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkspec[$$ShadowLinkSpec$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`interval`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#duration-v1-meta[$$Duration$$]__ | How often to sync metadata + +If 0 provided, defaults to 30 seconds + +| *`autoCreateShadowTopicFilters`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-namefilter[$$NameFilter$$] array__ | List of filters that indicate which topics should be automatically + +created as shadow topics on the shadow cluster. This only controls + +automatic creation of shadow topics and does not affect the state of the + +mirror topic once it is created. 
+ +Literal filters for __consumer_offsets and _redpanda.audit_log will be + +rejected as well as prefix filters to match topics prefixed with + +_redpanda or __redpanda. + +Wildcard `*` is permitted only for literal filters and will _not_ match + +any topics that start with _redpanda or __redpanda. If users wish to + +shadow topics that start with _redpanda or __redpanda, they should + +provide a literal filter for those topics. + +| *`syncedShadowTopicProperties`* __string array__ | List of topic properties that should be synced from the source topic. + +The following properties will always be replicated: + +- Partition count + +- `max.message.bytes` + +- `cleanup.policy` + +- `timestamp.type` + + + +The following properties are not allowed to be replicated and adding them + +to this list will result in an error: + +- `redpanda.remote.readreplica` + +- `redpanda.remote.recovery` + +- `redpanda.remote.allowgaps` + +- `redpanda.virtual.cluster.id` + +- `redpanda.leaders.preference` + +- `redpanda.cloud_topic.enabled` + + + +This list is a list of properties in addition to the default properties + +that will be synced. See `excludeDefault`. + +| *`excludeDefault`* __boolean__ | If false, then the following topic properties will be synced by default: + +- `compression.type` + +- `retention.bytes` + +- `retention.ms` + +- `delete.retention.ms` + +- Replication Factor + +- `min.compaction.lag.ms` + +- `max.compaction.lag.ms` + + + +If this is true, then only the properties listed in + +`syncedShadowTopicProperties` will be synced. + +| *`startOffset`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-topicmetadatasyncoffset[$$TopicMetadataSyncOffset$$]__ | The starting offset for new shadow topic partitions. + +Defaults to earliest. + +Only applies if the shadow partition is empty. 
+ +| *`startOffsetTimestamp`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#time-v1-meta[$$Time$$]__ | The timestamp to start at if `startOffset` is set to "timestamp". + +Not providing this when setting `startOffset` to "timestamp" is + +an error. + +| *`paused`* __boolean__ | Allows user to pause the topic sync task. If paused, then + +the task will enter the 'paused' state and not sync topics or their + +properties from the source cluster + +|=== + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowtopicstate"] +== ShadowTopicState (string) + +State of a shadow topic + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowtopicstatus[$$ShadowTopicStatus$$] + + + +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowtopicstatus"] +== ShadowTopicStatus + +Status of a ShadowTopic + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinkstatus[$$ShadowLinkStatus$$] + +[cols="25a,75a", options="header"] +|=== +| Field | Description +| *`lastTransitionTime`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#time-v1-meta[$$Time$$]__ | +| *`name`* __string__ | Name of the shadow topic + +| *`topicId`* __string__ | Topic ID of the shadow topic + +| *`state`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowtopicstate[$$ShadowTopicState$$]__ | State of the shadow topic + +|=== + + [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-sidecarobj"] == SideCarObj @@ -3229,7 +3523,7 @@ API of a Redpanda cluster where the object should be created. 
+ [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-storage"] == Storage -Storage configures storage-related settings in the Helm values. See https://docs.redpanda.com/current/manage/kubernetes/storage/. +Storage configures storage-related settings in the Helm values. See xref:manage:kubernetes/storage.adoc[] @@ -3248,7 +3542,7 @@ Storage configures storage-related settings in the Helm values. See https://docs [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-tls"] == TLS -TLS configures TLS in the Helm values. See https://docs.redpanda.com/current/manage/kubernetes/security/tls/. +TLS configures TLS in the Helm values. See xref:manage:kubernetes/security/tls.adoc[] @@ -3263,12 +3557,22 @@ TLS configures TLS in the Helm values. See https://docs.redpanda.com/current/man |=== +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-taskstate"] +== TaskState (string) + +Task states + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinktaskstatus[$$ShadowLinkTaskStatus$$] + [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-tiered"] == Tiered -Tiered configures storage for the Tiered Storage cache. See https://docs.redpanda.com/current/manage/kubernetes/tiered-storage-kubernetes/. +Tiered configures storage for the Tiered Storage cache. See xref:manage:kubernetes/tiered-storage-kubernetes.adoc[] @@ -3305,44 +3609,44 @@ TieredConfig configures Tiered Storage, which requires an Enterprise license con [cols="25a,75a", options="header"] |=== | Field | Description -| *`cloud_storage_enabled`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-apiutil-jsonboolean[$$JSONBoolean$$]__ | Enables Tiered Storage, if a license key is provided. 
See https://docs.redpanda.com/docs/reference/cluster-properties/#cloud_storage_enabled. + -| *`cloud_storage_api_endpoint`* __string__ | See https://docs.redpanda.com/docs/reference/cluster-properties/#cloud_storage_api_endpoint. + -| *`cloud_storage_api_endpoint_port`* __integer__ | See https://docs.redpanda.com/current/reference/cluster-properties/#cloud_storage_api_endpoint_port. + -| *`cloud_storage_bucket`* __string__ | See https://docs.redpanda.com/current/reference/cluster-properties/#cloud_storage_bucket. + -| *`cloud_storage_azure_container`* __string__ | See https://docs.redpanda.com/docs/reference/cluster-properties/#cloud_storage_azure_container. + -| *`cloud_storage_azure_managed_identity_id`* __string__ | See https://docs.redpanda.com/docs/reference/cluster-properties/#cloud_storage_azure_managed_identity_id. + -| *`cloud_storage_azure_storage_account`* __string__ | See https://docs.redpanda.com/docs/reference/cluster-properties/#cloud_storage_azure_storage_account. + -| *`cloud_storage_azure_shared_key`* __string__ | See https://docs.redpanda.com/docs/reference/cluster-properties/#cloud_storage_azure_shared_key. + -| *`cloud_storage_azure_adls_endpoint`* __string__ | See https://docs.redpanda.com/docs/reference/cluster-properties/#cloud_storage_azure_adls_endpoint. + -| *`cloud_storage_azure_adls_port`* __integer__ | See https://docs.redpanda.com/docs/reference/cluster-properties/#cloud_storage_azure_adls_port. + -| *`cloud_storage_cache_check_interval`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_cache_check_interval. + -| *`cloud_storage_cache_directory`* __string__ | See https://docs.redpanda.com/current/reference/node-properties/#cloud_storage_cache_directory. + -| *`cloud_storage_cache_size`* __string__ | See https://docs.redpanda.com/current/reference/cluster-properties/#cloud_storage_cache_size. 
+ -| *`cloud_storage_credentials_source`* __string__ | See https://docs.redpanda.com/current/reference/cluster-properties/#cloud_storage_credentials_source. + -| *`cloud_storage_disable_tls`* __boolean__ | See https://docs.redpanda.com/current/reference/cluster-properties/#cloud_storage_disable_tls. + -| *`cloud_storage_enable_remote_read`* __boolean__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_enable_remote_read. + -| *`cloud_storage_enable_remote_write`* __boolean__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_enable_remote_write. + -| *`cloud_storage_initial_backoff_ms`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_initial_backoff_ms. + -| *`cloud_storage_manifest_upload_timeout_ms`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_manifest_upload_timeout_ms. + -| *`cloud_storage_max_connection_idle_time_ms`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_max_connection_idle_time_ms. + -| *`cloud_storage_max_connections`* __integer__ | See https://docs.redpanda.com/current/reference/cluster-properties/#cloud_storage_max_connections. + -| *`cloud_storage_reconciliation_interval_ms`* __integer__ | Deprecated: See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_reconciliation_interval_ms. + -| *`cloud_storage_region`* __string__ | See https://docs.redpanda.com/current/reference/cluster-properties/#cloud_storage_region. + -| *`cloud_storage_segment_max_upload_interval_sec`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_segment_max_upload_interval_sec. + -| *`cloud_storage_segment_upload_timeout_ms`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_segment_upload_timeout_ms. 
+ -| *`cloud_storage_trust_file`* __string__ | See https://docs.redpanda.com/current/reference/cluster-properties/#cloud_storage_trust_file. + -| *`cloud_storage_upload_ctrl_d_coeff`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_upload_ctrl_d_coeff. + -| *`cloud_storage_upload_ctrl_max_shares`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_upload_ctrl_max_shares. + -| *`cloud_storage_upload_ctrl_min_shares`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_upload_ctrl_min_shares. + -| *`cloud_storage_upload_ctrl_p_coeff`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_upload_ctrl_p_coeff. + -| *`cloud_storage_upload_ctrl_update_interval_ms`* __integer__ | See https://docs.redpanda.com/current/reference/tunable-properties/#cloud_storage_upload_ctrl_update_interval_ms. + +| *`cloud_storage_enabled`* __xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-apiutil-jsonboolean[$$JSONBoolean$$]__ | Enables Tiered Storage, if a license key is provided. 
See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_api_endpoint`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_api_endpoint_port`* __integer__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_bucket`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_azure_container`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_azure_managed_identity_id`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_azure_storage_account`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_azure_shared_key`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_azure_adls_endpoint`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_azure_adls_port`* __integer__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_cache_check_interval`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_cache_directory`* __string__ | See xref:reference:node-properties.adoc[] + +| *`cloud_storage_cache_size`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_credentials_source`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_disable_tls`* __boolean__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_enable_remote_read`* __boolean__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_enable_remote_write`* __boolean__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_initial_backoff_ms`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_manifest_upload_timeout_ms`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_max_connection_idle_time_ms`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_max_connections`* __integer__ | See 
xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_reconciliation_interval_ms`* __integer__ | Deprecated: See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_region`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_segment_max_upload_interval_sec`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_segment_upload_timeout_ms`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_trust_file`* __string__ | See xref:reference:cluster-properties.adoc[] + +| *`cloud_storage_upload_ctrl_d_coeff`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_upload_ctrl_max_shares`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_upload_ctrl_min_shares`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_upload_ctrl_p_coeff`* __integer__ | See xref:reference:tunable-properties.adoc[] + +| *`cloud_storage_upload_ctrl_update_interval_ms`* __integer__ | See xref:reference:tunable-properties.adoc[] + |=== [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-topic"] == Topic -Topic defines the CRD for Topic resources. See https://docs.redpanda.com/current/manage/kubernetes/manage-topics/. +Topic defines the CRD for Topic resources. 
See xref:manage:kubernetes/manage-topics.adoc[] @@ -3369,12 +3673,22 @@ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api- |=== +[id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-topicmetadatasyncoffset"] +== TopicMetadataSyncOffset (string) + + + + + +.Appears in: +- xref:{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-shadowlinktopicmetadatasyncoptions[$$ShadowLinkTopicMetadataSyncOptions$$] + [id="{anchor_prefix}-github-com-redpanda-data-redpanda-operator-operator-api-redpanda-v1alpha2-topicspec"] == TopicSpec -TopicSpec defines the desired state of the topic. See https://docs.redpanda.com/current/manage/kubernetes/manage-topics/. +TopicSpec defines the desired state of the topic. See xref:manage:kubernetes/manage-topics.adoc[] @@ -3390,11 +3704,11 @@ It can be increased after topic creation, but it is + important to understand the consequences that has, especially for + topics with semantic partitioning. When absent this will default to + the Redpanda cluster configuration `default_topic_partitions`. + -See https://docs.redpanda.com/docs/reference/cluster-properties/#default_topic_partitions and + -https://docs.redpanda.com/docs/get-started/architecture/#partitions + +See xref:reference:cluster-properties.adoc[] and + +xref:get-started:architecture.adoc[] + | *`replicationFactor`* __integer__ | Specifies the number of replicas the topic should have. Must be odd value. + When absent this will default to the Redpanda cluster configuration `default_topic_replications`. + -See https://docs.redpanda.com/docs/reference/cluster-properties/#default_topic_replications. + +See xref:reference:cluster-properties.adoc[] + | *`overwriteTopicName`* __string__ | Changes the topic name from the value of `metadata.name`. + | *`additionalConfig`* __object (keys:string, values:string)__ | Adds extra topic configurations. 
This is a free-form map of any configuration options that topics can have. + Examples: + diff --git a/modules/troubleshoot/partials/errors-and-solutions.adoc b/modules/troubleshoot/partials/errors-and-solutions.adoc index cce84a11e6..1f592f86d4 100644 --- a/modules/troubleshoot/partials/errors-and-solutions.adoc +++ b/modules/troubleshoot/partials/errors-and-solutions.adoc @@ -576,3 +576,179 @@ If you're using `rpk`, ensure to specify the `-X user`, `-X pass`, and `-X sasl. For all available flags, see the xref:reference:rpk/rpk-x-options.adoc[`rpk` options reference]. //end::sasl[] + +//tag::shadow-linking[] +=== pattern_type is unspecified + +When creating a shadow link with `rpk shadow create`, you may see: + +[.no-copy] +---- +Invalid cluster link configuration: pattern_type is unspecified +---- + +Ensure `pattern_type` values are uppercase: `LITERAL`, `PREFIX`. + +=== broker_not_available with TLS enabled + +When creating a shadow link with TLS enabled, you may see: + +[.no-copy] +---- +Cluster link unreachable, preflight check failed - { node: -1 }, { error_code: broker_not_available [8] } +---- + +The shadow cluster cannot verify the source cluster's TLS certificate. This is the most common issue when using TLS with self-signed certificates (the default for Kubernetes deployments with `tls.certs.default.caEnabled=true`). + +Ensure that the shadow link configuration includes the source cluster's CA certificate. + +=== Wrong SSL version number + +When creating a shadow link, you may see in the source cluster logs: + +[.no-copy] +---- +Disconnected (applying protocol, Wrong SSL Version number: ensure client is configured to use TLS) +---- + +The source cluster requires TLS, but your shadow link configuration is missing TLS settings or has `tls_settings.enabled: false`. 
+ + +=== broker_not_available without TLS + +When creating a shadow link without TLS, you may see: + +[.no-copy] +---- +Cluster link unreachable, preflight check failed - { node: -1 }, { error_code: broker_not_available [8] } +---- + +Verify that the `bootstrap_servers` addresses are reachable from the shadow cluster and that the ports are correct. + +ifdef::env-kubernetes[] +Test connectivity from the shadow pod: + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + curl -v telnet://: +---- +endif::[] + +=== Connection timeout + +When creating a shadow link, the command may hang or time out without completing. + +Check network connectivity between the shadow and source clusters. Verify that firewall rules and network policies allow traffic between the namespaces. +//end::shadow-linking[] + +//tag::shadow-link-monitoring[] +=== Topics in FAULTED state + +When monitoring shadow links, you may see topics in the `FAULTED` state in the status output. + +ifdef::env-kubernetes[] +Check shadow cluster logs for specific error messages: + +[,bash] +---- +kubectl logs --namespace --container redpanda | grep -i "shadow\|error" +---- +endif::[] + +Common causes include: + +* Source topic deleted: topic no longer exists on source cluster +* Permission denied: shadow link service account lacks required permissions +* Network interruption: temporary connectivity issues + +If the source topic still exists and should be replicated, delete and recreate the shadow link to reset the faulted state. + +=== High replication lag + +When monitoring shadow links, you may see LAG values continuously increasing in `rpk shadow status`. 
+ +Check the following: + +* Source cluster load: a high produce rate may exceed replication capacity +* Shadow cluster resources: CPU, memory, or disk constraints +* Network bandwidth: verify sufficient bandwidth between clusters + +To resolve: + +* Scale shadow cluster resources if constrained +* Verify network connectivity and bandwidth +* Review topic configuration for optimization opportunities + +=== Task shows LINK_UNAVAILABLE + +When monitoring shadow links, you may see tasks in the `LINK_UNAVAILABLE` state with a "No brokers available" message. + +Common causes include: + +* Source cluster requires SASL authentication but the shadow link is not configured for it +* Source cluster unreachable from shadow cluster +* Network policy blocking traffic between clusters + +To resolve: + +* Verify SASL configuration if source cluster requires authentication +* Test network connectivity: `kubectl exec` into shadow pod and try connecting to source cluster +* Check Kubernetes NetworkPolicies and firewall rules +//end::shadow-link-monitoring[] + + +ifdef::env-kubernetes[] +=== ShadowLink resource stuck + +When using the Operator, the `ShadowLink` resource may fail to delete or may show errors. + +Check the Redpanda Operator logs: + +[,bash] +---- +kubectl logs --namespace -l app.kubernetes.io/name=operator --tail=100 +---- + +Look for specific errors that prevent cleanup. Contact Redpanda support if the resource remains stuck after addressing any logged errors. +endif::[] + +//tag::shadow-link-failover[] +=== Application connection failures after failover + +Applications may not be able to connect to the shadow cluster after failover. 
+ +ifdef::env-kubernetes[] +Verify the shadow cluster's Kubernetes Service endpoints: + +[,bash] +---- +kubectl get service --namespace +---- + +If you use network policies, check the NetworkPolicy resources: + +[,bash] +---- +kubectl get networkpolicy --namespace +---- +endif::[] + +Confirm that authentication credentials are valid for the shadow cluster, and test network connectivity from application hosts. + +=== Consumer group offset issues after failover + +After failover, consumers may start from the beginning or from the wrong position. + +ifdef::env-kubernetes[] +Verify that consumer group offsets were replicated (check your shadow link filters): + +[,bash] +---- +kubectl exec --namespace --container redpanda -- \ + rpk group describe +---- +endif::[] + +If necessary, manually reset offsets to the appropriate positions. See link:https://support.redpanda.com/hc/en-us/articles/23499121317399-How-to-manage-consumer-group-offsets-in-Redpanda[How to manage consumer group offsets in Redpanda^] for detailed reset procedures. +//end::shadow-link-failover[] diff --git a/modules/upgrade/pages/k-compatibility.adoc b/modules/upgrade/pages/k-compatibility.adoc index 54d64bf6d9..b5c000c31c 100644 --- a/modules/upgrade/pages/k-compatibility.adoc +++ b/modules/upgrade/pages/k-compatibility.adoc @@ -37,7 +37,27 @@ Redpanda Core has no direct dependency on Kubernetes. 
Compatibility is influence |=== |Redpanda Core / `rpk` |Helm Chart |Operator Helm Chart |Operator |Helm CLI |Kubernetes -.2+|25.2.x +.2+|25.3.x + +|25.3.x +|25.3.x +|25.3.x +|3.12+ +d|1.30.x - 1.33.x{fn-k8s-compatibility} + +|25.2.x +|25.2.x +|25.2.x +|3.12+ +d|1.30.x - 1.33.x{fn-k8s-compatibility} + +.3+|25.2.x + +|25.3.x +|25.3.x +|25.3.x +|3.12+ +d|1.30.x - 1.33.x{fn-k8s-compatibility} |25.2.x |25.2.x @@ -72,44 +92,6 @@ d|1.28.x - 1.32.x{fn-k8s-compatibility} |2.3.x |3.12+ d|1.28.x - 1.32.x{fn-k8s-compatibility} - -.4+|24.3.x -|25.1.x -|25.1.x -|25.1.x -|3.11+ -d|1.28.x - 1.32.x{fn-k8s-compatibility} - -|5.9.x -|2.4.x -|2.4.x -|3.11+ -d|1.28.x - 1.31.x{fn-k8s-compatibility} - -|5.9.x -|0.4.36 -|2.3.x -|3.11+ -d|1.28.x - 1.31.x{fn-k8s-compatibility} - -|5.9.x -|0.4.29 -|2.2.x -|3.11+ -d|1.28.x - 1.31.x{fn-k8s-compatibility} - -.2+|24.2.x -|5.9.x -|0.4.29 -|2.2.x -|3.10+ -d|1.27.x - 1.30.x{fn-k8s-compatibility} - -|5.8.x -|0.4.29 -|2.2.x -|3.10+ -d|1.27.x - 1.30.x{fn-k8s-compatibility} |=== By default, the Redpanda Helm chart depends on cert-manager for enabling TLS. @@ -129,12 +111,12 @@ Upgrading the Helm chart may also upgrade Redpanda Console. Because of this buil |Redpanda Console |Helm Chart |Operator |v3.x.x -|25.2.x, 25.1.x -|Not yet supported +|25.3.x, 25.2.x, 25.1.x +|25.3.x, 25.2.x |v2.x.x |5.10.1, 5.9.x, 5.8.x -|25.2.x, 25.1.x, 2.4.x, 2.3.x, 2.2.x +|25.3.x, 25.2.x, 25.1.x, 2.4.x, 2.3.x, 2.2.x |===