Skip to content

docs: add version upgrade best practices guide #1032

@kwannoel

Description

@kwannoel

Context

Slack thread requested adding docs for "best practices for version upgrades to RisingWave":

  • Thread reference from transcript: http://Can%20you%20add%20best%20practices%20for%20version%20upgrades%20to%20Risingwave?
  • Follow-up in thread confirms action request: "Please add the docs as mentioned above"

Related docs context:

  • Existing best-practices page: performance/best-practices.mdx
  • Existing performance best-practices page: performance/performance-best-practices.mdx
  • Legacy/related upgrade guidance: upgrade-risingwave-k8s.md (runllm source)

Problem

Current docs do not clearly provide a single, practical "version upgrade best practices" guide that users can follow end-to-end.

Users need explicit guidance for:

  • Pre-upgrade checks (backups/snapshots, release-note review, compatibility constraints)
  • Supported upgrade paths and downgrade caveats
  • Rollout strategy (staged/incremental upgrade and validation gates)
  • Rollback/mitigation paths by deployment type (self-managed Kubernetes vs RisingWave Cloud)
  • Post-upgrade validation (cluster/pod health, query/job verification, alert monitoring)

Without a consolidated guide, upgrades are error-prone and operators may miss critical safety steps.

Suggested Fix

Add a dedicated docs section for Version Upgrade Best Practices (either as a new page under deployment/operations or as a prominent subsection in existing best-practices docs), with actionable checklists and deployment-specific notes.

Proposed content outline:

  1. Before upgrade
  • Read release notes and identify breaking changes/deprecations.
  • Verify supported source -> target version path.
  • Create backups/snapshots and capture current config.
  • Define maintenance window and success/rollback criteria.
  1. Compatibility and safety constraints
  • Document that upgrading forward is supported; downgrades are generally not supported and may risk incompatibility.
  • Clarify deployment-specific rollback availability (self-managed Kubernetes possible with prior image/config; RisingWave Cloud constraints).
  1. Execution strategy
  • Prefer staged/canary upgrade where possible.
  • Upgrade incrementally and validate between steps.
  • Include recommended observability checks during rollout.
  1. Rollback and incident response
  • Self-managed Kubernetes rollback playbook (revert image/manifest, verify pod readiness, restore if needed).
  • Cloud path: escalation/support guidance when rollback is unavailable.
  1. Post-upgrade validation checklist
  • Control plane/data plane health checks.
  • Smoke tests for ingestion, materialized views, and query latency.
  • Monitor alerts and error logs for a defined stabilization period.

Implementation notes:

  • Add cross-links from existing performance best-practices pages.
  • Include references to release notes and upgrade docs in navigation.
  • Add a concise operator checklist for copy/paste runbooks.

Metadata

Metadata

Labels

documentationImprovements or additions to documentationenhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions