Improve 2nd day operations for Storage #91

@gehoern

Description

Summary

Ceph is a complex system. The current combination of Ceph and Rook is optimized for initial setup but still lacks support for many day-2 operations. To keep Ceph and Rook running smoothly, standard operations must not break the setup and should be kept to a minimum.

As a first step, Ceph cluster upgrades and rolling node upgrades should be improved. Other operations such as growing and shrinking the cluster, monitoring, etc. can follow later in separate issues.


Scope

In Scope

  • Operations in scope
    • Ceph cluster upgrade
    • Node upgrade
  • For the operations in scope
    • Document process
    • Make sure operations can be triggered using GitOps (bumping version in config)
    • Integrate operations into Gardener (mainly, node upgrades performed by the machine-controller-manager should work together with Rook)
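As an illustration of "bumping version in config", the sketch below assumes the Ceph version is pinned in a Rook `CephCluster` manifest under `spec.cephVersion.image`, and that the GitOps repository holds that manifest in a form that can be loaded as a dictionary. The function name `bump_ceph_image`, the manifest contents, and the image tags are hypothetical placeholders, not the project's actual configuration.

```python
import copy


def bump_ceph_image(cluster: dict, new_image: str) -> dict:
    """Return a copy of a CephCluster manifest with the Ceph image replaced.

    Assumes the version is pinned at spec.cephVersion.image, as in the
    Rook CephCluster resource. The input manifest is left unmodified.
    """
    if cluster.get("kind") != "CephCluster":
        raise ValueError("expected a CephCluster manifest")
    updated = copy.deepcopy(cluster)
    updated.setdefault("spec", {}).setdefault("cephVersion", {})["image"] = new_image
    return updated


# Hypothetical manifest as it might be stored in the GitOps repository.
manifest = {
    "apiVersion": "ceph.rook.io/v1",
    "kind": "CephCluster",
    "metadata": {"name": "rook-ceph", "namespace": "rook-ceph"},
    "spec": {"cephVersion": {"image": "quay.io/ceph/ceph:v18.2.2"}},
}

# Placeholder target tag; committing the result would trigger the upgrade.
bumped = bump_ceph_image(manifest, "quay.io/ceph/ceph:v18.2.4")
```

In this model the upgrade is triggered only by committing the changed manifest; the operator is then expected to reconcile the cluster to the new image.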

Out of Scope

  • PG optimization on grow and shrink
  • Token handling (prevent the database from growing endlessly)
  • Issue monitoring (e.g. latency ...)
    (These topics will be handled in follow-up issues.)

Responsible Areas

  • Storage

Contributors


Acceptance Criteria

  • Storage

    • Operations procedures are documented
    • Operations have been performed in a test environment and verified to work
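One possible way to make "verified to work" checkable is a small gate that inspects the output of `ceph status --format json` before and after each operation. The sketch below assumes that output contains `health.status`, `pgmap.num_pgs`, and `pgmap.pgs_by_state`; the function name `cluster_is_settled` and the sample payloads are hypothetical, and the check is deliberately conservative.

```python
import json


def cluster_is_settled(status_json: str) -> bool:
    """Return True if the cluster reports HEALTH_OK and all PGs are active+clean.

    Conservative on purpose: PGs in states such as active+clean+scrubbing
    are not counted as clean here, so the check may need relaxing in practice.
    """
    status = json.loads(status_json)
    if status.get("health", {}).get("status") != "HEALTH_OK":
        return False
    pgmap = status.get("pgmap", {})
    total = pgmap.get("num_pgs", 0)
    clean = sum(
        state.get("count", 0)
        for state in pgmap.get("pgs_by_state", [])
        if state.get("state_name") == "active+clean"
    )
    return clean == total


# Hypothetical, heavily trimmed status payloads for illustration.
healthy = json.dumps({
    "health": {"status": "HEALTH_OK"},
    "pgmap": {"num_pgs": 64, "pgs_by_state": [{"state_name": "active+clean", "count": 64}]},
})
degraded = json.dumps({
    "health": {"status": "HEALTH_WARN"},
    "pgmap": {"num_pgs": 64, "pgs_by_state": [{"state_name": "active+clean", "count": 60}]},
})

ok_before_upgrade = cluster_is_settled(healthy)
ok_while_degraded = cluster_is_settled(degraded)
```

A test run could call such a gate before draining each node and again after the node rejoins, and record the results as evidence for this criterion.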

Action Items

  • Assign labels (e.g., area/ironcore-api, kind/design)
  • Set milestone (e.g., H1/2025)
  • Assign dependent sub-issues in each required area
  • Assign an owner to the issue using the GitHub "Assignee" field
  • List all contributors in the "Contributors" section above
  • Add this issue to the Roadmap project board

Metadata

Assignees

Labels

area/storage: Storage solutions and related concerns.

Projects

Status

Todo

Milestone
