Develop a robust data backup system for Programs & Events Dashboard #6579

@ragesoss

Programs & Events Dashboard recently experienced a major database crash: https://phabricator.wikimedia.org/T411890

In this case, the database VM's filesystem broke, and the attached storage volume that holds the database files also had some corruption.

Since the rollout of the timeslice system, we're no longer worried about running out of storage on the Programs & Events Dashboard server. After cleaning up the duplicated files used during the recovery process, we'd have plenty of extra storage to hold a series of database dumps.

Ideally, we'd want to do dumps while there are no course updates running, so I think we'd want to build a system around something like:

  • On a weekly basis, pause the Sidekiq course update and other non-default workers, allow them to finish their current jobs, then do a database dump, and turn the workers back on.
  • Automatically curate a set of database dumps to keep (e.g., one that is about 6 months old, one that is about a month old, and one that is at most a week old), and delete old ones as new ones get created.
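
As a starting point for discussion, the curation step could be sketched roughly like this. This is only one possible reading of the policy above, not an existing implementation: the window sizes (31 and 183 days) and the function names `dumps_to_keep` and `dumps_to_delete` are assumptions made up for illustration, and it works on dump dates only, leaving file handling out.

```python
from datetime import date, timedelta

# Hypothetical age windows in days, loosely matching
# "about a month old" and "about 6 months old".
WINDOWS_DAYS = (31, 183)


def dumps_to_keep(dump_dates, today):
    """Return the set of dump dates to retain.

    Always keeps the newest dump, plus the oldest dump that is
    still inside each age window.
    """
    dates = sorted(dump_dates)
    if not dates:
        return set()
    keep = {dates[-1]}  # the newest dump is always kept
    for window in WINDOWS_DAYS:
        cutoff = today - timedelta(days=window)
        in_window = [d for d in dates if d >= cutoff]
        if in_window:
            keep.add(in_window[0])  # oldest dump inside this window
    return keep


def dumps_to_delete(dump_dates, today):
    """Return the dump dates that are not selected for retention."""
    return set(dump_dates) - dumps_to_keep(dump_dates, today)


if __name__ == "__main__":
    today = date.today()
    # Simulate 30 weekly dumps, the newest taken today.
    weekly = [today - timedelta(days=7 * i) for i in range(30)]
    for d in sorted(dumps_to_keep(weekly, today)):
        print("keep", d, "age in days:", (today - d).days)
```

With only three slots, the older copies drift: a retained dump ages until it leaves its window, and then the slot falls back to the next-oldest survivor. So "about 6 months" works as an upper bound on age rather than a guarantee.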
