wal: validate recovery directory through a stable identifier #4416

@jbowens

When recovering a database that was previously configured with WAL failover, the OPTIONS file encodes the directory previously used as the secondary, which may contain WAL files that still need to be replayed. Open returns an error if the directory encoded in the OPTIONS file is provided neither as a recovery directory nor as the current WAL failover secondary.

However, an operator can also accidentally remount the wrong disks in the wrong places, so that the correct directory path is configured but its contents are incorrect. We should persist a stable identifier (a UUID?) to both the OPTIONS file and a file within the secondary directory. If recovery finds that the secondary directory does not contain a matching identifier, we can abort recovery with an error indicating that the secondary appears to be incorrect or corrupt.
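
To make the proposal concrete, here is a rough sketch of the check using only the Go standard library. It is not Pebble code: the marker file name `failover_identifier` and the helpers `newIdentifier`, `writeIdentifier`, `checkIdentifier`, and `verifySecondary` are all hypothetical, and a random 128-bit hex string stands in for a UUID. The `want` argument represents whatever identifier would be read back from the OPTIONS file.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// identifierFilename is a hypothetical name for the marker file stored in
// the secondary directory; it is an assumption made for this sketch.
const identifierFilename = "failover_identifier"

// newIdentifier returns 16 random bytes, hex-encoded. A UUID would serve
// the same purpose.
func newIdentifier() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil
}

// writeIdentifier persists id to the marker file inside secondaryDir.
func writeIdentifier(secondaryDir, id string) error {
	path := filepath.Join(secondaryDir, identifierFilename)
	return os.WriteFile(path, []byte(id+"\n"), 0o644)
}

// checkIdentifier compares the identifier recorded in OPTIONS (want) with
// the raw contents of the marker file (got), ignoring surrounding whitespace.
func checkIdentifier(want, got string) error {
	if trimmed := strings.TrimSpace(got); trimmed != want {
		return fmt.Errorf("secondary identifier mismatch: OPTIONS has %q, directory has %q", want, trimmed)
	}
	return nil
}

// verifySecondary reads the marker file from secondaryDir and checks it
// against want. A missing marker file is also treated as an error.
func verifySecondary(secondaryDir, want string) error {
	got, err := os.ReadFile(filepath.Join(secondaryDir, identifierFilename))
	if err != nil {
		return fmt.Errorf("reading secondary identifier: %w", err)
	}
	return checkIdentifier(want, string(got))
}

func main() {
	// Two temporary directories simulate the correct disk and a wrongly
	// mounted disk that carries a different identifier.
	right, err := os.MkdirTemp("", "secondary-right")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(right)
	wrong, err := os.MkdirTemp("", "secondary-wrong")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(wrong)

	id, err := newIdentifier()
	if err != nil {
		panic(err)
	}
	other, err := newIdentifier()
	if err != nil {
		panic(err)
	}
	if err := writeIdentifier(right, id); err != nil {
		panic(err)
	}
	if err := writeIdentifier(wrong, other); err != nil {
		panic(err)
	}

	fmt.Println("correct mount:", verifySecondary(right, id))
	fmt.Println("wrong mount:  ", verifySecondary(wrong, id))
}
```

A real implementation would presumably also need to sync the marker file and its parent directory, and to decide how to treat secondaries created before the identifier existed; the sketch ignores both.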

This would have helped with a recent DRT test cluster issue in which the loss of a VM's host forced the VM to migrate, and the node came back up with incorrect disk mountpoints.

Jira issue: PEBBLE-358

Epic PEBBLE-1158
