Skip to content

Node Role - Forbid Scylla upgrade when not explicitly requested#140

Open
fee-mendes wants to merge 2 commits intoscylladb:masterfrom
fee-mendes:refuse-upgrade
Open

Node Role - Forbid Scylla upgrade when not explicitly requested#140
fee-mendes wants to merge 2 commits intoscylladb:masterfrom
fee-mendes:refuse-upgrade

Conversation

@fee-mendes
Copy link
Member

This series introduces an early check to determine whether Scylla is already installed on the target system.
It provides a mechanism to prevent incorrectly (and dangerously) upgrading a system which may be already part of a running cluster.

In such scenarios, we now require upgrade_version: True, which delegates ALL upgrade logic to its relevant upgrade section in the role.

To prevent a "chicken and egg" situation when we fail in the middle of a role execution AND after Scylla has been installed, we introduce the skip_upgrade_check option as a failsafe mechanism, which - obviously - defaults to false.

Tests ran:

  1. Fail role execution after an install of Scylla in a single node cluster (this is - in fact, what led to TLS - Private key (insecure) copy fails when ansible is ran as a regular user #139 ;-)
  2. Try to execute role again, we see a failure complaining that Scylla is already installed.
  3. Add the skip_upgrade_check: True option, which allows us to get through and proceed with the installation/configuration process
  4. Revert the skip_upgrade_check: True option, we see a failure complaining that Scylla is already installed.
  5. Bump Scylla minor version and add the upgrade_version: True && upgrade_major: False options, upgrade completes.

Fixes #132

This commit introduces an early check to determine whether Scylla is already installed on the target system.
It provides a mechanism to prevent incorrectly (and dangerously) upgrading a system which may be already part of a running cluster.

In such scenarios, we now require `upgrade_version: True`, which delegates ALL upgrade logic to its relevant upgrade section in the role.
A funny situation arose when we introduced the option to fail early when Scylla is already installed and upgrade_version == False:

It may happen that the role failed during its runtime, therefore leaving the target systems in an inconsistent state. When this happens, if the failure happened AFTER the Scylla package was installed, then the user would be left in an chicken-and-egg situation:
They can not re-execute the playbook installing the specified version, as the upgrade will abort since we are upgrading to the same release. Similarly, they can not install Scylla again, as it will complain that the package is already installed.

We circumvent that situation by introducing the `skip_upgrade_check` option, which is a failsafe mechanism to allow a role failure to proceed on installation even when `upgrade_version` is False. Needless to say, this option is False by default.
@fee-mendes
Copy link
Member Author

For some reason I can't add a reviewer -- so @vladzcloudius :-)

@vladzcloudius vladzcloudius self-requested a review August 25, 2022 23:22
Copy link
Collaborator

@vladzcloudius vladzcloudius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't merge it because we use a Role's state is not only scylla packages but rather a lot more: swap configuration, systemd configuration, additional packages and lot-lot more.

And sometimes we run a Role on a node/nodes with already installed Scylla in order to apply some changes to that state, e.g. to change io_properties.yml values.

Therefore the new logic this PR suggests is no-go.

What DO need to prevent unintentional upgrades and this IS the case today AFAIK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Scylla Node Role]: Refuse to upgrade unless "upgrade_version" is set to true

2 participants