@zzzeek zzzeek commented Jul 1, 2025

To facilitate an in-place change to the name of the Secret that a Galera instance references for the mysql root password, rework the approach used by pods and shell scripts so that the root secret name and/or password no longer need to be passed by environment variable; instead, a pod-level cluster query retrieves the current root password. The logic to retrieve this password is encapsulated in a single shell script that is present as a volume mount on running containers.
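
For illustration, a lookup script of this kind could take roughly the following shape. This is a sketch under assumptions, not the script added by this PR: the function names (`json_get`, `api_get`, `root_password`), the `mariadb.openstack.org/v1beta1` API path, the `spec.secret` field and the `DbRootPassword` key are all assumed here. It uses python3 for JSON parsing instead of jq.

```shell
#!/bin/bash
# Hypothetical sketch of an in-pod root password lookup.
# API path, field names and the Secret key are assumptions.

# Standard locations of the pod's service account credentials.
SA_DIR=/var/run/secrets/kubernetes.io/serviceaccount
API_SERVER="https://kubernetes.default.svc"

# Read JSON on stdin and print the value found at a dotted path,
# e.g. "spec.secret". python3 is used in place of jq.
json_get() {
    python3 -c '
import json, sys
value = json.load(sys.stdin)
for key in sys.argv[1].split("."):
    value = value[key]
sys.stdout.write(str(value))
' "$1"
}

# GET an API path using the pod's service account token.
# The service account needs RBAC permission for the objects requested.
api_get() {
    curl -s --fail --cacert "${SA_DIR}/ca.crt" \
        -H "Authorization: Bearer $(cat "${SA_DIR}/token")" \
        "${API_SERVER}$1"
}

# Resolve the secret name from the Galera CR, then decode the password.
# GALERA_INSTANCE is expected to be set by the caller.
root_password() {
    local ns secret_name
    ns=$(cat "${SA_DIR}/namespace")
    secret_name=$(api_get "/apis/mariadb.openstack.org/v1beta1/namespaces/${ns}/galeras/${GALERA_INSTANCE}" \
        | json_get spec.secret) || return 1
    api_get "/api/v1/namespaces/${ns}/secrets/${secret_name}" \
        | json_get data.DbRootPassword | base64 -d
}
```

Inside a pod, a caller would then run something like `MYSQL_PWD=$(root_password)` immediately before invoking the mysql client, so that neither the secret name nor the password appears in the pod spec.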

This allows Job objects to be created with hashes that do not link to a specific Secret name, as well as to create StatefulSet objects that don't refer to this name. When the Secret name changes on a Galera instance for an in-place root password change, the hashes / CRs for these objects will remain unchanged.

A subsequent change to the mariadb operator will add the ability to change the mysql root password of a Galera cluster using a dual-reference architecture, where the "current" root secret will be part of <CR>/Status, while the secret referenced in <CR>/Spec will be the "new" root secret. When these two names differ, that will indicate that an in-place password change should take place, and it also keeps the pre-existing root password available at the same time as the new one so that the change can be carried out. The same architecture will be applied to a new class of "system" MariaDBAccount objects that are for use only by the Galera instance itself and do not have a link to any MariaDBDatabase CR. The Galera CR itself will no longer use osp-secret for the mysql root password, nor will the secret be directly referenced from the Galera CR; it will instead be referenced by a "system" MariaDBAccount CR which the Galera operator itself will create.
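
As a rough illustration of the dual-reference idea, the comparison between the two secret names might look like the following. The helper name `root_secret_state`, its output strings, and the handling of an empty Status value are assumptions made for this sketch; the operator itself is not implemented this way.

```shell
#!/bin/bash
# Hypothetical helper: classify the root secret references of a Galera CR.
#   $1: secret name recorded in <CR>/Status (the "current" secret)
#   $2: secret name given in <CR>/Spec (the "new" secret)
root_secret_state() {
    local current="$1" new="$2"
    if [ -z "$current" ]; then
        # Assumption: nothing recorded yet, so there is no old password
        # to change from.
        echo "initial"
    elif [ "$current" = "$new" ]; then
        # Both references agree: nothing to do.
        echo "in-sync"
    else
        # Names differ: the old password is still readable from the
        # "current" secret while the "new" one is applied.
        echo "change-pending"
    fi
}
```

For example, `root_secret_state osp-secret new-root-secret` prints `change-pending`, which is the case where both passwords need to be available at once.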

@openshift-ci openshift-ci bot requested review from lewisdenny and viroel July 1, 2025 13:01
@zzzeek zzzeek requested a review from dciabrin July 1, 2025 13:01
@zzzeek zzzeek changed the title Get MYSQL_PWD using an on-demand cluster query Get MYSQL_PWD using an on-demand cluster query (PR 1 of 6) Jul 1, 2025
@zzzeek zzzeek force-pushed the OSPRH-14916-pr1 branch 2 times, most recently from f429f65 to a91d023 Compare July 1, 2025 21:27

zzzeek commented Jul 3, 2025

/retest

zzzeek commented Jul 3, 2025

/recheck

zzzeek commented Jul 3, 2025

/test

openshift-ci bot commented Jul 3, 2025

@zzzeek: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

/test functional
/test images
/test mariadb-operator-build-deploy-chainsaw
/test mariadb-operator-build-deploy-kuttl
/test precommit-check

The following commands are available to trigger optional jobs:

/test mariadb-operator-build-deploy

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openstack-k8s-operators-mariadb-operator-main-functional
pull-ci-openstack-k8s-operators-mariadb-operator-main-images
pull-ci-openstack-k8s-operators-mariadb-operator-main-mariadb-operator-build-deploy-chainsaw
pull-ci-openstack-k8s-operators-mariadb-operator-main-mariadb-operator-build-deploy-kuttl
pull-ci-openstack-k8s-operators-mariadb-operator-main-precommit-check

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

zzzeek commented Jul 3, 2025

/test mariadb-operator-build-deploy-kuttl

@zzzeek zzzeek force-pushed the OSPRH-14916-pr1 branch from a91d023 to 638bc6a Compare July 4, 2025 01:00

zzzeek commented Jul 4, 2025

that last run failed in the openstack deploy step. everything succeeded except neutron that got stuck in db upgrade, it failed with: "sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1054, "Unknown column 'quotas.project_id' in 'field list'")"

not really sure how that can happen

zzzeek commented Jul 4, 2025

/retest

@dciabrin dciabrin left a comment

Thanks for that.
The only problem I have with this review is that we currently use the db root user for probes, because we don't yet have a clean way of creating alternative users (part of that will be resolved by your next review, I think). So if we were to merge it as is, we'd end up making two API calls for each probe sent to the galera pods, which is way too many.

I think we should discuss how to best cache that and only call the API when needed. I was thinking something along the lines of:

  1. we keep using root_auth any time we want to get root creds.
  2. internally, root_auth checks whether /root/.my.cnf exists. If it doesn't, it calls the API and generates that file with the current password.
  3. if it exists, it checks whether the creds are still valid (e.g. by doing a mysqladmin ping). If not, do 2.
  4. if mysql is not running (i.e. when the pod starts), do not try to check creds validity; assume .my.cnf is valid and wait for the next probe to update it if needed.
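
The four steps above could be sketched roughly as follows. This is only an illustration under assumptions: `fetch_root_password` is a placeholder for whatever performs the API lookup, and the helper names `refresh_my_cnf`, `mysql_running` and `creds_valid` are invented for the sketch. One deliberate difference from step 3: the credential check here runs `SELECT 1` through the mysql client, because `mysqladmin ping` is documented to exit 0 whenever the server is running, even when access is denied. That same property makes it suitable for the "is mysql running" test in step 4.

```shell
#!/bin/bash
# Hypothetical sketch of a cached root_auth; helper names are assumptions.

MY_CNF="${MY_CNF:-/root/.my.cnf}"

# Placeholder for the API lookup; a real script would query the cluster.
fetch_root_password() {
    echo "replace-with-api-lookup"
}

# Query the API and (re)write the cached credentials file.
# Note: a password containing a double quote would need escaping.
refresh_my_cnf() {
    local password
    password=$(fetch_root_password) || return 1
    (umask 077
     printf '[client]\nuser=root\npassword="%s"\n' "$password" > "$MY_CNF")
}

# True if the server answers at all, whether or not auth succeeds.
mysql_running() {
    mysqladmin ping >/dev/null 2>&1
}

# True if the cached credentials can actually run a statement.
creds_valid() {
    mysql --defaults-file="$MY_CNF" -e 'SELECT 1' >/dev/null 2>&1
}

root_auth() {
    # Step 2: no cached file yet, so call the API and create it.
    if [ ! -f "$MY_CNF" ]; then
        refresh_my_cnf
        return
    fi
    # Step 4: server not up (e.g. pod start); trust the cached file.
    if ! mysql_running; then
        return 0
    fi
    # Step 3: cached creds no longer valid, so refresh them from the API.
    if ! creds_valid; then
        refresh_my_cnf
    fi
}
```

With this shape, a probe only reaches the API when the file is missing or the cached password has stopped working, so the steady-state cost per probe is one local client call.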


GALERA_INSTANCE="{{.galeraInstanceName}}"

# note jq is not installed in the galera image, macgyvering w/ python instead

Ack, I'll track the shortcoming in a separate Jira so we can fix that for good.

@zzzeek zzzeek force-pushed the OSPRH-14916-pr1 branch 4 times, most recently from 2467a94 to 4c0c5f5 Compare October 11, 2025 21:08
@zzzeek zzzeek force-pushed the OSPRH-14916-pr1 branch 3 times, most recently from a151658 to f87d0bd Compare October 12, 2025 19:47

dciabrin commented Nov 3, 2025

/lgtm

lmiccini commented Nov 3, 2025

/approve

openshift-ci bot commented Nov 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lmiccini, zzzeek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Nov 3, 2025
@openshift-ci openshift-ci bot removed the lgtm label Nov 3, 2025
zzzeek and others added 2 commits November 3, 2025 14:24
In 1bb2318 the galera certs were regenerated
with a three-year expiry, but this did not include the CA
expiration time, leading to failures again. This change
updates that time as well and adds a script that can be used
to regenerate the values.
In order to facilitate an in-place change to the name of the
Secret that is referenced by a Galera instance for the
mysql root password, rework
the approach used by pods and shell scripts to no longer
require the root secret name and/or password be passed by
environment variable, instead using a pod-level cluster
query to retrieve the current root password.  The logic
to retrieve this password is encapsulated into a single
shell script that is present as a volume mount on running containers.

This allows Job objects to be created with hashes that
do not link to a specific Secret name, as well as to
create StatefulSet objects that don't refer to this name.
When the Secret name changes on a Galera instance for
an in-place root password change, the hashes / CRs for
these objects will remain unchanged.

A subsequent change to the mariadb operator will add the ability
to change the mysql root password of a Galera cluster using a
dual-reference architecture where
the "current" root secret will be part of <CR>/Status, while
the secret referenced in <CR>/Spec will be the "new" root
secret.  When these two names differ, that will indicate an
in-place password change should take place, as well
as allowing the pre-existing root password to be available
at the same time as the new one in order to do a root password
change.   The same
architecture will be applied to a new class of "system" MariaDBAccount
objects that are for use only by the Galera instance itself
and do not have a link to any MariaDBDatabase CR.  The
Galera CR itself will no longer use osp-secret
for the mysql root password nor will the secret be directly
referenced from the Galera CR, instead referenced by a
"system" MariaDBAccount CR which the Galera operator itself
will create.

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

dciabrin commented Nov 3, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 3, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 966e316 into openstack-k8s-operators:main Nov 3, 2025
7 checks passed

3 participants