Add system user password rotation using MySQL dual password#888
Add system user password rotation using MySQL dual password#888shunki-fujita wants to merge 2 commits intomainfrom
Conversation
f9757e2 to
d05bc9d
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds a two-phase, crash-safe password rotation workflow for MOCO’s 8 fixed “system users” by leveraging MySQL’s dual-password feature (RETAIN CURRENT PASSWORD / DISCARD OLD PASSWORD), triggered via kubectl-moco subcommands (or raw annotations) and persisted via status to safely resume after controller restarts.
Changes:
- Implement pending-password lifecycle in Secrets (
*_PENDING+ROTATION_ID) and rotation state inMySQLCluster.status.systemUserRotation. - Add controller reconcile logic for rotate/discard phases, including Secret distribution, status guards, and best-effort annotation cleanup.
- Update
kubectl-moco credentialinto subcommands (show|rotate|discard) and adjust e2e/docs/CRDs accordingly.
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/password/rotation.go | Implements pending-password Secret helpers and validation for rotation workflows. |
| pkg/password/rotation_test.go | Unit tests for pending-password Secret helper behaviors. |
| pkg/dbop/password.go | Adds MySQL ALTER USER ... RETAIN/DISCARD operations to the DB operator. |
| pkg/dbop/operator.go | Extends dbop.Operator interface with rotate/discard password methods. |
| pkg/dbop/nop.go | Implements new operator methods for NopOperator. |
| pkg/constants/meta.go | Adds rotation/discard annotation keys. |
| go.mod | Promotes github.com/google/uuid to a direct dependency. |
| controllers/password_rotation.go | Implements controller rotation/discard phase state machine and Secret distribution. |
| controllers/mysqlcluster_controller.go | Wires password-rotation reconcile into main reconcile flow and adds OpFactory dependency. |
| controllers/mysqlcluster_controller_test.go | Adds integration-style tests for rotate→discard flow and stale-annotation handling. |
| controllers/mock_test.go | Adds mock OperatorFactory/Operator for password rotation tests. |
| cmd/moco-controller/cmd/run.go | Injects OpFactory into the reconciler in the controller binary. |
| cmd/kubectl-moco/cmd/credential.go | Restructures credential into subcommands; adds rotate/discard annotation patching. |
| e2e/replication_test.go | Updates kubectl moco credential invocations to credential show. |
| e2e/lifecycle_test.go | Updates kubectl moco credential invocations to credential show. |
| docs/kubectl-moco.md | Documents new kubectl moco credential <subcommand> structure and rotation commands. |
| docs/designdoc/password_rotation.md | Adds detailed design doc for rotation state machine, safety, and recovery. |
| api/v1beta2/mysqlcluster_types.go | Adds SystemUserRotationStatus and RotationPhase types to the API. |
| api/v1beta2/zz_generated.deepcopy.go | Adds deepcopy support for the new status types. |
| docs/crd_mysqlcluster_v1beta2.md | Documents the new status.systemUserRotation fields. |
| config/crd/bases/moco.cybozu.com_mysqlclusters.yaml | CRD schema update for status.systemUserRotation. |
| config/crd/tests/apiextensions...mysqlclusters...yaml | CRD test fixture update for status.systemUserRotation. |
| charts/moco/templates/generated/crds/moco_crds.yaml | Helm-generated CRD update for status.systemUserRotation. |
| clustering/mock_test.go | Updates mock operator to satisfy the extended dbop.Operator interface. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d05bc9d to
a6349f9
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
aeee15d to
c630055
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
pkg/dbop/password.go:59
- Missing unit tests for the password rotation SQL operations. The RotateUserPassword and DiscardOldPassword methods lack direct unit test coverage. While the validateMocoUser function provides SQL injection protection, this critical security-sensitive code should have dedicated unit tests that verify:
- The validateMocoUser whitelist correctly blocks invalid user names
- The SQL statement construction is correct
- Error handling works as expected
Consider adding a pkg/dbop/password_test.go file with unit tests for these functions, using mocks or test doubles for the database connection.
package dbop
import (
"context"
"fmt"
"github.com/cybozu-go/moco/pkg/constants"
)
// validMocoUsers is a set of allowed user names for password rotation SQL operations.
var validMocoUsers map[string]struct{}
func init() {
validMocoUsers = make(map[string]struct{}, len(constants.MocoUsers))
for _, u := range constants.MocoUsers {
validMocoUsers[u] = struct{}{}
}
}
// validateMocoUser returns an error if user is not one of the fixed system user
// names. This guards against SQL injection since the user name is interpolated
// directly into ALTER USER statements.
func validateMocoUser(user string) error {
if _, ok := validMocoUsers[user]; !ok {
return fmt.Errorf("invalid MOCO user name %q: must be one of constants.MocoUsers", user)
}
return nil
}
// RotateUserPassword executes ALTER USER ... IDENTIFIED BY ... RETAIN CURRENT PASSWORD.
//
// user must be one of the fixed system user names defined in pkg/constants/users.go.
// The user name is interpolated directly into the SQL statement because MySQL
// does not support placeholders in the user position of ALTER USER.
func (o *operator) RotateUserPassword(ctx context.Context, user, newPassword string) error {
if err := validateMocoUser(user); err != nil {
return err
}
query := fmt.Sprintf("ALTER USER '%s'@'%%' IDENTIFIED BY ? RETAIN CURRENT PASSWORD", user)
if _, err := o.db.ExecContext(ctx, query, newPassword); err != nil {
return fmt.Errorf("failed to rotate password for %s: %w", user, err)
}
return nil
}
// DiscardOldPassword executes ALTER USER ... DISCARD OLD PASSWORD.
//
// user must be one of the fixed system user names defined in pkg/constants/users.go.
// See RotateUserPassword for the rationale.
func (o *operator) DiscardOldPassword(ctx context.Context, user string) error {
if err := validateMocoUser(user); err != nil {
return err
}
query := fmt.Sprintf("ALTER USER '%s'@'%%' DISCARD OLD PASSWORD", user)
if _, err := o.db.ExecContext(ctx, query); err != nil {
return fmt.Errorf("failed to discard old password for %s: %w", user, err)
}
return nil
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c630055 to
85d6ba5
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
85d6ba5 to
91855b9
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
docs/designdoc/password_rotation.md:196
- The design doc says the replicas=0 discard refusal returns RequeueAfter: 30s, but the controller implementation uses rotationRequeueInterval (15s) for that case. Please align the documentation with the actual behavior, or adjust the code to match the documented 30s interval.
**Scaled-down clusters (replicas=0):**
When `spec.replicas` is 0, discard is **rejected** with a Warning Event (`DiscardRefused`) and the handler returns `RequeueAfter: 30s`.
Without running Pods, we cannot verify that the new passwords actually work, so discarding old passwords would be unsafe.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ee58742 to
5a02f10
Compare
20f9ea6 to
11e8944
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 24 out of 25 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (3)
cmd/kubectl-moco/cmd/credential.go:28
- The deprecated form
kubectl moco credential CLUSTER_NAMEstill calls fetchCredential, but the--mysql-user/--formatflags are now registered only on theshowsubcommand. That breaks previously valid flag usage even though the command is only marked deprecated. Consider moving these flags to credentialCmd as PersistentFlags (or duplicating them) so both forms accept the same options during the deprecation period.
RunE: func(cmd *cobra.Command, args []string) error {
if len(args) == 1 {
fmt.Fprintln(cmd.ErrOrStderr(), "Warning: 'kubectl moco credential CLUSTER_NAME' is deprecated, use 'kubectl moco credential show CLUSTER_NAME' instead.")
return fetchCredential(cmd.Context(), args[0])
controllers/password_rotation.go:570
- For replica instances, this disables super_read_only via SetSuperReadOnly before any session has sql_log_bin=0 set. If the goal is to ensure no rotation-related writes are binlogged (like ALTER USER), consider executing the super_read_only toggle under sql_log_bin=0 too (e.g., have SetSuperReadOnly use a dedicated conn that sets sql_log_bin=0 first, or add a helper that runs both statements in one session).
if isReplica {
if err := op.SetSuperReadOnly(ctx, false); err != nil {
return fmt.Errorf("failed to disable super_read_only on instance %d: %w", instanceIndex, err)
}
controllers/password_rotation.go:614
- Same concern here: discardInstanceUsers disables super_read_only before any sql_log_bin=0 is set on that session. Consider ensuring the super_read_only toggle is executed with binlogging disabled as well, to keep cross-cluster replication safety guarantees consistent.
if isReplica {
if err := op.SetSuperReadOnly(ctx, false); err != nil {
return fmt.Errorf("failed to disable super_read_only on instance %d for discard: %w", instanceIndex, err)
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
5fd9097 to
a04b51d
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 24 out of 25 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
bf82a56 to
ef3cb0e
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 24 out of 25 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)
cmd/kubectl-moco/cmd/credential.go:191
- The deprecated path (
kubectl moco credential CLUSTER_NAME) no longer accepts the previously supported-u/--mysql-userand--formatflags because those flags are defined only on theshowsubcommand. This makes the deprecation a breaking change for existing scripts. Consider adding these flags as persistent flags oncredentialCmd(bound to the same config variables) so the deprecated invocation keeps working during the deprecation period.
func init() {
fs := credentialShowCmd.Flags()
fs.StringVarP(&credentialConfig.user, "mysql-user", "u", constants.ReadOnlyUser, "User for login to mysql")
fs.StringVar(&credentialConfig.format, "format", "plain", "The format of output [`plain` or `mycnf`]")
cmd/kubectl-moco/cmd/credential.go:164
- If
moco.cybozu.com/password-discardannotation is already set but contains the wrong rotationID (e.g., manually set or stale), this command prints "already requested" and exits without fixing it. That leaves the user waiting while the controller will later consume the annotation as mismatched. Consider detectingann != rotationStatus.RotationIDand either overwriting it to the active rotationID or returning a clear error instructing the user to remove/re-run.
if ann := cluster.Annotations[constants.AnnPasswordDiscard]; ann != "" {
fmt.Printf("Password discard is already requested (rotationID: %s).\n", ann)
return nil
}
pkg/dbop/password.go:39
SetSuperReadOnlyexecutesSET GLOBAL super_read_only=...via the pooled DB handle without settingsql_log_bin=0on that session. For consistency with the rotation design’s binlog-safety goals, consider using a dedicated connection and settingsql_log_bin=0in the same session (similar to RotateUserPassword/DiscardOldPassword), or document why this statement is guaranteed not to be written to the binlog.
func (o *operator) SetSuperReadOnly(ctx context.Context, on bool) error {
val := 0
if on {
val = 1
}
if _, err := o.db.ExecContext(ctx, fmt.Sprintf("SET GLOBAL super_read_only=%d", val)); err != nil {
return fmt.Errorf("failed to set super_read_only=%d: %w", val, err)
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
edcfc66 to
80fcc6f
Compare
Implement password rotation for all 8 MOCO system users via MySQL's RETAIN CURRENT PASSWORD / DISCARD OLD PASSWORD mechanism. Users trigger rotation with kubectl annotations (password-rotate / password-discard), and the controller manages the full lifecycle: generate pending passwords, ALTER USER with RETAIN, distribute new secrets, then DISCARD old passwords. Key safety properties: - All ALTER USER operations use SET sql_log_bin=0 (via dedicated db.Conn) to prevent binlog propagation to cross-cluster replicas - RETAIN is executed on every instance individually (primary + all replicas) since binlog-based within-cluster replication is disabled - Crash recovery uses HasDualPassword to query MySQL directly for additional_password, skipping users that already have RETAIN applied. This makes MySQL the source of truth and avoids the race where RETAIN succeeds but a Status.Patch fails - For replicas, super_read_only is temporarily disabled before ALTER USER and re-enabled afterwards; clustering loop self-recovers on failure - Pending passwords stored in source Secret with ROTATION_ID for integrity validation - ConfirmPendingPasswords is idempotent (no-op when pending already cleared) - Annotation removal is best-effort to avoid Patch conflicts with Status - Discard is gated on StatefulSet rollout completion to ensure all Pods have picked up new passwords via EnvFrom before old passwords are dropped - Discard is rejected when replicas=0 (scaled-down cluster) because new passwords cannot be verified without running Pods Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
80fcc6f to
47b5fb9
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 24 out of 25 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (2)
cmd/kubectl-moco/cmd/credential.go:33
- The deprecated invocation path (
kubectl moco credential CLUSTER_NAME) no longer supports the old-u/--mysql-userand--formatflags because they are now defined only on theshowsubcommand. If backwards compatibility is intended during the deprecation window, consider defining these as persistent flags oncredentialCmd(or otherwise wiring them so the deprecated path continues to accept the same flags).
var credentialCmd = &cobra.Command{
Use: "credential",
Short: "Manage credentials for MySQLCluster",
Long: "The credential command is used to show, rotate, and discard credentials for MySQLCluster.",
Args: cobra.MaximumNArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
if len(args) == 1 {
fmt.Fprintln(cmd.ErrOrStderr(), "Warning: 'kubectl moco credential CLUSTER_NAME' is deprecated, use 'kubectl moco credential show CLUSTER_NAME' instead.")
return fetchCredential(cmd.Context(), args[0])
}
return cmd.Help()
},
}
controllers/mysqlcluster_controller_test.go:2424
- This assertion only checks that each user appears at least once in a flat
rotatedUsersmap. Since rotation must run on every instance, this test can still pass even if the reconciler only rotates users on a single instance. Consider tracking/asserting RotateUserPassword calls per (instance,user) pair (or assertinginstanceDualfor every instance and every user) so the test would fail if any instance is skipped.
By("verifying ALTER USER RETAIN was called for all users")
Eventually(func() int {
return len(mockOpFactory.getRotatedUsers())
}).Should(Equal(len(constants.MocoUsers)))
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 26 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
controllers/password_rotation.go:233
- In handlePasswordRotate, pending keys may be generated and written to the controller Secret before verifying the Secret is a valid MOCO password Secret (e.g. correct moco.cybozu.com/secret-version annotation). If the Secret format is broken (manual edit/restore), this can create additional inconsistent *_PENDING/ROTATION_ID state and then fail later when NewMySQLPasswordFromSecret is called. Consider validating the controller Secret format up front (before SetPendingPasswords/Update) and returning an error without mutating the Secret when it’s invalid.
// Step 1: Ensure pending passwords exist in the source Secret.
// SetPendingPasswords is idempotent: if pending passwords already exist for the
// current rotationID, they are reused without regeneration.
sourceSecret := &corev1.Secret{}
if err := r.Get(ctx, client.ObjectKey{Namespace: r.SystemNamespace, Name: cluster.ControllerSecretName()}, sourceSecret); err != nil {
return err
}
hasPending, err := password.HasPendingPasswords(sourceSecret, rotation.RotationID)
if err != nil {
r.Recorder.Eventf(cluster, corev1.EventTypeWarning, "StaleRotationPending",
"Failed to verify pending passwords in source secret %s: %v. "+
"To recover, manually delete *_PENDING and ROTATION_ID keys from the secret: "+
"kubectl -n %s edit secret %s",
cluster.ControllerSecretName(), err, r.SystemNamespace, cluster.ControllerSecretName())
return fmt.Errorf("failed to verify pending passwords in source secret: %w", err)
}
if !hasPending {
if _, err := password.SetPendingPasswords(sourceSecret, rotation.RotationID); err != nil {
return err
}
if err := r.Update(ctx, sourceSecret); err != nil {
return err
}
log.Info("generated pending passwords", "rotationID", rotation.RotationID)
cmd/kubectl-moco/cmd/credential.go:220
- The deprecated form
kubectl moco credential CLUSTER_NAMEnow runs the parent command, but the--mysql-user/-uand--formatflags are registered only on theshowsubcommand. This breaks previously valid invocations likekubectl moco credential -u moco-admin <cluster>(unknown flag). Consider making these persistent flags on the parent (or duplicating them) so the deprecated syntax remains functional while users migrate.
fs := credentialShowCmd.Flags()
fs.StringVarP(&credentialConfig.user, "mysql-user", "u", constants.ReadOnlyUser, "User for login to mysql")
fs.StringVar(&credentialConfig.format, "format", "plain", "The format of output [`plain` or `mycnf`]")
_ = credentialShowCmd.RegisterFlagCompletionFunc("mysql-user", func(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) {
return []string{constants.ReadOnlyUser, constants.WritableUser, constants.AdminUser}, cobra.ShellCompDirectiveDefault
})
_ = credentialShowCmd.RegisterFlagCompletionFunc("format", func(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) {
return []string{"plain", "mycnf"}, cobra.ShellCompDirectiveDefault
})
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f686090 to
f07c936
Compare
Add MigrateUserAuthPlugin to migrate system users to the target authentication plugin (determined from @@global.authentication_policy) during Phase 2. After DISCARD OLD PASSWORD, each user is re-identified with ALTER USER ... IDENTIFIED WITH <plugin> BY ..., re-hashing the password under the new plugin. Both steps are idempotent. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
f07c936 to
e03917d
Compare
Close: #849
Implement password rotation for all 8 MOCO system users via MySQL's RETAIN CURRENT PASSWORD / DISCARD OLD PASSWORD mechanism. Users trigger rotation with kubectl annotations (password-rotate / password-discard), and the controller manages the full lifecycle: generate pending passwords, ALTER USER with RETAIN, distribute new secrets, then DISCARD old passwords.