Description
Report
I am working on deploying a MongoDB cluster using Helm, with the intention of managing it via ArgoCD. For handling user secrets, I use the External Secrets Operator (ESO). Since ESO fetches secrets from an external provider and then creates the corresponding Kubernetes Secret, there's an inherent delay in secret availability.
More about the problem
I'm encountering a race condition where PSMDB attempts to verify the presence of the user secret. If the secret is not yet available, it proceeds to create a new one. I attempted to mitigate this using Helm hooks, but in most cases, PSMDB still acts faster than ESO.
Initially, I considered modifying the operator code that handles the users secret. However, I noticed that CheckNSetDefaults is invoked before reconcileUsersSecret. This function populates empty fields with default values, so by the time my check runs the field is never empty and the condition is ineffective.
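To illustrate the ordering problem, here is a minimal sketch in Go. It does not use the operator's real types: SecretsSpec, applyDefaults and usersSecretIsExternal are hypothetical names I made up for this example, and the default naming scheme is an assumption. It only shows why an "is the field set?" check has to be captured before defaulting runs.

```go
package main

import "fmt"

// SecretsSpec is a hypothetical, simplified stand-in for the spec.secrets
// section of the custom resource. It is NOT the operator's real type.
type SecretsSpec struct {
	Users string // name of the users Secret; empty means "not set by the user"
}

// applyDefaults mimics a defaulting step in the spirit of CheckNSetDefaults:
// an empty name is replaced by a generated one (naming scheme is assumed).
func applyDefaults(clusterName string, s *SecretsSpec) {
	if s.Users == "" {
		s.Users = clusterName + "-users"
	}
}

// usersSecretIsExternal reports whether the user named the Secret explicitly.
// It is only meaningful BEFORE applyDefaults, because afterwards the field
// is never empty.
func usersSecretIsExternal(s SecretsSpec) bool {
	return s.Users != ""
}

func main() {
	spec := SecretsSpec{} // the user left spec.secrets.users unset

	external := usersSecretIsExternal(spec) // captured before defaulting
	applyDefaults("my-cluster", &spec)

	// The second check runs after defaulting and can no longer tell the
	// two cases apart.
	fmt.Println(external, usersSecretIsExternal(spec))
}
```

Running this prints `false true`: the check made after defaulting reports the secret as explicitly set even though the user never set it.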
Notes related to users secret:
Even though the operator eventually restarts the backup-agent container upon detecting a change in the secret, we're still facing timing issues related to Kubernetes secret synchronization. In some cases, despite the container restart, the updated secret value is not properly propagated, leading to continuous PBM authentication errors.
Notes related to SSL certificates:
I realized that a similar race condition occurs when specifying the ssl and sslInternal fields. If cert-manager is slower than PSMDB, the operator may attempt to create an issuer, even though the referenced certificates have already been issued.
Notes related to internal key:
In most cases, I’ve noticed that the MongoDB internal key is still created by the operator, even when I’ve explicitly referenced it (race condition here as well).
Summary:
I ended up changing the behavior of the spec.secrets fields. If any secret is explicitly defined, it is now treated as an externally managed, pre-existing secret. The operator will no longer attempt to create a Kubernetes Secret with that name. With this change, I’ve observed that cluster creation is faster, as there's no need to update the internal secret or propagate changes to the MongoDB nodes. That propagation sometimes failed under the original logic and caused authentication errors.
Steps to reproduce
- Create all the secrets with External Secrets Operator
- Create cert-manager.io/Certificate
- Fill spec.secrets with the names of the objects created
- Read the operator logs and, later, the MongoDB node logs
Versions
- Kubernetes: 1.30
- Operator: 1.20.1
- Database: v7
Anything else?
Related: https://forums.percona.com/t/mongodb-operator-creates-overwrites-external-secret-by-it-self/14794/5