CLOUDP-389867: add delay to backupConfig sharded cluster by filipcirtog · Pull Request #935 · mongodb/mongodb-kubernetes

filipcirtog · 2026-03-24T16:57:30Z

HELP

HELP-87476 - This Jira ticket addresses a race condition occurring when enabling backups for a sharded cluster deployed with the Kubernetes operator. The issue arises as individual shards may not receive 'addShard' events, leading to their indefinite inactivity. Investigations are focused on identifying the race condition in Ops Manager and finding a solution to ensure all shards are included in backups without delay.

Summary

When enabling backup on a sharded cluster, Ops Manager needs time to complete its internal topology discovery before it can successfully accept a backup request. Without a delay, the operator races against OM's discovery, causing backup enablement to fail and triggering reconciliation loops until a retry eventually succeeds.

This race is specific to sharded clusters due to their multi-process topology (mongos + config servers + shards), which takes longer for OM to fully register compared to replica sets.

Proof of Work

A configurable sleep is inserted in updateOmDeploymentShardedCluster immediately before calling ensureBackupConfigurationAndUpdateStatus, but only when a backup spec is present. The delay defaults to 60 seconds and is controlled by the MDB_BACKUP_START_DELAY_SECONDS environment variable on the operator deployment, allowing users to tune or disable it per environment.
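
The delay logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR. Only the environment variable name and the 60-second default come from the description; the helper names (parseBackupStartDelay, waitBeforeBackup, noSleep) are hypothetical. Treating 0 as "disabled" and falling back to the default on invalid input are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

const (
	// Variable name and default value as stated in the PR description.
	backupStartDelayEnv     = "MDB_BACKUP_START_DELAY_SECONDS"
	defaultBackupStartDelay = 60 * time.Second
)

// parseBackupStartDelay (hypothetical helper) turns the raw environment value
// into a duration. Assumption: empty, non-numeric, or negative values fall
// back to the default, and "0" disables the delay.
func parseBackupStartDelay(raw string) time.Duration {
	if raw == "" {
		return defaultBackupStartDelay
	}
	seconds, err := strconv.Atoi(raw)
	if err != nil || seconds < 0 {
		return defaultBackupStartDelay
	}
	return time.Duration(seconds) * time.Second
}

// waitBeforeBackup (hypothetical helper) sleeps only when a backup spec is
// present and the delay is positive. It reports whether it waited. The sleep
// function is injected so the logic can be exercised without real waiting.
func waitBeforeBackup(hasBackupSpec bool, delay time.Duration, sleep func(time.Duration)) bool {
	if !hasBackupSpec || delay <= 0 {
		return false
	}
	sleep(delay)
	return true
}

// noSleep is a stand-in for time.Sleep used in the demo below.
func noSleep(time.Duration) {}

func main() {
	delay := parseBackupStartDelay(os.Getenv(backupStartDelayEnv))
	fmt.Printf("configured backup start delay: %s\n", delay)

	// In the operator this would sit just before the backup configuration
	// call; here the sleep is replaced with a no-op.
	waited := waitBeforeBackup(true, delay, noSleep)
	fmt.Printf("waited before enabling backup: %t\n", waited)
}
```

In a real reconciler, time.Sleep (or a requeue with a delay) would take the place of noSleep. Injecting the sleeper is only a convenience for this sketch.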

Checklist

Have you linked a jira ticket and/or is the ticket in the title?
Have you checked whether your jira ticket required DOCSP changes?
Have you added changelog file?
- use skip-changelog label if not needed
- refer to Changelog files and Release Notes section in CONTRIBUTING.md for more details

github-actions · 2026-03-24T16:58:34Z

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.7.1 Release Notes

Bug Fixes

MongoDBOpsManager: Correctly handle the edge case where -admin-key was created by the user and is malformed. Previously the error was only reported in a DEBUG log entry.
MongoDBOpsManager: Improved readiness probe error handling and appDB agent status logging

Other Changes

Container images: Merged the init-database and init-appdb init container images into a single init-database image. The init-appdb image will no longer be published; this does not affect existing deployments.
- The following Helm chart values have been removed: initAppDb.name, initAppDb.version, and registry.initAppDb. Use initDatabase.name, initDatabase.version, and registry.initDatabase instead.
- The following environment variables have been removed: INIT_APPDB_IMAGE_REPOSITORY and INIT_APPDB_VERSION. Use INIT_DATABASE_IMAGE_REPOSITORY and INIT_DATABASE_VERSION instead.
Helm Chart: Removed the operator.baseName Helm value. This value was never documented and was never intended to be consumed by operator users. It controls the prefix for workload RBAC resource names (default: mongodb-kubernetes), but changing it could break the operator and workloads because the operator is not aware of custom prefixes. With this change, the Helm chart no longer allows customisation, and the relevant resources are deployed with predefined names: ServiceAccounts named mongodb-kubernetes-appdb, mongodb-kubernetes-database-pods, and mongodb-kubernetes-ops-manager; a Role named mongodb-kubernetes-appdb; and a RoleBinding named mongodb-kubernetes-appdb.

implementation

a131e7d

filipcirtog added the skip-changelog label · Mar 24, 2026

filipcirtog wants to merge 1 commit into master from CLOUDP-389867/add-delay-to-backupConfig-sharded-cluster
