Description
Summary
The MongoDB::Atlas::ProjectIpAccessList resource uses a delete-all-then-recreate-all strategy during updates, which causes:
- Potential connectivity downtime during deployments
- Race conditions when deploying to multiple regions simultaneously
Current Behavior
Looking at the update handler source code, the update operation:
- Deletes ALL entries (both previous and current model entries)
- Recreates only the entries in the current configuration
entriesToDelete := currentModel.AccessList
entriesToDelete = append(entriesToDelete, prevModel.AccessList...)
progressEvent := deleteEntriesForUpdate(entriesToDelete, ...)
Expected Behavior
The update handler should compute a diff and only:
- Delete entries that were removed from the configuration
- Add entries that are new
- Update comments on unchanged IPs (if applicable)
Unchanged entries would then never be deleted, so an update could not interrupt connections that depend on them.
Impact
1. Downtime Window
During the deletion phase, the IP access list is temporarily empty or incomplete, blocking legitimate connections until recreation completes.
2. Race Condition with Multi-Region Deployments
We experienced a critical issue when running two CDK deployments simultaneously in different AWS regions. Both deployments included a shared IP access list entry:
{
cidrBlock: vpcCidrBlock,
comment: `${deployEnvironment} CIDR (${this.region})`,
}
The delete-all-recreate-all strategy caused a race condition where both deployments were deleting and recreating entries concurrently. Here's the MongoDB Atlas Activity Feed showing the race condition:
| Timestamp | Action | IP/CIDR | User |
|---|---|---|---|
| 11/26/25 - 01:09:33 PM | Added | 10.41.0.0/16 | iyawkvot |
| 11/26/25 - 01:09:32 PM | Removed | 10.41.0.0/16 | iyawkvot |
| 11/26/25 - 01:09:28 PM | Removed | 10.41.0.0/16 | iyawkvot |
| 11/26/25 - 01:09:24 PM | Added | 10.41.0.0/16 | iyawkvot |
| 11/26/25 - 01:09:24 PM | Removed | 10.41.0.0/16 | iyawkvot |
Within seconds, the same entry was removed and re-added several times as the concurrent deployments fought over the shared entry.
Result: The VPC CIDR entry ended up being deleted, breaking connectivity for services in that VPC.
Suggested Fix
Implement a diff-based update strategy:
- Compute entries to add (in current model but not in Atlas)
- Compute entries to remove (in Atlas but not in current model)
- Only delete removed entries
- Only add new entries
- Leave unchanged entries untouched
This would eliminate both the downtime window and the race condition issue.
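The steps above could be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: the `Entry` type and the `diffAccessList` function are hypothetical names, entries are keyed only by CIDR block (real entries can also be identified by an IP address or an AWS security group), and comment changes on unchanged entries are ignored. Whether `existing` comes from the previous model or from what Atlas currently returns is left to the caller.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified stand-in for one access list entry.
// It is not the type used by the real resource handler.
type Entry struct {
	CIDRBlock string
	Comment   string
}

// diffAccessList returns the entries that must be created and the entries
// that must be deleted to move from existing to desired. Entries whose CIDR
// block appears in both lists are left untouched.
func diffAccessList(existing, desired []Entry) (toAdd, toRemove []Entry) {
	existingByCIDR := make(map[string]struct{}, len(existing))
	for _, e := range existing {
		existingByCIDR[e.CIDRBlock] = struct{}{}
	}

	desiredByCIDR := make(map[string]struct{}, len(desired))
	for _, d := range desired {
		if _, seen := desiredByCIDR[d.CIDRBlock]; seen {
			continue // skip duplicates in the desired list
		}
		desiredByCIDR[d.CIDRBlock] = struct{}{}
		if _, ok := existingByCIDR[d.CIDRBlock]; !ok {
			toAdd = append(toAdd, d)
		}
	}

	for _, e := range existing {
		if _, ok := desiredByCIDR[e.CIDRBlock]; !ok {
			toRemove = append(toRemove, e)
		}
	}
	return toAdd, toRemove
}

func main() {
	existing := []Entry{
		{CIDRBlock: "10.41.0.0/16", Comment: "shared VPC CIDR"},
		{CIDRBlock: "192.0.2.0/24", Comment: "old office"},
	}
	desired := []Entry{
		{CIDRBlock: "10.41.0.0/16", Comment: "shared VPC CIDR"},
		{CIDRBlock: "198.51.100.0/24", Comment: "new office"},
	}

	toAdd, toRemove := diffAccessList(existing, desired)
	fmt.Println("add:", toAdd)
	fmt.Println("remove:", toRemove)
}
```

In this example the shared `10.41.0.0/16` entry appears in neither result, so a deployment that does not change it would issue no delete or create call for it.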
Environment
- Using AWS CDK with awscdk-resources-mongodbatlas
- Multi-region deployments (eu-north-1, eu-west-1, etc.)