-
Notifications
You must be signed in to change notification settings - Fork 47
Fix race condition in concurrent artifact additions #483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6526 |
|
Packit jobs failed. @containers/packit-build please check. |
This fixes a race condition where concurrent 'podman artifact add' commands for different artifacts would result in only one artifact being created, without any error messages. The root cause was in the artifact store's Add() method, which would: 1. Acquire lock 2. Read OCI layout index 3. Create ImageDestination (which snapshots the index) 4. RELEASE lock (optimization for blob copying) 5. Copy blobs (while unlocked) 6. Reacquire lock 7. Commit changes (write new index) When two concurrent additions happened: - Process A: Lock → Read index → Create dest A → Unlock → Copy blobs - Process B: Lock → Read index (no artifact A!) → Create dest B → Unlock - Process A: Lock → Commit (write index with A) - Process B: Lock → Commit (write index with B, OVERWRITING A) The fix keeps the lock held for the entire operation. While this reduces concurrency for blob copying, it prevents the index file corruption that caused artifacts to be lost. Changes: - Remove lock release/reacquire around blob copying in store.Add() - Simplify lock management (no more conditional unlock) Fixes: containers/podman#27569 Generated-with: Cursor AI Signed-off-by: Daniel J Walsh <[email protected]>
mtrmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
The diagnosis + fix is correct. Are we fine with paying the price? Earlier containers/podman#26524 , Cc: @mheon @baude @Luap99 .
Alternatively, it should be possible for oci/layout/newImageDestination to defer reading the index.json until the start of the “write manifests” phase; that would avoid the need to hold the lock for the duration of blob copies — with a fairly major downside that this would (again) make pkg/libartifact have a very remote dependency on internal implementation choices of oci/layout`.
|
|
||
| as.lock.Lock() | ||
| locked = true | ||
| // Lock is still held from the beginning of the function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment does not make much sense stand-alone for future readers of the code, without seeing it in this diff.
|
yes, ref containers/podman#27574 It would be much better to fix this right so that we reload index.json after the relock. It means redoing the conflict check again. |
This fixes a race condition where concurrent 'podman artifact add' commands for different artifacts would result in only one artifact being created, without any error messages.
The root cause was in the artifact store's Add() method, which would:
When two concurrent additions happened:
The fix keeps the lock held for the entire operation. While this reduces concurrency for blob copying, it prevents the index file corruption that caused artifacts to be lost.
Changes:
Fixes: containers/podman#27569
Generated-with: Cursor AI