Resolve the issue of concurrent changes during GC #1729
Conversation
This change accounts for the case where the repo is updated during a GC run.

- [ ] Do the same for expiration
I don't know how to test this. Our options are,
icechunk/src/ops/gc.rs
Outdated
```rust
let mut attempts: u64 = 1;
loop {
    match garbage_collect_one_attempt(
        Arc::clone(&asset_manager),
        config,
        num_updates_per_repo_info_file,
    )
    .await
    {
        Ok(res) => {
            return Ok(res);
        }
        Err(GCError::Repository(RepositoryError {
            kind: RepositoryErrorKind::RepoInfoUpdated,
            ..
        })) => match backoff.next() {
            Some(delay) => {
                info!(
                    attempts,
                    ?delay,
                    "Repo info object was updated while GC was running, retrying with backoff..."
                );
                tokio::time::sleep(delay).await;
                attempts += 1;
            }
            None => {
                return Err(GCError::Repository(
                    RepositoryErrorKind::RepoUpdateAttemptsLimit(max_attempts as u64)
                        .into(),
                ));
            }
        },
        Err(err) => {
            return Err(err);
        }
    }
}
```
This could be `garbage_collect_one_attempt.retry(backoff).map_err(|e| GCError::... .into())?`? https://docs.rs/backon/latest/backon/struct.Retry.html
just changed it, so much better! I copied the code from the wrong place 🤦
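For readers unfamiliar with the pattern being discussed, here is a minimal std-only sketch of the retry-with-backoff shape that both the hand-written loop and the backon version implement. The `Error` enum and `retry_with_backoff` helper are hypothetical stand-ins for `GCError` and backon's `Retryable`, not icechunk code:

```rust
use std::time::Duration;

// Hypothetical error type standing in for GCError / RepositoryErrorKind.
#[derive(Debug, PartialEq)]
enum Error {
    Retryable,
    Fatal,
}

// Retries `op` while it fails with a retryable error, sleeping for each
// delay yielded by `backoff`. When the backoff iterator is exhausted, the
// last retryable error is returned. Non-retryable errors propagate
// immediately, mirroring the `Err(err) => return Err(err)` arm above.
fn retry_with_backoff<T>(
    mut op: impl FnMut() -> Result<T, Error>,
    backoff: impl IntoIterator<Item = Duration>,
) -> Result<T, Error> {
    let mut delays = backoff.into_iter();
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(Error::Retryable) => match delays.next() {
                Some(delay) => std::thread::sleep(delay),
                None => return Err(Error::Retryable), // attempts exhausted
            },
            Err(e) => return Err(e), // fatal: no retry
        }
    }
}

fn main() {
    let mut calls = 0;
    let res = retry_with_backoff(
        || {
            calls += 1;
            if calls < 3 { Err(Error::Retryable) } else { Ok(calls) }
        },
        std::iter::repeat(Duration::from_millis(1)).take(5),
    );
    assert_eq!(res, Ok(3));
    // A fatal error short-circuits without consuming any backoff delays.
    assert_eq!(
        retry_with_backoff(|| Err::<(), _>(Error::Fatal), Vec::new()),
        Err(Error::Fatal)
    );
}
```

What backon buys over this is exactly the boilerplate the reviewer points at: the loop, the attempt counter, and the sleep all collapse into a combinator chain.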
icechunk/src/ops/gc.rs
Outdated
```rust
}

/// Updates the repo object eliminating snapshots
/// Returns true if the operation was successful, if it returns false, GC should be retried
```
Suggested change:
```diff
-/// Returns true if the operation was successful, if it returns false, GC should be retried
+/// Returns Ok() if the operation was successful, if it returns Err(), GC should be retried
```
?
icechunk/src/ops/gc.rs
Outdated
```rust
    && drop_snapshots.contains(parent)
{
    // this is a new snapshot created since we started GC
    // but we are traying to drop its parent. Case 2b
```
Suggested change:
```diff
-// but we are traying to drop its parent. Case 2b
+// but we are trying to drop its parent. Case 2b
```
intentional, to prove it's not Claude
icechunk/src/ops/gc.rs
Outdated
```rust
// a new snapshot with the root as parent,
// root is always retained
```
not just root no? If the parent is in keep_snapshots, we hit this branch, correct?
oops, right, adjusting the comment
```rust
    if !final_snap_ids.contains(&pointed_snap) {
        return Err(RepositoryErrorKind::RepoInfoUpdated.into());
    }
}
```
IIUC we are ignoring an update where a tag is deleted and a snapshot can be GC-ed. But that's quite minor.
exactly, there are a few cases like that that will have to be GC'ed in the next pass. I only want to make sure I don't delete something I shouldn't, extra garbage is fine.
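A minimal sketch of the one-sided safety check being discussed (all names hypothetical, not actual icechunk code): deletion aborts when any ref points to a snapshot outside the set GC decided to keep, while a ref that was deleted mid-run merely leaves extra garbage for the next pass:

```rust
use std::collections::HashSet;

// Simplified, illustrative version of the check: before deleting anything,
// verify every snapshot still referenced by a branch or tag is in the set
// GC decided to keep. If not, the repo changed under us and the pass must
// be retried. The check is deliberately one-sided: refs deleted mid-run
// leave extra garbage behind, which is safe and collected on a later pass.
fn safe_to_delete(referenced: &HashSet<&str>, final_snap_ids: &HashSet<&str>) -> bool {
    referenced.iter().all(|s| final_snap_ids.contains(s))
}

fn main() {
    let kept: HashSet<_> = ["a", "b"].into();
    // A new ref to "c" appeared mid-run: abort and retry the GC pass.
    assert!(!safe_to_delete(&["a", "c"].into(), &kept));
    // A ref was deleted mid-run: "b" becomes extra garbage, but deleting
    // what we planned to delete is still safe.
    assert!(safe_to_delete(&["a"].into(), &kept));
}
```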
```rust
let _ = asset_manager.update_repo_info(retry_settings, do_update).await?;
let retry_settings = storage::RetriesSettings {
    max_tries: Some(NonZeroU16::MIN),
```
But seriously, why not use retry_settings?
I'm retrying outside of this, this has to do a single attempt and fail immediately
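The single-attempt trick works because `NonZeroU16::MIN` is 1: assuming `RetriesSettings` simply caps the attempt count, the storage layer fails fast on conflict and the outer GC loop owns the entire retry policy. A tiny illustrative sketch (only the `max_tries` value is from the diff above):

```rust
use std::num::NonZeroU16;

fn main() {
    // NonZeroU16::MIN is 1, so `max_tries: Some(NonZeroU16::MIN)` means
    // "exactly one attempt": the inner layer fails immediately when the
    // repo info object was updated, and the outer loop decides whether
    // and when to retry with backoff.
    let max_tries: NonZeroU16 = NonZeroU16::MIN;
    assert_eq!(max_tries.get(), 1);
}
```

Keeping retries in exactly one layer avoids multiplicative retry storms (inner attempts times outer attempts) and keeps the backoff schedule meaningful.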
```rust
/// Since expire_v2 is a relatively fast operation (repo object only) we retry it if the repo info
/// object was modified since it started
```
Suggested change:
```diff
-/// Since expire_v2 is a relatively fast operation (repo object only) we retry it if the repo info
-/// object was modified since it started
```
icechunk/src/ops/gc.rs
Outdated
```rust
let mut attempts: u64 = 1;
loop {
    match expire_v2_one_attempt(
```
same here, could probably do expire_...().retry(backoff)...
shuttle now supports tokio apparently but I haven't tried it: awslabs/shuttle#238