feat: retry-logic #11

Open

sveitser wants to merge 6 commits into alxiong:main from EspressoSystems:main
Conversation

@sveitser (Contributor)

  • Retry
  • Concurrent download de-duplication
  • Nix flake
  • Version bump

mrain and others added 6 commits September 15, 2025 12:37
Use a .part file as both temp file for atomic rename and flock target
to deduplicate concurrent downloads. Retries with exponential backoff
on transient network errors and zero-byte responses.
@alxiong (Owner) left a comment


ACK, the robustness improvements and the concurrent-dedup tests look good.

Only minor nits.

Ok(T::deserialize_uncompressed_unchecked(&bytes[..])?)
}

pub(crate) fn download_url_to_file(
@alxiong (Owner)

add code doc.

@alxiong (Owner)

I don't want to force Nix usage; this package is simple enough that it shouldn't need any special setup or lib dependency.

Most systems should have openssl installed, so a pure cargo toolchain should be sufficient?

Copilot AI left a comment

Pull request overview

This PR adds retry and concurrency-safe de-duplication to SRS downloads, introduces Nix flake-based dev environment support, and bumps the crate version.

Changes:

  • Add download_url_to_file with retry/backoff and file-lock-based de-duplication for concurrent downloads.
  • Add unit tests covering download success/failure, retries, and concurrent de-dup behavior.
  • Add flake.nix/flake.lock and bump crate version + dependencies (fs2, mockito).

Reviewed changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 4 comments.

File          Description
src/load.rs   Implements retrying download + concurrent de-dup via .part file locking; adds tests.
flake.nix     Adds Nix flake dev shell for consistent Rust tooling + deps.
flake.lock    Pins Nix inputs for reproducible dev environments.
Cargo.toml    Bumps version; adds fs2 and test dependency mockito.
Cargo.lock    Updates lockfile for new dependencies and version bump.


@@ -1,21 +1,18 @@
//! Utils for persisting serialized data to files and loading them into memroy.
Copilot AI Feb 28, 2026

Spelling: module doc comment says "memroy"; should be "memory".

Suggested change
//! Utils for persisting serialized data to files and loading them into memroy.
//! Utils for persisting serialized data to files and loading them into memory.

Comment on lines +43 to +65
    create_dir_all(dest.parent().context("no parent dir")?)
        .context("Unable to create directory")?;

    // .part file serves double duty: temp file for atomic rename and flock
    // target to deduplicate concurrent downloads. Not truncated on open so a
    // second opener doesn't clobber an in-progress write. Left on disk after
    // failure -- harmless, the next caller overwrites it.
    let part_path = {
        let mut p = dest.as_os_str().to_owned();
        p.push(".part");
        PathBuf::from(p)
    };
    let part_file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(&part_path)?;
    part_file.lock_exclusive()?;

    // Another thread may have completed the download while we blocked on the lock.
    if dest.exists() {
        return Ok(());
    }
Copilot AI Feb 28, 2026

download_url_to_file creates/opens the .part file before checking dest.exists(). If dest already exists and .part does not, this leaves behind an empty .part file even though no download occurs. Consider adding a fast-path if dest.exists() { return Ok(()) } before opening/creating .part (and keep the post-lock recheck for races), or explicitly remove the .part file on the early-return path.

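A minimal sketch of the suggested fast path, using only the standard library. `open_part_file` is a hypothetical helper name (not the PR's API); it returns `None` when `dest` already exists, so the `.part` file is only created when a download might actually happen. The flock and post-lock recheck are left to the caller, as in the PR:

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: open the `.part` lock/temp file for `dest`, but
/// only if `dest` does not already exist.
fn open_part_file(dest: &Path) -> io::Result<Option<(File, PathBuf)>> {
    // Fast path: nothing to download, and no empty `.part` file is left behind.
    if dest.exists() {
        return Ok(None);
    }

    // Same `.part` naming scheme as the PR.
    let part_path = {
        let mut p = dest.as_os_str().to_owned();
        p.push(".part");
        PathBuf::from(p)
    };
    let part_file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(&part_path)?;
    // The caller would take the exclusive flock here and then re-check
    // `dest.exists()` to handle the race where another process finished
    // the download while we waited on the lock.
    Ok(Some((part_file, part_path)))
}
```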
Comment on lines +69 to +79
        let mut buf: Vec<u8> = Vec::new();
        match ureq::get(url).call() {
            Ok(resp) => match resp.into_reader().read_to_end(&mut buf) {
                Ok(_) if buf.is_empty() => {
                    last_err = Some(anyhow!("zero-byte response"));
                },
                Ok(_) => {
                    part_file.set_len(0)?;
                    (&part_file).write_all(&buf)?;
                    fs::rename(&part_path, dest)?;
                    return Ok(());
Copilot AI Feb 28, 2026

The download path buffers the entire response into a Vec<u8> before writing to disk. For large SRS assets this can cause very high peak memory usage (and repeats the allocation on each retry). Prefer streaming the response reader directly into the .part file (and optionally hashing/size-checking as you stream) to keep memory bounded.

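A sketch of the streaming alternative, assuming any `Read` source stands in for `resp.into_reader()`. `stream_to_file` is an illustrative helper, not the PR's API; it returns the byte count so the zero-byte check still works:

```rust
use std::fs::File;
use std::io::{self, Read, Write};

/// Copy the response reader into the destination file in fixed-size
/// chunks, keeping peak memory bounded regardless of download size.
/// Returns the number of bytes written so callers can reject a
/// zero-byte response, as the PR's retry loop does.
fn stream_to_file(mut reader: impl Read, file: &mut File) -> io::Result<u64> {
    // io::copy uses a small internal buffer; no Vec<u8> of the whole body.
    let n = io::copy(&mut reader, file)?;
    file.flush()?;
    Ok(n)
}
```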
Comment on lines +68 to +90
    for attempt in 0..=max_retries {
        let mut buf: Vec<u8> = Vec::new();
        match ureq::get(url).call() {
            Ok(resp) => match resp.into_reader().read_to_end(&mut buf) {
                Ok(_) if buf.is_empty() => {
                    last_err = Some(anyhow!("zero-byte response"));
                },
                Ok(_) => {
                    part_file.set_len(0)?;
                    (&part_file).write_all(&buf)?;
                    fs::rename(&part_path, dest)?;
                    return Ok(());
                },
                Err(e) => last_err = Some(anyhow::Error::from(e)),
            },
            Err(e) => last_err = Some(anyhow::Error::from(e)),
        }

        if attempt < max_retries {
            let backoff = base_backoff * 2u32.saturating_pow(attempt as u32);
            thread::sleep(backoff);
        }
    }
Copilot AI Feb 28, 2026

The function holds an exclusive file lock across the entire network request and any backoff sleeps. Since ureq::get(url).call() has no explicit connect/read timeout here, a hung download can block all concurrent callers indefinitely. Consider using a configured ureq::Agent with reasonable timeouts (and/or a maximum total elapsed time) so lock-holding failures resolve predictably.

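One way to bound lock-holding time, sketched with only the standard library: keep the PR's exponential backoff schedule but also check a total-elapsed deadline before each retry, so per-attempt timeouts plus the deadline give a predictable upper bound on how long the flock is held. `next_backoff` mirrors the PR's schedule; `should_retry` and `max_elapsed` are illustrative names for the suggested addition:

```rust
use std::time::{Duration, Instant};

/// Exponential backoff, matching the PR's `base * 2^attempt` schedule.
fn next_backoff(base: Duration, attempt: u32) -> Duration {
    base * 2u32.saturating_pow(attempt)
}

/// Retry only while both the attempt budget and the total-elapsed-time
/// budget remain, so a hung download cannot block concurrent callers
/// (who are waiting on the flock) indefinitely.
fn should_retry(start: Instant, max_elapsed: Duration, attempt: u32, max_retries: u32) -> bool {
    attempt < max_retries && start.elapsed() < max_elapsed
}
```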