
@Bravo555
Member

@Bravo555 Bravo555 commented Jan 26, 2026

Proposed changes

In the current download implementation, if a download is interrupted we try to resume it with an HTTP range request, so that we don't re-download the parts we already have. However, if the server updates the file between retries, we don't notice it, and the resulting file can be corrupted.

This PR adds a check: if the file was modified, we abort the range request and request the full file again with a normal GET request.

In particular, the check compares the ETag header value of the current response with that of the previous one, if present. If the ETags differ, we request the full file again. If there's no ETag, the behaviour is unchanged and we proceed with the range request, so as not to disable partial requests for servers that don't send an ETag, such as Cumulocity Inventory Binaries.
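A minimal sketch of the ETag comparison described above, written against plain `Option<&str>` header values rather than `reqwest::Response` so that it runs without dependencies. The name `was_resource_modified` comes from the PR, but this signature is an assumption, and so is the handling of the mixed case (an ETag on only one of the two responses), which the description doesn't specify; here it is treated conservatively as a modification.

```rust
/// Returns `true` when the resource changed between two responses,
/// judged by their ETag header values.
fn was_resource_modified(current_etag: Option<&str>, previous_etag: Option<&str>) -> bool {
    match (current_etag, previous_etag) {
        // Both responses carry an ETag: any difference means a new version.
        (Some(current), Some(previous)) => current != previous,
        // No ETag on either response (e.g. Cumulocity Inventory Binaries):
        // assume the resource is unchanged so range requests keep working.
        (None, None) => false,
        // Assumption: an ETag appearing or disappearing is treated as a change.
        _ => true,
    }
}
```

Comparing the raw header strings amounts to an exact (strong) comparison of the two ETags.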

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s. You can activate automatic signing by running just prepare-dev once)
  • I ran just format as mentioned in CODING_GUIDELINES
  • I used just check as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

One more thing we could do is check whether the Content-Length of the resource is unchanged, which would hopefully catch updates that change the size of the file. Unfortunately, I can't test this at the moment: our partial-request tests use chunked transfer encoding, and when it is in use, reqwest ignores the Content-Length header and doesn't report it (presumably because chunked transfer encoding is meant for streaming responses, where the size usually isn't known ahead of time). So for now this check is not added, and we only check the ETag if it's present.
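One possible workaround for the missing Content-Length, not part of this PR: a `206 Partial Content` response carries the complete length of the resource in its `Content-Range` header (e.g. `bytes 100-199/1000`), which is a regular header and does not depend on the transfer encoding. The sketch below, assuming the server sends a concrete length rather than `*`, parses that value and compares it with the previously known size; both function names are hypothetical.

```rust
/// Extracts the complete length from a `Content-Range` value such as
/// `bytes 100-199/1000`. Returns `None` for an unknown length (`*`)
/// or a malformed value.
fn content_range_total_len(value: &str) -> Option<u64> {
    let rest = value.strip_prefix("bytes ")?;
    let (_range, total) = rest.split_once('/')?;
    if total == "*" {
        // The server doesn't know (or won't say) the full size.
        return None;
    }
    total.trim().parse().ok()
}

/// Size-based change check: only conclusive when both sizes are known.
fn size_changed(previous_len: Option<u64>, current_len: Option<u64>) -> bool {
    matches!((previous_len, current_len), (Some(p), Some(c)) if p != c)
}
```

A size check can only complement the ETag check: an update that keeps the file size identical goes unnoticed.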

Some servers also send a Last-Modified header, which could be checked as well (unfortunately, Cumulocity doesn't send it either).
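If Last-Modified were used as well, one option is to prefer the ETag and fall back to Last-Modified only when no ETag is sent. A sketch of that selection, with hypothetical names (`Validator`, `pick_validator`, `validator_changed`) and the same "no validator means unchanged" policy as the PR; dates are compared as exact strings, which assumes the server formats them consistently.

```rust
/// The header used to detect changes between two responses.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Validator {
    ETag(String),
    LastModified(String),
}

/// Prefer the ETag; fall back to Last-Modified only if no ETag was sent.
fn pick_validator(etag: Option<&str>, last_modified: Option<&str>) -> Option<Validator> {
    etag.map(|e| Validator::ETag(e.to_string()))
        .or_else(|| last_modified.map(|lm| Validator::LastModified(lm.to_string())))
}

fn validator_changed(previous: &Option<Validator>, current: &Option<Validator>) -> bool {
    match (previous, current) {
        // No validator at all: keep range requests enabled, as in the PR.
        (None, None) => false,
        // Different kind or different value: treat as modified.
        (Some(p), Some(c)) => p != c,
        // Assumption: a validator appearing or disappearing counts as a change.
        _ => true,
    }
}
```

HTTP dates have one-second resolution, so Last-Modified cannot tell apart two updates made within the same second; it is a weaker signal than an ETag.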

There's also the option of using the Want-Content-Digest header to request a digest from the server, which we could then compute locally to perform an integrity check before finishing the download, but so far I haven't seen a server that supports it.
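For the digest option, a server answers a `Want-Content-Digest` request (RFC 9530) with a `Content-Digest` header such as `sha-256=:<base64>:`. The sketch below only extracts the algorithm/digest pairs with a simplified parser (not a full structured-field parser; the function name is hypothetical). Hashing the downloaded file and base64-encoding the result for comparison would need extra crates and is left out.

```rust
/// Parses a `Content-Digest` value such as
/// `sha-256=:<base64>:, sha-512=:<base64>:` into (algorithm, digest) pairs.
/// Entries that don't follow the `alg=:value:` shape are skipped.
fn parse_content_digest(value: &str) -> Vec<(String, String)> {
    value
        .split(',')
        .filter_map(|entry| {
            // Algorithm names contain no '=', so the first '=' separates the key;
            // base64 padding '=' stays inside the digest.
            let (alg, rest) = entry.trim().split_once('=')?;
            // Byte sequences are wrapped in colons.
            let digest = rest.strip_prefix(':')?.strip_suffix(':')?;
            Some((alg.to_ascii_lowercase(), digest.to_string()))
        })
        .collect()
}
```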

@Bravo555 Bravo555 force-pushed the fix/partial-download-verify-etag branch from 4fd77e9 to 6b3b3a4 January 26, 2026 09:58
@Bravo555 Bravo555 temporarily deployed to Test Pull Request January 26, 2026 09:58 with GitHub Actions
@Bravo555 Bravo555 changed the title fix: partial download verify etag fix: Disable partial download if file being downloaded is modified between retries Jan 26, 2026
@codecov

codecov bot commented Jan 26, 2026

Codecov Report

❌ Patch coverage is 84.88372% with 13 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
crates/common/download/src/download.rs | 76.31% | 1 Missing and 8 partials ⚠️
...s/common/download/src/download/partial_response.rs | 91.66% | 3 Missing and 1 partial ⚠️


@github-actions
Contributor

github-actions bot commented Jan 26, 2026

Robot Results

✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration
825 | 0 | 3 | 825 | 100 | 2h36m26.889894999s

@Bravo555 Bravo555 force-pushed the fix/partial-download-verify-etag branch from 6b3b3a4 to 783fb5a January 26, 2026 14:32
@Bravo555 Bravo555 force-pushed the fix/partial-download-verify-etag branch from 783fb5a to 4d81a65 January 26, 2026 15:44
@Bravo555 Bravo555 force-pushed the fix/partial-download-verify-etag branch from 4d81a65 to a0547c7 January 26, 2026 15:45
@Bravo555 Bravo555 temporarily deployed to Test Pull Request January 26, 2026 15:45 with GitHub Actions
@Bravo555 Bravo555 marked this pull request as ready for review January 26, 2026 16:05
Comment on lines 202 to 203
let mut response = self.request_range_from(url, offset).await?;
if was_resource_modified(&response, &prev_response) {
Contributor

There is nothing wrong with this impl. But instead of this manual check and adjusting the start position, using the If-Range header along with the ETag seems to be the "accepted convention" for resuming a partial download. It also avoids the additional HTTP call (not that it really matters in the context of a file download).
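A sketch of the If-Range approach suggested here: the resumed request sends `Range: bytes=<offset>-` together with `If-Range: <etag>` taken from the earlier response, and a server that honours If-Range answers `206` with the remainder if the ETag still matches, or `200` with the full file otherwise. The helper names are hypothetical, and `expected_status` only models that server-side rule; it does not make a request. RFC 9110 forbids weak ETags (`W/...`) in If-Range, so the sketch skips them.

```rust
/// Headers for resuming a download at `offset`, guarded by the ETag
/// of the earlier response when one is available.
fn resume_headers(offset: u64, etag: Option<&str>) -> Vec<(&'static str, String)> {
    let mut headers = vec![("Range", format!("bytes={offset}-"))];
    if let Some(etag) = etag {
        // Only strong ETags may be used in If-Range.
        if !etag.starts_with("W/") {
            headers.push(("If-Range", etag.to_string()));
        }
    }
    headers
}

/// Models how a range-capable server that honours If-Range responds.
fn expected_status(if_range: Option<&str>, current_etag: &str) -> u16 {
    match if_range {
        // Validator still matches: only the requested range is sent.
        Some(sent) if sent == current_etag => 206,
        // Validator is stale: Range is ignored and the whole file is sent.
        Some(_) => 200,
        // No If-Range: the range is served unconditionally.
        None => 206,
    }
}
```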

Member Author

If-Range is indeed something I wanted to add as well, but in a follow-up PR.

Comment on lines 376 to 379
(None, None) => {
// no etags in either request, assume resource is unchanged
false
}
Contributor

It would be safer to assume the contrary. But, I also understand this choice: if the source doesn't even try to provide support to detect changes, then why bother?

Member Author

Yeah, particularly when downloading files from Cumulocity, where no ETag is given, I didn't want to disable partial downloads, as people might be surprised that it stopped working. But in cases where an ETag is actually available, we shouldn't ignore it.

@Bravo555 Bravo555 force-pushed the fix/partial-download-verify-etag branch from a0547c7 to 7187df8 January 27, 2026 09:41
Contributor

@albinsuresh albinsuresh left a comment

Approving the impl, though there are a few minor things to fix.

@Bravo555 Bravo555 temporarily deployed to Test Pull Request January 29, 2026 12:33 with GitHub Actions
@Bravo555
Member Author

There was one additional issue: if the file was modified but the server returned a full response anyway, we would discard the response and retry instead of downloading the new content.
Fixed in 00efec1 by applying the was_resource_modified check only if the response is partial content.
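A sketch of the response classification after this fix: the ETag check applies only to `206 Partial Content`, while a `200 OK` is always accepted and restarts the write at offset 0. `CompleteContent` and `ResourceModified` are variant names visible in the PR; the `PartialContent { start }` variant, the `classify` function, and taking the start offset from the request (rather than from the response, as `response_range_start` presumably does) are assumptions for illustration.

```rust
#[derive(Debug, PartialEq, Eq)]
enum PartialResponse {
    /// 200 OK: the whole file was sent, so writing restarts at offset 0.
    CompleteContent,
    /// 206 for a resource whose ETag changed: discard and re-request.
    ResourceModified,
    /// 206 for an unchanged resource: append at `start` (hypothetical variant).
    PartialContent { start: u64 },
}

fn classify(status: u16, requested_offset: u64, modified: bool) -> Result<PartialResponse, String> {
    match status {
        // A full response is always usable, even if the resource changed.
        200 => Ok(PartialResponse::CompleteContent),
        // Only partial content has to be checked against the previous ETag.
        206 if modified => Ok(PartialResponse::ResourceModified),
        206 => Ok(PartialResponse::PartialContent { start: requested_offset }),
        other => Err(format!("unexpected status {other}")),
    }
}
```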

Contributor

@albinsuresh albinsuresh left a comment

Re-confirming my approval for the updated logic.

use reqwest::StatusCode;
use reqwest::{header, Response};

pub(super) enum PartialResponse {
Contributor

Optional:

Suggested change
pub(super) enum PartialResponse {
pub(super) enum RangeResponse {

Member Author

IMO it's better to keep the name PartialResponse, as it relates to the Partial Content status code.

Comment on lines +37 to +40
StatusCode::PARTIAL_CONTENT => {
if was_resource_modified(response, prev_response) {
return Ok(PartialResponse::ResourceModified);
Contributor

At first, I thought this inefficiency (where the already fetched partial content has to be discarded after the ETag check) could be avoided completely by using the If-Range header. But I just realised that a server supporting ETag doesn't guarantee that it supports If-Range as well, so we'll need this to handle that corner case. For servers that do support it, this path would be skipped.

use reqwest::header;
use reqwest::header::HeaderValue;
use reqwest::StatusCode;
use reqwest::{header, Response};
Contributor

To be fixed to make the formatter happy.

@Bravo555 Bravo555 force-pushed the fix/partial-download-verify-etag branch from 00efec1 to f19a5d0 January 30, 2026 09:25
@Bravo555 Bravo555 temporarily deployed to Test Pull Request January 30, 2026 09:25 with GitHub Actions
Comment on lines +202 to +204
let mut response = self.request_range_from(url, request_offset).await?;
let offset = match partial_response::response_range_start(&response, &prev_response)? {
PartialResponse::CompleteContent => 0,
Contributor

I don't fully get how we can be sure the download is making progress, as we restart here from zero in a loop, and the backoff retry is inside this loop.

Member Author

We restart from zero only if the server returns 200 OK and we have to download the entire file again. self.request_range_from performs the HTTP request and is subject to the backoff retry policy, but reading the response body and writing it to the file happens in save_chunks_to_file_at, which, if it completes without errors, breaks out of the loop.

Contributor

which if it completes without errors, breaks out of the loop.

That if is my concern. What if we repeatedly fail to consume from the network the last bytes of a file that is served in its entirety each time? Having an infinite retry loop is never safe, even when the likelihood is low.

Member Author

Indeed, you've convinced me that the loop being unbounded is a problem and that it should be bounded somehow, but how? Should we just limit the number of iterations the loop can do, so that if we retry too many times we fail, even if in theory every retry could make a little progress before, e.g., timing out? Or should we do something smarter, like counting towards the limit only the requests that haven't made any progress (although it's not clear to me how to define that precisely)?

Ah, this whole thing turns out to be more complicated than initially anticipated; there's some little edge case everywhere. Maybe there are dependencies that could help with this; if not, I should take a more thorough look at how projects like wget handle it.

Contributor

@didier-wenzek didier-wenzek Jan 30, 2026

Should we just limit the number of iterations the loop can do

Simple and effective. I don't think we need something smarter as the point is only to avoid insane traps.
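A sketch of the simple bound agreed on here: cap the number of outer download attempts, independently of the per-request backoff. `download_with_limit` and the `attempt` closure (standing in for one request plus `save_chunks_to_file_at` cycle) are hypothetical, and the limit value would be a tuning choice.

```rust
/// Runs `attempt` until it succeeds or `max_attempts` is reached.
/// Returns the number of the successful attempt, or the last error.
fn download_with_limit<F>(max_attempts: u32, mut attempt: F) -> Result<u32, String>
where
    F: FnMut(u32) -> Result<(), String>,
{
    let mut last_error = String::from("no attempts made");
    for n in 1..=max_attempts {
        match attempt(n) {
            // The whole body was written: stop looping.
            Ok(()) => return Ok(n),
            // Remember why this attempt failed and try again.
            Err(e) => last_error = e,
        }
    }
    Err(format!("giving up after {max_attempts} attempts: {last_error}"))
}
```

Because every attempt counts, a server that keeps sending the full file while the connection keeps dropping near the end can no longer trap the agent in an endless loop.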

If during the download the resource was modified on the server to be
smaller and we've already written more bytes to disk than the new size
of the file, after retrying and overwriting the file with the new
version we could end up with garbage data from the old version of the
file at the end.

To fix this, after completing download, we call `set_len` to discard any
extra bytes that might be present after the cursor when we finish
writing.

Signed-off-by: Marcel Guzik <[email protected]>
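The fix described in this commit can be illustrated with the standard library alone: after writing the new content from offset 0, `File::set_len` with the current stream position cuts off any bytes left over from a longer, older version. The function name is hypothetical; this is a minimal sketch, not the code from a7763e4.

```rust
use std::fs::OpenOptions;
use std::io::{self, Seek, SeekFrom, Write};
use std::path::Path;

/// Overwrites `path` from offset 0 with `data`, then discards anything
/// beyond the write cursor left over from an older, longer version.
fn overwrite_and_truncate(path: &Path, data: &[u8]) -> io::Result<()> {
    // Open without truncating, like a resumed download reusing an existing file.
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)?;
    // A full (200 OK) response restarts writing at the beginning of the file.
    file.seek(SeekFrom::Start(0))?;
    file.write_all(data)?;
    // Everything past the cursor belongs to the old version: cut it off.
    let end = file.stream_position()?;
    file.set_len(end)?;
    Ok(())
}
```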
@Bravo555 Bravo555 temporarily deployed to Test Pull Request January 30, 2026 14:16 with GitHub Actions
@Bravo555
Member Author

The issue with leftover bytes from older versions of the downloaded resource was addressed in a7763e4.
Given how this PR keeps growing with additional edge cases, I could alternatively submit further fixes in a separate PR and merge this one as is. The unbounded loop also has to be addressed, but since that problem already existed and this PR didn't introduce it, I'd prefer to handle it in a follow-up PR too.

@Bravo555 Bravo555 added this pull request to the merge queue Feb 2, 2026
Merged via the queue into thin-edge:main with commit d827e8f Feb 2, 2026
34 checks passed
@Bravo555 Bravo555 deleted the fix/partial-download-verify-etag branch February 2, 2026 11:49
@reubenmiller reubenmiller added the theme:connectivity Generic connectivity related stuff like HTTP proxy etc. label Feb 2, 2026

4 participants