Skip to content

Conversation

@ma2bd
Copy link
Contributor

@ma2bd ma2bd commented Oct 9, 2025

Motivation

Batch downloading missing blobs

Proposal

Backport #4755

Test Plan

CI

@ma2bd ma2bd changed the base branch from main to testnet_conway October 9, 2025 15:39
…4755)

Currently, when we synchronize a chain, even though we receive the
certificates in a batch to `process_certificates`, we handle them one by
one at the local node level, and if a certificate is missing blobs we
stop, download the blobs for that certificate, then retry, making the
download of the blobs sequential. This makes startup time for the client
linear in the number of certificates-with-blobs present in its initial
chains (notably, the admin chain).

We already have an ahead-of-time indicator of which blobs will be
required by the certificates, so there's no need to download them one at
a time. If we don't have some blobs marked as required by the
certificate batch, try to download them (concurrently) before proceeding
to process the batch. ~Thereafter, `BlobsNotFound` when processing the
batch is a hard error.~ `required_blob_ids()` is conservative, so we
still need to download blobs and retry if we get a `BlobsNotFound` error
thereafter.

CI.

- These changes should be backported to the latest `testnet` branch,
then
    - be released in a new SDK.
@ma2bd ma2bd force-pushed the backport_missing_blobs branch from 7713638 to b324eea Compare October 9, 2025 15:41
Copy link
Contributor

@afck afck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I just realized this is wrong. We have to revert (or fix) the original PR instead:

store_blobs only stores blobs if it has already seen proof for them, so process_certificate must be called first, otherwise the blob won't get written.

@afck
Copy link
Contributor

afck commented Oct 9, 2025

We could instead call process_certificate in a loop for all the certificates, and make a note of which one caused the first BlobsNotFound error. Then collect all BlobsNotFound errors, download the blobs in a batch, store_blobs, and then call process_certificate in a loop again, starting from the first height where we got an error.

@ma2bd ma2bd closed this Oct 9, 2025
@afck afck reopened this Oct 9, 2025
@afck
Copy link
Contributor

afck commented Oct 9, 2025

Sorry, @Twey, @ma2bd, I missed that the fallback was already implemented! That fixes it, of course. (Although I think what I'm describing above might be a bit simpler in some cases.)

self.download_blobs(remote_node, blob_ids).await?;
self.handle_certificate(certificate).await?
}
x => x?,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'd prefer error or result.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error would be incorrect. result is a good suggestion, but too late :)

@Twey
Copy link
Contributor

Twey commented Oct 9, 2025

Suggested approach in #4770.

@ma2bd
Copy link
Contributor Author

ma2bd commented Oct 9, 2025

Remote compatibility tests are not passing (twice). We may want to investigate

Comment on lines +371 to +376
&futures::stream::iter(blob_ids.into_iter().map(|blob_id| async move {
remote_node.try_download_blob(blob_id).await.unwrap()
}))
.buffer_unordered(self.options.max_joined_tasks)
.collect::<Vec<_>>()
.await,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we use let .. = for complex expressions in argument position?

Copy link
Contributor

@Twey Twey Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the added mental load of introducing unnecessary variables. It makes it harder to read the code from top to bottom since you don't know how the thing is going to be used until later, compared to having the top-level/motivating call (store_blobs here) first.

It's sometimes worth it if it's really unclear what the value of the expression is without a name (though that's a bit of a code smell for the names of the functions &c. involved in the expressions), but in this case I think it's pretty obvious that this is a set of blobs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(best of both worlds is obviously Haskell-esque

    storeBlobs blobs
  where
    blobs = 

😄)

Copy link
Contributor Author

@ma2bd ma2bd Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our brains don't seem to work in the same way because for me this style hurts a lot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intermediate variable (when named appropriately) acts as a summary of the complex expression.

let tasks = something_complicated(args).await?; // <-- can fail, can yield to another thread
process_tasks(tasks); // <- do the thing

Copy link
Contributor

@Twey Twey Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Values are protocols between a constructor and a consumer. That means there are two information flows that you want to make clear at the point of seeing a value: one is upstream ‘what the value is’; the other is downstream ‘how the value is going to be used’. Usually ‘what the value is’ is evident from ‘how the value is constructed’; sometimes there are also non-obvious invariants that aren't clear from the way the value is constructed, and then IMO it's worth giving the expression a name via variable assignment, named function argument, et cetera, though a better solution (IMO) is to make the constructing variable clearer, e.g. by extracting it into a function whose name or (better) type expresses the invariants.

In your example the problem is that tasks doesn't really tell you what the tasks are for. Are we going to count them? Track their progress? That doesn't just make it harder to see what the function as a whole does, it also might have a bearing on what the tasks themselves actually are above and beyond what's captured in the name: if we're planning to run the tasks then it's implied that they're in a runnable state (not errored, exited, et cetera). In a small two-line example like this you can quickly look at the next line to see how the information is flowing, but as functions get bigger it's easy to lose track, because you've essentially decomposed a tree:

A
| \
B  C
|  |\
D  E F

into a (backwards!) list of edges:

C -> (E, F)
D -> B
A -> (B, C)

(and usually with more noise between them!). Now you have to pay attention to and remember all the names in order to be able to mentally reconstruct the (tree-shaped) information flow of the data, rather than it being inherent in the syntax.

Concretely, in this case, I'd probably call the variable blobs. But that defers the information about why we're interested in having a set of blobs (to store them)! You could call the variable blobs_to_pass_to_store_blobs but that's a little silly to do for everything, and to get the same level of information that's given by having syntactic nesting that reflects the flow of information you need to do this transitively:

let blob_ids_to_filter_to_download_to_store = get_blob_ids();
let filtered_blob_ids_to_download_to_store = filter(blob_ids_to_filter_to_store);
let filtered_downloaded_blobs_to_store = download(filtered_blob_ids_to_download_to_store);
store(filtered_downloaded_blobs_to_store);

In the case where the expression is very opaque, a quick fix can be to give it a descriptive name that describes its value:

let blobs = mysterious_function_that_could_return_anything();

This still obscures the information flow, but it's probably worth it if you don't want to refactor/rename the mysterious function into something more readable (a named function argument gives you the best of both worlds here). But in the case where the expression directly suggests what's being returned, e.g. when you're mapping a function called try_download_blob, declaring that it returns ‘blobs’ is information-free.

There's another argument in your comment about specifically the ? and .await, (I guess) implying that any expression ending with either of those should be assigned to a variable to make them ‘loud, explicit syntax’ as Stroustrup would say. I think this is somewhat a matter of opinion and culture, so I won't try to argue it from first principles, but Rust-the-language has made the decision that these operations are unsurprising enough today to be worthy of the terser syntax after extensive discussion (rather than, say, try or await statements/blocks) and I think idiomatic Rust shouldn't second-guess that.

@deuszx
Copy link
Contributor

deuszx commented Oct 15, 2025

Why is this PR not being merged? RN we have a divergence between conway and main.

@Twey
Copy link
Contributor

Twey commented Oct 16, 2025

The remote compatibility test failure doesn't look related to this at all:

2025-10-09T22:32:32.550373Z ERROR linera: Error is Failed to close chain

Caused by:
    chain client error: Signer doesn't have key to sign for chain 726e27a961a72a7e32064a9bc7a68bbf6f825e07ae8b0282698a0215615bab85

@Twey Twey added backport This PR backports functionality already available on `main`. performance labels Oct 16, 2025
@Twey Twey changed the title Backport fix on missing blobs (#4755) linera_core::client: batch downloading of missing blobs (#4755) Oct 16, 2025
@deuszx
Copy link
Contributor

deuszx commented Oct 20, 2025

Are we going to merge this or not? AFAIU, it's on main anyway.

@ma2bd ma2bd merged commit d9481a4 into linera-io:testnet_conway Oct 20, 2025
59 of 61 checks passed
@ma2bd ma2bd deleted the backport_missing_blobs branch October 20, 2025 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport This PR backports functionality already available on `main`. performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants