@sadhansood sadhansood commented Dec 3, 2025

Description

This PR adds opportunistic uploads for small blobs. The new workflow is as follows:

  1. Kick off sliver uploads with pending intent in parallel with blob registration.
  2. tokio::select! races registration against the upload task with pending intent. When registration wins, cancel the pending task.
  3. Remember this from the pending upload task: which nodes succeeded (and their total weight) and which committee nodes didn’t get slivers.
  4. Run the immediate upload seeded with that state: only target the remaining nodes and start with the previously accumulated weight when checking quorum.
  5. Collect confirmations. If quorum is still short (e.g., buffered slivers were evicted), do one more immediate upload pass to the nodes missing confirmations, then try to build the certificate again.
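
Steps 3 and 4 amount to carrying a small piece of state from the cancelled pending upload into the immediate upload. A minimal Rust sketch of that hand-off follows. The names `PendingUploadOutcome`, `record_success`, `remaining_nodes`, and `has_quorum` are hypothetical and not taken from the PR, and the quorum rule (strictly more than two thirds of the total weight) is an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of what the cancelled pending-intent upload achieved.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PendingUploadOutcome {
    /// Indices of committee nodes that already accepted slivers.
    pub succeeded: BTreeSet<usize>,
    /// Sum of the weights of the nodes in `succeeded`.
    pub accumulated_weight: u64,
}

/// Records a successful store on `node`, counting its weight only once.
pub fn record_success(outcome: &mut PendingUploadOutcome, node: usize, weights: &[u64]) {
    if outcome.succeeded.insert(node) {
        outcome.accumulated_weight += weights[node];
    }
}

/// Committee nodes the immediate upload still has to target.
pub fn remaining_nodes(outcome: &PendingUploadOutcome, weights: &[u64]) -> Vec<usize> {
    (0..weights.len())
        .filter(|node| !outcome.succeeded.contains(node))
        .collect()
}

/// Assumed quorum rule: strictly more than two thirds of the total weight.
pub fn has_quorum(accumulated_weight: u64, total_weight: u64) -> bool {
    3 * accumulated_weight > 2 * total_weight
}
```

With weights `[3, 3, 2, 2]` and pending successes on nodes 0 and 2, the immediate upload would target only nodes 1 and 3 and start its quorum check from an accumulated weight of 5.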

Test plan

All existing tests are enabled to use this feature. The feature is disabled in prod by default.

@sadhansood sadhansood force-pushed the sadhan/opportunistic_upload branch from 8dabf10 to fb059ae Compare December 3, 2025 22:36
github-actions bot commented Dec 3, 2025

Warning: This PR modifies the Walrus CLI. Please consider the following:

  • Make sure the changes are backwards compatible. Consider deprecating options before
    removing them.
  • Generally only use long CLI options, not short ones to avoid conflicts in the
    future.
  • If you added new options or features, or modified the behavior, please document the
    changes in the release notes of the PR and update the documentation in the docs/book
    directory.

@sadhansood sadhansood requested review from halfprice and wbbradley and removed request for halfprice December 3, 2025 22:37

@mlegner mlegner left a comment

Thanks a lot for implementing this, @sadhansood! 🙏

The upload code is now getting more and more complex and it's getting pretty difficult to understand it completely. Some refactoring and additional docstrings/comments could help with that.

Two general questions:

  1. Have you already checked the performance of this?
  2. Have you investigated how this interacts with existing retry loops?

@sadhansood sadhansood force-pushed the sadhan/cache_client_2_add_upload_intent branch 4 times, most recently from d27ebce to 97bba42 Compare December 10, 2025 21:16
Base automatically changed from sadhan/cache_client_2_add_upload_intent to main December 11, 2025 22:34
@sadhansood sadhansood force-pushed the sadhan/opportunistic_upload branch 2 times, most recently from 8601043 to 41cbc62 Compare December 15, 2025 09:09
@sadhansood sadhansood force-pushed the sadhan/opportunistic_upload branch from 41cbc62 to 115c8a0 Compare December 19, 2025 00:06

@halfprice halfprice left a comment

Thanks @sadhansood for this epic PR! The complexity is impressive.

Here is my current understanding of the upload flow:

  1. in parallel
    • send registration
    • upload data to storage nodes
  2. if registration returns first, pending uploads will be cancelled
  3. the client will do all the uploads again

Is this the case?

@halfprice halfprice left a comment

To finish my thoughts after the first pass: first of all, great effort implementing this optimization! My major comment is that, in its current form, the code is a bit too complex to understand. In my mind, the upload flow should be quite simple:

  1. Registration (one call to the SuiClient)
  2. Sending data to nodes (set of tasks sending to storage nodes)
  3. Form certificate and certify on chain (parse results, and one call to the SuiClient)

On top of these, there is committee management, and everything should operate based on one committee view. What your PR does should essentially make the first two steps concurrent.
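
As a rough illustration of making the first two steps concurrent, the sketch below runs a stand-in registration alongside a stand-in upload loop and cancels the loop once registration completes. The PR itself uses tokio::select!; this sketch substitutes std scoped threads and an atomic flag so that it has no external dependencies. The function name `register_while_uploading` and its parameters are hypothetical, and the sleeps only simulate on-chain and network latency.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// Runs a stand-in registration and a stand-in pending upload concurrently.
/// Returns the node indices the upload reached before it was cancelled.
pub fn register_while_uploading(
    node_count: usize,
    registration_delay: Duration,
    per_node_delay: Duration,
) -> Vec<usize> {
    // Set once registration has finished; the uploader checks it before each node.
    let cancel = AtomicBool::new(false);

    thread::scope(|scope| {
        let uploader = scope.spawn(|| {
            let mut stored = Vec::new();
            for node in 0..node_count {
                if cancel.load(Ordering::Acquire) {
                    break; // registration won the race: stop the pending upload
                }
                thread::sleep(per_node_delay); // simulated sliver upload
                stored.push(node);
            }
            stored
        });

        thread::sleep(registration_delay); // simulated registration call
        cancel.store(true, Ordering::Release);

        uploader.join().expect("upload thread panicked")
    })
}
```

The returned list plays the role of the state handed to the follow-up upload: nodes in it do not need to be targeted again.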

One thing that I think made things too complicated is that reserve_and_store_encoded_blobs tries to do a lot more than just upload blobs. For example, the first two storage optimizations (check status, reuse resource) could very well be a standalone step that runs before this function. The additional state that needs to be handled inside this function due to these optimizations amplifies the complexity considerably, IMO.

Let's sync offline to discuss the best way moving forward.

@sadhansood

Agreed, @halfprice. As discussed offline, let's take some time after this PR to do a proper refactor of the client and simplify it a lot more.

@halfprice halfprice left a comment

Thanks @sadhansood for the revision! The new revision makes sense to me. Let's aim to clean up the client upload logic as the next work item here before adding any new functionality.

@sadhansood sadhansood force-pushed the sadhan/opportunistic_upload branch from a3adba8 to c80f2b9 Compare December 20, 2025 02:00
@sadhansood

Thank you for reviewing this PR, @halfprice. I will address your comments before landing.

@sadhansood sadhansood merged commit 3dcf3bb into main Jan 5, 2026
24 checks passed
@sadhansood sadhansood deleted the sadhan/opportunistic_upload branch January 5, 2026 06:45