Dealbot #84
Merged
@@ -0,0 +1,168 @@ | ||
# Storage and Retrieval Dealbots

Authors: @mgoelzer

Initial PR: #84
<!-- | ||
This template is for a proposal/brief/pitch for a significant project to be undertaken by a Web3 Dev project team. | ||
The goal of project proposals is to help us decide which work to take on, which things are more valuable than other things. | ||
--> | ||
<!-- | ||
A proposal should contain enough detail for others to understand how this project contributes to our team’s mission of product-market fit | ||
for our unified stack of protocols, what is included in scope of the project, where to get started if a project team were to take this on, | ||
and any other information relevant for prioritizing this project against others. | ||
It does not need to describe the work in much detail. Most technical design and planning would take place after a proposal is adopted. | ||
Good project scope aims for ~3-5 engineers for 1-3 months (though feel free to suggest larger-scoped projects anyway). | ||
Projects do not include regular day-to-day maintenance and improvement work, e.g. on testing, tooling, validation, code clarity, refactors for future capability, etc. | ||
--> | ||
<!-- | ||
For ease of discussion in PRs, consider breaking lines after every sentence or long phrase. | ||
--> | ||
|
||
## Purpose & impact
#### Background & intent
Some storage and retrieval deals on Filecoin mainnet fail. We do not currently have a good handle on how often this happens, what the causes are, whether failures are specific to certain miners, or whether miners refuse deals intentionally or because of software bugs.

The Dealbots proposed here aim to address these problems by randomly selecting miners and making deals with them. For instance, the pair of bots can make a storage deal and later attempt to retrieve that same data to measure end-to-end reliability on mainnet.

The Retrieval Bot (r-bot) can also consume a list of {CID, miner} tuples and attempt retrieval on each one.

In all cases, we log the success or failure of each storage or retrieval attempt, along with diagnostic information such as where in the sequence the failure occurred, what the error message was, and what the Lotus log tail contained.
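
One way to report "where in the sequence the failure occurred" is to scan the recorded event lines for the first one whose deal status indicates a failure. The sketch below is illustrative only: it assumes event lines shaped like those in the workflow example in this proposal, and the set of statuses treated as failures (`FAILURE_STATUSES`) is an assumption, since the proposal does not enumerate them.

```python
import re
from typing import Iterable, Optional

# Matches the trailing "EventName (StatusName)" part of a line such as:
#   "Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)"
EVENT_LINE = re.compile(r"(?P<event>\w+) \((?P<status>\w+)\)$")

# Assumption: which statuses count as a failure. Not specified by the proposal.
FAILURE_STATUSES = frozenset({"DealStatusCancelling", "DealStatusErrored"})


def first_failure_event(
    event_list: Iterable[str],
    failure_statuses: frozenset = FAILURE_STATUSES,
) -> Optional[str]:
    """Return the name of the first event whose status is a failure, or None."""
    for line in event_list:
        match = EVENT_LINE.search(line.strip())
        if match and match.group("status") in failure_statuses:
            return match.group("event")
    return None
```

The returned event name could populate a field such as `failedAt`, while the full list is kept alongside it for debugging.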
|
||
#### Assumptions & hypotheses
_What must be true for this project to matter?_

- Some storage and retrieval deals fail on mainnet.
- This happens for multiple reasons: code bugs that prevent storage or retrieval from running to successful completion, and miners intentionally refusing certain types of deals (certain sizes, or an asymmetry between servicing storage and retrieval deals).
- Understanding the different types of failure and their frequencies will help us find bugs in Lotus.
- Understanding the same will help us determine whether miner economic incentives are suboptimal.
- Providing a tool that can aggregate data across many miners will provide a foundation for third parties to run miner reputation systems.
|
||
#### User workflow example

```
$ ./dealbot --input path/to/deals/to/try.json
{
  "status":"failure",
  "failedAt":"ClientEventProviderCancelled",  // failure event
  "eventList":
  [
    "Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)",
    "Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance)",
    "Recv: 0 B, Paid 0 FIL, ClientEventDealAccepted (DealStatusAccepted)",
    "Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelAddingFunds (DealStatusPaymentChannelAllocatingLane)",
    "Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing)",
    "Recv: 0 B, Paid 0 FIL, ClientEventProviderCancelled (DealStatusCancelling)",
    "Recv: 0 B, Paid 0 FIL, ClientEventDataTransferError (DealStatusErrored)",
    "Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)"
  ],
  "errorMessage":"ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: unable to send cancel to channel FSM: normal shutdown of state machine",
  "tailLog":"....",  // Multiline, from `tail` of `lotus daemon`
  "storageDealParameters":  // Given to RetrievalBot as input
  {
    "CID":"Qm...",
    "sha256":"73cb385...",  // independent checksum of data file
    "sizeInBytes":"12345678",
    "minerId":"f01924",
    "verified":true,
    "fastRetrievalFlag":true,
    "dealId":"..."
  },
  "lotusVersion":"1.5.3-rc2+mainnet+git.9afb5ff94",
                       // Call API `Filecoin.Version`
  "datetime":"YYYY-MM-DD_HH:MM:SS"  // when attempt started
},
{
  // ...next deal attempt json blob...
}
```
|
||
Stdout will contain the result of each deal attempt as JSON. It is intended to be piped into a log search service like those provided by AWS/GC.
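
As a rough illustration of consuming that output downstream, the sketch below reads result records and keeps only the failed attempts. It makes two assumptions the proposal does not state: that the bot emits one JSON object per line (newline-delimited JSON, without the explanatory `//` comments shown in the example), and that a non-failed attempt carries a `status` other than `"failure"` (the value `"success"` here is hypothetical).

```python
import io
import json
from typing import Dict, Iterable, Iterator, List


def iter_results(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed record per non-empty line (assumed newline-delimited JSON)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def failed_attempts(records: Iterable[Dict]) -> List[Dict]:
    """Keep only the records whose status is "failure"."""
    return [r for r in records if r.get("status") == "failure"]


# Stand-in for the bot's stdout; a real pipeline would pass sys.stdin instead.
sample = io.StringIO(
    '{"status": "failure", "failedAt": "ClientEventProviderCancelled"}\n'
    '{"status": "success"}\n'
)
for record in failed_attempts(iter_results(sample)):
    print(record["failedAt"])
```

Emitting one object per line keeps each attempt independently parseable, which tends to suit log search services better than a single large JSON array.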
|
||
|
||
#### Impact | ||
_How would this directly contribute to web3 dev stack product-market fit?_ | ||
|
||
- Improve reliability of the network | ||
- Enable an ecosystem of miner reputation and ranking systems | ||
- Perform the retrieval verification in Slingshot 2.3 | ||
|
||
#### Leverage | ||
_How much would nailing this project improve our knowledge and ability to execute future projects?_ | ||
|
||
**Immensely!** | ||
|
||
- We don't currently have enough information about why deals fail to allocate our debugging time and resources correctly. | ||
|
||
- Miner reputation systems enabled by this tool would complement the protocol-level incentives for miners to "do the right thing" (provide reliable retrieval of previously stored data, successfully complete all storage deals, etc.).
|
||
#### Confidence | ||
_How sure are we that this impact would be realized? Label from [this scale](https://medium.com/@nimay/inside-product-introduction-to-feature-priority-using-ice-impact-confidence-ease-and-gist-5180434e5b15)_. | ||
|
||
C = 8 | ||
|
||
Nothing is certain, but it is very likely that building this tool will, at a minimum, enable the Filecoin Project to better understand the frequency and causes of deal failures.

The tool's ability to support miner reputation systems should also help route more deals to reliable miners.
|
||
|
||
## Project definition | ||
#### Brief plan of attack | ||
|
||
<!--Briefly describe the milestones/steps/work needed for this project--> | ||
- **Phase 1: Retrieval Bot.** Reads a description of a CID to retrieve from stdin and writes the outcome of the retrieval attempt to stdout.
- **Phase 2: Storage Bot.** Same idea, but for storage deals.
- **Phase 3: Orchestrator.** Invokes the r-bot and s-bot programs with inputs taken one by one from a long queue of CIDs to test-retrieve, files to test-store, etc.
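
The Phase 3 loop might look roughly like the sketch below. It is not a design: the task shape (`kind`, `cid`, `path`) and the `runners` mapping are hypothetical, and the bots are represented as plain callables because the proposal does not define their command-line interface. A real orchestrator would presumably invoke the r-bot and s-bot programs as subprocesses.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

Task = Dict[str, str]            # hypothetical, e.g. {"kind": "retrieval", "cid": "..."}
Result = Dict[str, str]          # e.g. {"status": "failure", "errorMessage": "..."}
Runner = Callable[[Task], Result]


def run_queue(tasks: Iterable[Task], runners: Dict[str, Runner]) -> List[Result]:
    """Process tasks one by one, dispatching each to the bot for its kind."""
    queue = deque(tasks)
    results: List[Result] = []
    while queue:
        task = queue.popleft()
        runner = runners.get(task.get("kind", ""))
        if runner is None:
            # Record unknown task kinds as failures rather than stopping the run.
            results.append({
                "status": "failure",
                "errorMessage": f"unknown task kind: {task.get('kind')}",
            })
            continue
        results.append(runner(task))
    return results


# Stub runners standing in for the r-bot and s-bot programs.
stub_runners = {
    "retrieval": lambda task: {"status": "success"},
    "storage": lambda task: {"status": "success"},
}
results = run_queue(
    [{"kind": "retrieval", "cid": "Qm..."}, {"kind": "storage", "path": "file.bin"}],
    stub_runners,
)
```

Keeping the bots behind a simple callable interface lets the queue logic be tested without a running Lotus node.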
|
||
|
||
#### What does done look like? | ||
Review comment: We do expect to be running this code for some time, and that comes with both an infra and code burden. Are there other criteria or thoughts that the stewards have about what they'd like to see before the project team moves on? cc @BigLep
||
_What specific deliverables should be completed to consider this project done?_
|
||
 | ||
|
||
#### What does success look like? | ||
_Success means impact. How will we know we did the right thing?_ | ||
|
||
- We have a metrics dashboard (in Observable, Grafana, etc) that continuously shows the most recent deal failures, how frequently they are happening, which miners fail most/least, and similar metrics. The impact of this should be obvious: a clearer understanding of why and how often deals are failing on mainnet. | ||
- Reputation systems emerge from ecosystem partners that use the data generated by running these bots to rank miners. This would give Filecoin users reliable, real-time miner ranking, which does not currently exist in the ecosystem. | ||
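
The "which miners fail most/least" metric could be derived from the bots' result records along the lines of the sketch below. It assumes each record carries a `status` field and a miner identifier under `storageDealParameters`, as in the workflow example; the key name `minerId` is an assumption, and the dashboard itself is out of scope here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def failure_rates(records: Iterable[Dict], miner_key: str = "minerId") -> Dict[str, float]:
    """Return the fraction of failed attempts per miner."""
    attempts: Counter = Counter()
    failures: Counter = Counter()
    for record in records:
        miner = record["storageDealParameters"][miner_key]
        attempts[miner] += 1
        if record["status"] == "failure":
            failures[miner] += 1
    return {miner: failures[miner] / attempts[miner] for miner in attempts}


def rank_by_failure_rate(rates: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort miners from highest to lowest failure rate."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A raw failure rate is only a starting point: a reputation system would likely also weigh the number of attempts per miner and how recent they are.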
|
||
#### Counterpoints & pre-mortem | ||
_Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?_ | ||
|
||
- The metrics fail to give us actionable debugging ideas | ||
- Reputation systems develop their own code to capture the same miner statistics (duplication of effort) | ||
|
||
#### Alternatives | ||
_How might this project’s intent be realized in other ways (other than this project proposal)? What other potential solutions can address the same need?_ | ||
|
||
- [@whyrusleeping](https://github.com/whyrusleeping/)'s [Estuary](https://github.com/whyrusleeping/estuary) tool | ||
|
||
#### Dependencies/prerequisites | ||
<!--List any other projects that are dependencies/prerequisites for this project that is being pitched.--> | ||
|
||
- [filecoin-project/lotus/pull/5833/](https://github.com/filecoin-project/lotus/pull/5833/)
|
||
#### Future opportunities | ||
<!--What future projects/opportunities could this project enable?--> | ||
|
||
- Miner reputation systems as discussed | ||
|
||
|
||
## Required resources | ||
|
||
#### Effort estimate | ||
<!--T-shirt size rating of the size of the project. If the project might require external collaborators/teams, please note in the roles/skills section below). | ||
For a team of 3-5 people with the appropriate skills: | ||
- Small, 1-2 weeks | ||
- Medium, 3-5 weeks | ||
- Large, 6-10 weeks | ||
- XLarge, >10 weeks | ||
Describe any choices and uncertainty in this scope estimate. (E.g. Uncertainty in the scope until design work is complete, low uncertainty in execution thereafter.) | ||
--> | ||
|
||
TBD with team | ||
|
||
#### Roles / skills needed | ||
<!--Describe the knowledge/skill-sets and team that are needed for this project (e.g. PM, docs, protocol or library expertise, design expertise, etc.). If this project could be externalized to the community or a team outside PL's direct employment, please note that here.--> | ||
|
||
TBD with team |
ℹ️ Noting that this project doesn't list
With these, we should expect this project to be at least a 'medium' or to take at least 4 weeks of time.
I'm 👍 on tests and docs.
For ongoing monitoring, the dashboards project might inadvertently solve for that. If the bots start failing, it will probably be immediately apparent to viewers of the dashboards (e.g., all metrics suddenly go to zero).