protocol · jacobheun · Jun 9, 2021 · Mar 19, 2021 · Mar 26, 2021 · Mar 26, 2021
diff --git a/proposals/images/bot-arch.png b/proposals/images/bot-arch.png
diff --git a/proposals/storage-and-retrieval-bots.md b/proposals/storage-and-retrieval-bots.md
@@ -0,0 +1,168 @@
+# Storage and Retrieval Dealbots 
+
+Authors: @mgoelzer
+
+Initial PR: #84
+
+<!--
+This template is for a proposal/brief/pitch for a significant project to be undertaken by a Web3 Dev project team.
+The goal of project proposals is to help us decide which work to take on, which things are more valuable than other things.
+-->
+<!--
+A proposal should contain enough detail for others to understand how this project contributes to our team’s mission of product-market fit
+for our unified stack of protocols, what is included in scope of the project, where to get started if a project team were to take this on,
+and any other information relevant for prioritizing this project against others.
+It does not need to describe the work in much detail. Most technical design and planning would take place after a proposal is adopted.
+Good project scope aims for ~3-5 engineers for 1-3 months (though feel free to suggest larger-scoped projects anyway). 
+Projects do not include regular day-to-day maintenance and improvement work, e.g. on testing, tooling, validation, code clarity, refactors for future capability, etc.
+-->
+<!--
+For ease of discussion in PRs, consider breaking lines after every sentence or long phrase.
+-->
+
+## Purpose & impact
+#### Background & intent
+In some cases, storage and retrieval deals on Filecoin mainnet fail.  We do not currently have a good handle on how often this happens, what the causes are, whether it is specific to certain miners, whether miners refuse deals intentionally or because of software bugs, etc.
+
+The Dealbots proposed here aim to address these problems by randomly selecting miners and making deals with them.  For instance, the pair of bots can make a storage deal and later attempt to retrieve that same data to understand end-to-end reliability on mainnet.
+
+The Retrieval Bot (r-bot) can also consume a list of {CID,miner} tuples and attempt retrieval on each one.
+
+In all cases, we log the success or failure of each storage or retrieval attempt, along with diagnostic information such as where in the sequence the failure occurred, what the error message was and what the Lotus log tail contained.
+
+#### Assumptions & hypotheses
+_What must be true for this project to matter?_
+
+ - Some storage and retrieval deals fail on mainnet
+ - This is happening for multiple reasons:  code bugs that prevent storage or retrieval from running to successful completion, and miners intentionally refusing certain types of deals (certain sizes, or an asymmetry between servicing storage vs retrieval deals).
+ - Understanding the different types of failure and their frequencies will help us find bugs in Lotus.
+ - Understanding the same will help us understand if miner economic incentives are suboptimal.
+ - Providing a tool that can aggregate data across many miners will provide a foundation for third parties to run miner reputation systems.
+
+#### User workflow example
+
+```
+$ ./dealbot --input path/to/deals/to/try.json
+{
+	"status":"failure",
+	"failedAt":"ClientEventProviderCancelled",  // failure event
+	"eventList":
+		[
+			"Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)",
+			"Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance)",
+			"Recv: 0 B, Paid 0 FIL, ClientEventDealAccepted (DealStatusAccepted)",
+			"Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelAddingFunds (DealStatusPaymentChannelAllocatingLane)",
+			"Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing)",
+			"Recv: 0 B, Paid 0 FIL, ClientEventProviderCancelled (DealStatusCancelling)",
+			"Recv: 0 B, Paid 0 FIL, ClientEventDataTransferError (DealStatusErrored)",
+			"Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)"
+		],
+	"errorMessage":"ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: unable to send cancel to channel FSM: normal shutdown of state machine",
+	"tailLog":"....",            // Multiline, from `tail` of `lotus daemon`
+	"storageDealParameters":      // Given to RetrievalBot as input
+		{
+			"CID":"Qm...",
+			"sha256":"73cb385...",    // independent checksum of data file
+			"sizeInBytes":"12345678",
+			"minerId":"f01924",
+			"verified":true,
+			"fastRetrievalFlag":true,
+			"dealId":"..."
+		},
+	"lotusVersion":"1.5.3-rc2+mainnet+git.9afb5ff94",
+	                              // From API call `Filecoin.Version`
+	"datetime":"YYYY-MM-DD_HH:MM:SS"   // when attempt started
+},
+{
+  // ...next deal attempt json blob...
+}
+```
+
+Stdout will contain the results, in JSON, of each deal attempt.  It is intended to be piped into a log search service like those provided by AWS/GC.
+
+
+#### Impact
+_How would this directly contribute to web3 dev stack product-market fit?_
+
+ - Improve reliability of the network
+ - Enable an ecosystem of miner reputation and ranking systems
+ - Perform the retrieval verification in Slingshot 2.3
+
+#### Leverage
+_How much would nailing this project improve our knowledge and ability to execute future projects?_
+
+**Immensely!**
+
+ - We don't currently have enough information about why deals fail to allocate our debugging time and resources correctly.
+
+ - Miner reputation systems enabled by this tool would complement the protocol-level incentives for miners to "do the right thing" (provide reliable retrieval of previously stored data, successfully complete all storage deals, etc.).
+
+#### Confidence
+_How sure are we that this impact would be realized? Label from [this scale](https://medium.com/@nimay/inside-product-introduction-to-feature-priority-using-ice-impact-confidence-ease-and-gist-5180434e5b15)_.
+
+C = 8
+
+Nothing is certain, but it is very likely that building this tool will at a minimum enable the Filecoin Project to better understand the frequency and causes of deal failures.  
+
+The tool's support for miner reputation systems should also help route more deals to reliable miners.
+
+
+## Project definition
+#### Brief plan of attack
+
+<!--Briefly describe the milestones/steps/work needed for this project-->
+ - **Phase 1:  Retrieval Bot.**  Reads from stdin a description of the CID to retrieve, and writes the outcome of the retrieval attempt to stdout.
+ - **Phase 2:  Storage Bot.**  Same idea but for storage deals.
+ - **Phase 3:  Orchestrator.**  Invokes the r-bot and s-bot programs one at a time, with inputs drawn from a long queue of CIDs to test-retrieve, files to test-store, etc.
+
+#### What does done look like?
+_What specific deliverables should be completed to consider this project done?_
+
+![High level architecture](https://github.com/protocol/web3-dev-team/blob/bots-proposal/proposals/images/bot-arch.png)
+
+####  What does success look like?
+_Success means impact. How will we know we did the right thing?_
+
+ - We have a metrics dashboard (in Observable, Grafana, etc) that continuously shows the most recent deal failures, how frequently they are happening, which miners fail most/least, and similar metrics.  The impact of this should be obvious:  a clearer understanding of why and how often deals are failing on mainnet.
+ - Reputation systems emerge from ecosystem partners that use the data generated by running these bots to rank miners.  This would give Filecoin users reliable, real-time miner ranking, which does not currently exist in the ecosystem.
+
+#### Counterpoints & pre-mortem
+_Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?_
+
+ - The metrics fail to give us actionable debugging ideas
+ - Reputation systems develop their own code to capture the same miner statistics (duplication of effort)
+
+#### Alternatives
+_How might this project’s intent be realized in other ways (other than this project proposal)? What other potential solutions can address the same need?_
+
+ - [@whyrusleeping](https://github.com/whyrusleeping/)'s [Estuary](https://github.com/whyrusleeping/estuary) tool
+
+#### Dependencies/prerequisites
+<!--List any other projects that are dependencies/prerequisites for this project that is being pitched.-->
+
+ - [filecoin-project/lotus/pull/5833/
+](https://github.com/filecoin-project/lotus/pull/5833/)
+
+#### Future opportunities
+<!--What future projects/opportunities could this project enable?-->
+
+ - Miner reputation systems as discussed
+
+## Required resources
+
+#### Effort estimate
+<!--T-shirt size rating of the size of the project. If the project might require external collaborators/teams, please note in the roles/skills section below). 
+For a team of 3-5 people with the appropriate skills:
+- Small, 1-2 weeks
+- Medium, 3-5 weeks
+- Large, 6-10 weeks
+- XLarge, >10 weeks
+Describe any choices and uncertainty in this scope estimate. (E.g. Uncertainty in the scope until design work is complete, low uncertainty in execution thereafter.)
+-->
+
+TBD with team
+
+#### Roles / skills needed
+<!--Describe the knowledge/skill-sets and team that are needed for this project (e.g. PM, docs, protocol or library expertise, design expertise, etc.). If this project could be externalized to the community or a team outside PL's direct employment, please note that here.-->
+
+TBD with team
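
As a rough illustration of how the per-attempt records from "User workflow example" might feed the failure metrics described under "What does success look like?", here is a minimal Python sketch. It assumes several things the proposal does not specify: that the bot emits one strict JSON object per line (the example output in the proposal is annotated with comments, so it is not strict JSON), that a successful attempt is reported as `"status":"success"`, and that the miner is identified by `storageDealParameters.minerId`. The `summarize` helper, the sample records, and the second miner ID are hypothetical.

```python
import json
from collections import Counter

# Hypothetical sample of bot stdout, one JSON object per line (an assumption).
# Field names follow the proposal's example; "success" is an assumed value.
SAMPLE_STDOUT = """\
{"status": "failure", "failedAt": "ClientEventProviderCancelled", "storageDealParameters": {"minerId": "f01924"}}
{"status": "success", "failedAt": null, "storageDealParameters": {"minerId": "f01924"}}
{"status": "failure", "failedAt": "ClientEventDataTransferError", "storageDealParameters": {"minerId": "f09999"}}
"""


def summarize(lines):
    """Return (failure rate per miner, count of failures per failedAt event)."""
    attempts = Counter()
    failures = Counter()
    failed_at = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        miner = record["storageDealParameters"]["minerId"]
        attempts[miner] += 1
        if record["status"] == "failure":
            failures[miner] += 1
            failed_at[record.get("failedAt")] += 1
    rates = {miner: failures[miner] / attempts[miner] for miner in attempts}
    return rates, failed_at


rates, failed_at = summarize(SAMPLE_STDOUT.splitlines())
for miner, rate in sorted(rates.items()):
    print(f"{miner}: {rate:.0%} of attempts failed")
for event, count in failed_at.most_common():
    print(f"{event}: {count}")
```

Per-miner rates and per-event counts like these are the kind of aggregates a dashboard or a third-party reputation system could compute from the logged records. A real deployment would more likely run such queries in the log search service than in a script.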