BEYOND_BITSWAP/README.md: 27 additions & 6 deletions
@@ -31,25 +31,46 @@ We are making all our contributions, ideas, testbed, benchmarking and analysis s
31
31
### Documents
32
32
33
33
* [Related Work](https://docs.google.com/document/d/14AE8OJvSpkhguq2k1Gfc9h0JvorvLgOUSVrj3CnOkQk/edit#heading=h.nxkc23tlbqhl): It gives an overview of the problem, how it will be tackled, and a collection of references and community proposals.
34
-
*[Beyond Bitswap Slides](https://docs.google.com/presentation/d/18_aRTye2t6Xs_VhKwEbhvCYYu9ePaLgamIrJkpUDtfY/edit?usp=sharing): Set of slides introducing the project and summarizing the Related Work document from above.
35
-
<!-- These slides were used to introduce the project in the following [talk](???). -->
34
+
* [Beyond Bitswap Slides](https://docs.google.com/presentation/d/18_aRTye2t6Xs_VhKwEbhvCYYu9ePaLgamIrJkpUDtfY/edit?usp=sharing): Set of slides introducing the project and summarizing the Related Work document from above. <!-- These slides were used to introduce the project in the following [talk](???). -->
36
35
* [Survey of the state of the art](https://docs.google.com/document/d/172q0EQFPDrVrWGt5TiEj2MToTXIor4mP1gCuKv4re5I/edit?usp=sharing): It summarizes a list of papers on file-sharing strategies in P2P networks, used as groundwork for the project.
37
36
* [Evaluation Plan](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl): Document describing the testbed and evaluation plan designed to test the performance of current implementations of file-sharing systems, and to compare it with the improvements implemented within the scope of this work.
38
-
*[Enhancements RFC](#enhancements-rfcs): A list of enhancements proposals and ideas to improve file-sharing in IPFS and P2P networks.
39
-
<!-- * [Test Results](https://docs.google.com/document/d/1zPpgnr9ykJr5PAvShJBGhKKRDRbsglb00MPc5eVEU4Q/edit#): This document collects the results of the tests performed in the scope of the project. -->
37
+
* [Enhancement RFCs](#enhancement-rfcs): A list of enhancement proposals and ideas to improve file-sharing in IPFS and P2P networks. <!-- * [Test Results](https://docs.google.com/document/d/1zPpgnr9ykJr5PAvShJBGhKKRDRbsglb00MPc5eVEU4Q/edit#): This document collects the results of the tests performed in the scope of the project. -->
40
38
41
39
### Enhancement RFCs
42
40
43
41
This section shares a list of improvement RFCs that are currently being tackled, discussed and prototyped. Each RFC aims to test a specific idea or assumption. They may initially be implemented over Bitswap, but that doesn't mean the conclusions drawn apply exclusively to the Bitswap protocol. RFCs are grouped by the layers of file-sharing in P2P systems identified in the [Related Work](https://docs.google.com/document/d/14AE8OJvSpkhguq2k1Gfc9h0JvorvLgOUSVrj3CnOkQk/edit#heading=h.nxkc23tlbqhl).
44
42
43
+
If you want to familiarize yourself with our work, we recommend first exploring the RFCs in the `prototype` state, and then moving on to those in the `draft` or `brainstorm` state. `prototype` RFCs have a working prototype that you can start evaluating and playing with. The `draft` state means that the RFC is ready for implementation, while `brainstorm` RFCs require further discussion and design work.
**Layer 1 RFCs: Discovery and announcement of content:**
46
63
* [RFC|BB|L1-04: Track WANT messages for future queries](./RFC/rfcBBL104.md): Evaluates how using information from a node's surroundings can help the discovery and fetching of popular content in the network.
47
-
*[RFC|BB|L1-02: TTLs for rebroadcasting WANT messages](./RFC/rfcBBL102.md): It evaluates how broadcasting exchange requests TTL hops away may help the discovery of content improving performance.
64
+
* [RFC|BB|L1-02: TTLs for rebroadcasting WANT messages](./RFC/rfcBBL102.md): Evaluates how broadcasting exchange requests up to TTL hops away, and allowing nodes to discover and retrieve content on behalf of their peers, may help content discovery and improve performance.
65
+
* [RFC|BB|L1/2-05: Use of super nodes and decentralized trackers](./RFC/rfcBBL1205.md): Acknowledges the fact that P2P networks are also social networks and that there are different types of nodes in the network. Explores the use of side-channel discovery mechanisms.
66
+
* [RFC|BB|L1-06: Content Anchors](https://github.com/protocol/ResNetLab/issues/6): Evaluate the use of gossipsub to perform more efficient content routing.
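To make the TTL idea from RFC|BB|L1-02 more concrete, the following is a minimal sketch of flooding a WANT request a bounded number of hops through a peer graph. It is not taken from the RFC or from any Bitswap implementation: the function name `broadcast_want`, the adjacency-dict `graph`, and the `holders` mapping are all hypothetical, chosen only for illustration.

```python
from collections import deque


def broadcast_want(graph, origin, cid, holders, ttl):
    """Flood a WANT for `cid` up to `ttl` hops from `origin`.

    graph:   dict mapping a peer to the peers it is connected to (assumed model).
    holders: dict mapping a peer to the set of CIDs it stores (assumed model).
    Returns the set of peers within `ttl` hops that hold the content.
    """
    seen = {origin}
    queue = deque([(origin, 0)])
    found = set()
    while queue:
        node, hops = queue.popleft()
        if node != origin and cid in holders.get(node, ()):
            found.add(node)
        if hops == ttl:
            continue  # TTL exhausted: this peer does not rebroadcast
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return found
```

With a TTL of 1 only directly connected peers are asked, which is the closest analogue to a plain broadcast to neighbors; larger values trade extra messages for a wider search radius.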
48
67
49
68
**Layer 2 RFCs: Negotiation and transmission of content:**
50
-
*[RFC|BB|L2-07: Request minimum piece size and content protocol extension](./RFC/rfcBBL207.md): Evaluates how the size of the chunks that comprises content requested in a P2P network may affect performance.
51
69
* [RFC|BB|L12-01: Bitswap/Graphsync exchange messages extension and transmission choice](./RFC/rfcBBL1201.md): Proposes dividing the exchange of content into two phases: a negotiation phase used to discover the holders of the different chunks of a file, and a transfer phase to explicitly request blocks from different chunk holders. This opens the door to additional exchange strategies and schemes to improve performance.
52
70
* [RFC|BB|L2-03A: Use of compression and adjustable block size](./RFC/rfcBBL203A.md): Evaluates the potential performance improvements from the use of compression for the exchange of content in P2P networks.
71
+
* [RFC|BB|L2-03B: Use of network coding and erasure codes](./RFC/rfcBBL203B.md): Evaluates the potential performance improvements from the use of network coding and erasure codes to leverage the transmission of content from multiple streams.
72
+
* [RFC|BB|L2-07: Request minimum piece size and content protocol extension](./RFC/rfcBBL207.md): Evaluates how the size of the chunks that comprise the content requested in a P2P network may affect performance.
73
+
* [RFC|BB|L2-08: Delegate download to other nodes (bandwidth aggregation)](./RFC/rfcBBL208.md): Leverage the resources of other peer "friends" to collaboratively discover and retrieve content, and perform faster content retrievals.
53
74
54
75
Feel free to jump into the discussions around the project or to propose your own RFC by opening an issue in the repo.
<!-- Full description here: https://docs.google.com/document/d/1zjJCZel8zJzgK3XuHK0YZlNffEHThq7tUOssGgRTryY/edit#heading=h.6qnrq913vou6 -->
6
+
7
+
Every time Bitswap receives a new block, [it generates the CID from the payload of the block](https://github.com/adlrocha/go-bitswap/blob/fad1a007cf9bc4f7e8e3f182a4645df60a88a9c6/message/message.go#L222) in order to verify that it corresponds to a block in one of its wantlists. This means computing one hash per received block, which may involve a significant overhead.
8
+
9
+
## Description
10
+
Exploring more efficient implementations of hash functions, or alternative hash algorithms that fit different hardware architectures, could remove an important overhead for Bitswap (and other modules from the IPFS ecosystem).
11
+
12
+
## Implementation plan
13
+
- [ ] Evaluate the overhead of hashing every block in Bitswap. This can be done by exchanging a large file with precomputed CIDs, so that computing the CID for every block is not needed, and comparing against the regular exchange.
14
+
- [ ] If we see that the overhead from hashing every block is significant, explore other hash functions and make a Bitswap implementation able to support other hash algorithms. Perform the same evaluation as above and check the difference in the overhead.
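As a first rough estimate before touching Bitswap itself, the per-block hashing cost can be measured in isolation. The sketch below is an assumption-laden illustration rather than part of the plan above: it uses Python's standard `hashlib` instead of the Go code Bitswap runs, the 256 KiB block size is an assumed value, and the helper names (`split_blocks`, `hash_blocks`, `compare`) are hypothetical.

```python
import hashlib
import os
import time

BLOCK_SIZE = 256 * 1024  # assumed block size, for illustration only


def split_blocks(data, size=BLOCK_SIZE):
    """Split a payload into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def hash_blocks(blocks, algorithm):
    """Hash every block with `algorithm`; return (digests, elapsed seconds)."""
    start = time.perf_counter()
    digests = [hashlib.new(algorithm, block).digest() for block in blocks]
    return digests, time.perf_counter() - start


def compare(data, algorithms=("sha256", "blake2b", "sha3_256")):
    """Return the time spent hashing all blocks of `data` per algorithm."""
    blocks = split_blocks(data)
    return {name: hash_blocks(blocks, name)[1] for name in algorithms}


if __name__ == "__main__":
    payload = os.urandom(4 * 1024 * 1024)  # random stand-in for a file
    for name, seconds in compare(payload).items():
        print(f"{name}: {seconds:.4f} s")
```

Timings from such a micro-benchmark only bound the hashing cost; the real overhead still has to be measured inside a file exchange, as the checklist describes.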
15
+
16
+
## Impact
17
+
- Reduction in the Bitswap protocol overhead. The protocol runs faster.
18
+
19
+
## Evaluation Plan
20
+
21
+
- [The IPFS File Transfer benchmarks.](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl)
22
+
23
+
- Measurement of the overhead for different file exchanges for different hash algorithms.
# RFC|BB|L1/2-05: Use of super nodes and decentralized trackers
2
+
* Status: `brainstorm`
3
+
4
+
### Abstract
5
+
6
+
This RFC proposes the classification of nodes into different types according to their capabilities, and the use of side-channel information to track and discover content in the network. We propose the use of decentralized trackers (with good knowledge of where content is stored in the network and a discovery service for "magnet links"), and supernodes (nodes with high bandwidth and low latency which can significantly improve the transmission of content). Thus, nodes can follow different strategies to speed up discovery and transmission by looking up content in decentralized trackers and delegating the download of content to nearby supernodes.
7
+
8
+
This RFC will leverage the "high-quality" infrastructure deployed by entities such as Pinata, Infura or PL. We need to acknowledge the existence of these "high-class" nodes and leverage them to improve the performance of the network.
9
+
10
+
### Description
11
+
12
+
Introduce in the network the concepts of supernodes and decentralized trackers.
13
+
14
+
- Supernodes are nodes with high bandwidth, low latency and a good knowledge of where to discover content in the network. Regular nodes would prioritize connections to supernodes, as they will speed up their file-sharing process. Supernodes could be seen as "decentralized gateways" in the network.
15
+
16
+
- Decentralized trackers: Similar concept to the one of the "Hydra Boost". These nodes are passive nodes responsible for random walking the network for content and listening to WANT messages or any other additional announcement of metadata exchange devised for content discovery.
17
+
18
+
Nodes would point to decentralized trackers to speed up their content discovery, and to supernodes (if one of them ends up being the provider of the content) to speed up the transmission.
19
+
20
+
We could envision the use of side-channel identifiers for content discovery, equivalent to "magnet links", which instead of pointing to the specific content point to the decentralized tracker that can best serve your request. These magnet links should be "alive" and update with the status of the network. Thus, we could have:
21
+
22
+
- `/ipfs/<cid>`: Identifiers pointing directly to content.
23
+
24
+
- `/iptrack/<tracker_id>`: Points to the tracker that may know where to find the content.
25
+
26
+
- Additionally, the tracker could answer with `[/p2p/Qm.., /p2p/Qm..]` with a list of supernodes that would lead to a faster download of the content.
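The identifier scheme above could be handled along the following lines. This is only a sketch under stated assumptions: the RFC does not define a wire format or an API, so `parse_identifier`, `resolve`, and the in-memory `trackers` table (tracker id to CID to supernode addresses) are hypothetical names introduced for illustration.

```python
def parse_identifier(identifier):
    """Classify an identifier as a direct content link or a tracker link.

    Returns ("content", cid) for /ipfs/<cid> and ("tracker", tracker_id)
    for /iptrack/<tracker_id>; raises ValueError for anything else.
    """
    parts = identifier.strip("/").split("/")
    if len(parts) != 2 or not parts[1]:
        raise ValueError(f"malformed identifier: {identifier!r}")
    namespace, value = parts
    if namespace == "ipfs":
        return ("content", value)
    if namespace == "iptrack":
        return ("tracker", value)
    raise ValueError(f"unknown namespace: {namespace!r}")


def resolve(identifier, wanted_cid, trackers):
    """Return the supernode addresses a tracker suggests for `wanted_cid`.

    `trackers` is an assumed in-memory stand-in for querying a real tracker:
    a dict of tracker_id -> {cid: [peer addresses]}.
    A direct /ipfs/<cid> link involves no tracker, so no hints are returned.
    """
    kind, value = parse_identifier(identifier)
    if kind == "content":
        return []
    return trackers.get(value, {}).get(wanted_cid, [])
```

A node that receives an empty list would simply fall back to its regular discovery path, so tracker hints stay an optimization rather than a dependency.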
27
+
28
+
### Prior Work
29
+
30
+
This is similar or can be linked to the [RFC: Side Channels aka DHT-free Content Resolution from this document.](https://docs.google.com/document/d/1QKso-VwYv9jLxTN7WP_RAArrOLCZwjqdjBKQA2wa3VY/edit#)
31
+
32
+
This paper: [2Fast: Collaborative downloads in P2P networks](http://www.st.ewi.tudelft.nl/iosup/2fast06ieeep2p.pdf) proposes the idea of delegating the download of content to a group of nodes. We could consider the implementation of a "grouping scheme" for supernodes in which a node can request a group of supernodes to help it download content. This same grouping strategy could be considered for plain nodes as an independent RFC (a combination of the ideas presented in [RFCBBL207](./rfcBBL207) and [RFCBBL208](./rfcBBL208)).
33
+
34
+
### Implementation Plan
35
+
36
+
- [ ] Implementation of supernodes and the download delegation protocol.
37
+
38
+
- [ ] Implementation of decentralized trackers and magnet links protocol.
39
+
40
+
- [ ] Evaluation of different discovery and transmission strategies using this network hierarchy.
41
+
42
+
- [ ] Group of supernodes strategy.
43
+
44
+
### Evaluation Plan
45
+
46
+
- [The IPFS File Transfer benchmarks.](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl)
# RFC|BB|L2-03B: Use of network coding and erasure codes.
2
+
* Status: `brainstorm`
3
+
4
+
### Abstract
5
+
6
+
This RFC proposes exploring the application of network coding and erasure codes to the content exchanged by peers. These techniques include:
7
+
- The use of erasure codes in the transmission of blocks so they can be requested from different sources, and the original content can be regenerated even without the reception of all the blocks.
8
+
- The use of rateless codes to make all blocks for a specific content equally valuable.
9
+
- The use of erasure codes for storage (such as Reed Solomon).
10
+
11
+
These techniques could lead to additional improvements by including a negotiation phase in the exchange interface (see [RFC|BB|L1/2-01](./rfcBBL1201)).
12
+
13
+
### Shortcomings
14
+
15
+
In order to recover the requested content, peers need to receive every block of the content's DAG. This means that if a single block is lost, is too rare, or is no longer in the network, transmission times can increase or, in the worst case, the content can become "unretrievable". The use of erasure coding and network coding can benefit the discovery and transmission of blocks (especially if they are rare), making the content exchange more resilient to unforeseen events. These techniques also improve the transmission of content from several sources.
16
+
17
+
This RFC becomes really interesting in networks with high churn and large files. The aim is to parallelize the transmission from different sources.
18
+
19
+
### Description
20
+
21
+
Several nodes may receive complementary WANT messages from different connected peers. Instead of requesting the content from just one source, or explicitly requesting it from all of them potentially producing duplicates in the network, we could benefit from the use of network coding to enhance the transmission from the multiple sources.
22
+
23
+
We can benefit from the fact that more than one peer may store the content by exploring the use of techniques such as:
24
+
25
+
- The use of erasure codes and network coding in the transmission of blocks so they can be requested from different sources and the original content can be regenerated even without the reception of all the blocks. Peers can send a linear combination of coded blocks so that the requestor is able to recover the content even if it doesn't receive all the original blocks. This can lead to improvements in transmission and the removal of duplicates in the network (the redundancy and linear combination used in block transmission can be related to the amount of duplicates and the split factor used by sessions).
26
+
27
+
- The use of rateless codes to make all blocks for a specific content equally valuable. If several sources serve the content coded using a rateless code, every block is equally valuable, and as long as a minimum number of them are received, the content can be recovered.
28
+
29
+
- The use of erasure codes for storage (such as Reed Solomon). It adds a storage overhead but allows the original content to be regenerated even if not all the blocks are retrieved. The proposal is to store blocks using their original CID (so their identifier doesn't change) but use Reed Solomon to code the content. This would increase the size of blocks, and poses several limitations on the codes that can be used to generate the Reed Solomon redundancy.
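The recovery property that these techniques share can be illustrated with the simplest possible erasure code: a single XOR parity block. This is deliberately not Reed Solomon (it tolerates only one missing block, and it assumes equally sized blocks), and the function names are hypothetical; it is a sketch of the principle, not a proposed encoding.

```python
def xor_blocks(blocks):
    """XOR equally sized blocks byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def encode(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(received):
    """Rebuild the original data blocks from an encoded list.

    `received` has the same layout as the output of `encode`, with at most
    one entry replaced by None to mark a block that never arrived.
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one loss")
    received = list(received)
    if missing:
        present = [block for block in received if block is not None]
        # XOR of all surviving blocks equals the missing one.
        received[missing[0]] = xor_blocks(present)
    return received[:-1]  # drop the parity block
```

Codes such as Reed Solomon generalize this idea to several redundant blocks, at the cost of the extra storage and block-size constraints mentioned above.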
30
+
31
+
Using the aforementioned techniques, several seeders fulfilling a request for content would be able to encode blocks and stream them, so that peers can receive blocks from different sources and reconstruct the original content once a minimum number of blocks has been received. This is a good way of parallelizing the transmission of blocks from different sources before [RFC|BB|L1/2-01](./rfcBBL1201). A problem to be solved in order to implement this RFC is how to orchestrate the peers serving the request (the linear coding applied to the content needs to be deterministic). With [RFC|BB|L1/2-01](./rfcBBL1201), more complex requests for blocks could be performed.
- Rateless coding. Check [this document](https://docs.google.com/document/d/1PdfuPZs5ti7u67R9p4lZl_JFBzk477CjmruiWbLQr4U/edit#heading=h.lrqjoh4tz0t6) and [Petar's paper](http://www.scs.stanford.edu/~dm/home/papers/maymounkov:rateless.pdf) for inspiration.
40
+
41
+
- HackFS project on [Reed Solomon over IPFS](https://github.com/Wondertan/go-ipfs-recovery).
42
+
43
+
### Implementation Plan
44
+
45
+
- [ ] Evaluate potential improvements and overhead of using [IPFS Recovery](https://github.com/Wondertan/go-ipfs-recovery).
46
+
47
+
- [ ] Evaluate the use of rateless coding (or alternatives that are not IP protected). With rateless codes we can generate check blocks from the desired content and request them from different nodes, so that as long as we receive a minimum number of them we can regenerate the original information. This could potentially remove duplicate blocks.
48
+
49
+
- [ ] If [RFC|BB|L1/2-01](./rfcBBL1201) ends up being implemented, more complex ideas could be evaluated. Discovery and transmission would be two distinct stages, so nodes could eagerly request a compressed or network-coded transmission from a set of nodes.
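The "check blocks" idea can be sketched with a random linear code over GF(2), where every coded block is the XOR of a random subset of the original blocks and the receiver decodes by Gaussian elimination. This is an illustrative stand-in, not the rateless code from the references and not a proposed wire format: `encode_block`, `Decoder`, and the bitmask representation of the combination are assumptions, and blocks are assumed to be equally sized.

```python
import random


def encode_block(blocks, rng):
    """Return (mask, payload): the XOR of the blocks selected by `mask`.

    Bit i of `mask` says whether block i is part of the combination.
    Payloads are handled as big integers so XOR is a single operation.
    """
    mask = 0
    while mask == 0:
        mask = rng.getrandbits(len(blocks))
    payload = 0
    for i, block in enumerate(blocks):
        if (mask >> i) & 1:
            payload ^= int.from_bytes(block, "big")
    return mask, payload


class Decoder:
    def __init__(self, k, block_size):
        self.k = k
        self.block_size = block_size
        self.rows = {}  # pivot bit -> (mask, payload), pivot = highest set bit

    def add(self, mask, payload):
        """Reduce a coded block against known rows; keep it if it adds information."""
        for pivot in sorted(self.rows, reverse=True):
            if (mask >> pivot) & 1:
                row_mask, row_payload = self.rows[pivot]
                mask ^= row_mask
                payload ^= row_payload
        if mask == 0:
            return False  # linearly dependent on what we already have
        self.rows[mask.bit_length() - 1] = (mask, payload)
        return True

    def decode(self):
        """Return the original blocks, or None if more coded blocks are needed."""
        if len(self.rows) < self.k:
            return None
        solved = {}
        for pivot in sorted(self.rows):  # back-substitute from the lowest pivot
            mask, payload = self.rows[pivot]
            for i in range(pivot):
                if (mask >> i) & 1:
                    payload ^= solved[i]
            solved[pivot] = payload
        return [solved[i].to_bytes(self.block_size, "big") for i in range(self.k)]
```

Because any sufficiently large set of independent combinations decodes, it does not matter which seeder produced which coded block, which is the property that makes such codes attractive for multi-source transfers.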
50
+
51
+
### Impact
52
+
53
+
Improved transmission leveraging multiple streams, more reliable exchanges, and potential removal of duplicates in the network.
54
+
55
+
### Evaluation Plan
56
+
57
+
- [The IPFS File Transfer benchmarks.](https://docs.google.com/document/d/1LYs3WDCwpkrBdfrnB_LE0xsxdMCIhXdCchIkbzZc8OE/edit#heading=h.nxkc23tlbqhl)
58
+
59
+
- Test case where there are several seeders with the same content and leechers are connected to several of them.
60
+
61
+
### Future Work
62
+
63
+
If the negotiation phase from [RFC|BB|L1/2-01](./rfcBBL1201) is implemented, additional communications between seeders and leechers could be performed to enhance the use of these techniques. Thus, if a peer receives an overlapping level of fulfilment for its request from different sources, it can trigger the use of network coding and rateless codes so that, with a minimum number of blocks from both of the sources, the requested content can be retrieved.
64
+
65
+
Additionally, the use of "in-path" coding could be devised as future work, where intermediate nodes in a path, upon receiving several blocks that fulfill the same request from different sources, combine them to enhance the transmission (this requires further exploration). This improvement would significantly benefit [RFC|BB|L1-02](./rfcBBL102), where nodes can trigger relay sessions to request blocks on behalf of other nodes.