List view
Completing this milestone will require both a checkup on old code and implementing new logic. Crossed out "TODOs" indicate a finished task. The finished task will have the appropriate PR linked. # Definitions - engine/the engine: The Engine struct itself, usually in reference to the `torrent()` function. - peer thread: The thread that a call to `handle_peer()` in the engine is spawned on. # Old Code ## ~~Trackers~~ Finished in #87 Trackers should have been refactored and implemented properly in #84, however it would be appropriate to ensure that this feature of the engine is working and possibly abstract it into a separate function for the sake of testability. ### BEPS: - [BEP 0003](https://www.bittorrent.org/beps/bep_0003.html) (_HTTP_) - [BEP 0015](https://www.bittorrent.org/beps/bep_0015.html) (_UDP_) ## ~~Info Dict~~ Finished in #103 While this feature should already be implemented in the `handle_peer()` function, we should ensure that it is working as intended with multiple different peers. This includes: - Ensuring that the info dict is being sent back to engine - No pieces are requested until full info dict is acquired - Verify that the info dict is collected and correct by hashing the incoming info dict with our predefined hash from the magnet uri. We cannot verify pieces before we have the info dict. Thus, this feature working correctly is extremely important. If we do not have the hashes of each piece, peers could potentially send malformed or malicious pieces without our knowledge. ### BEPS - [BEP 0009](https://www.bittorrent.org/beps/bep_0009.html) (_Allows peers to send metadata files_) - [BEP 0010](https://www.bittorrent.org/beps/bep_0010.html) (_Defines another PeerMessage type_) ## ~~Handshake~~ Verified with `cargo nextest run --no-capture test_torrent_actor | grep uTP` The handshake is poorly documented by BitTorrent, thus we am unaware if the current logic in the handshake is correct. We would like to try a few various peers from different clients to ensure that this is correct. - [BEP 0001](https://www.bittorrent.org/beps/bep_0001.html) (_Defines the general BitTorrent protocol_) ## ~~uTP peers~~ Verified with `cargo nextest run --no-capture test_torrent_actor | grep uTP` Even though we currently _should_ have feature-parity with between uTP and TCP peers, it appears only TCP peers are connecting. We should run a multitude tests to verify that real uTP peers are actually connecting and working. ### BEPS - [BEP 0029](https://www.bittorrent.org/beps/bep_0029.html) # Things to do ## ~~Utilize an actor model for engine~~ Finished in #96 Utilizing an actor model for the engine would allow us to easily handle multiple instances of `Torrent` at the same time, as well as handle a singular instance of `Torrent` failing accordingly. It would also allow us to easily gather statistics and general information from all of the `Torrent` instances. ## ~~Utilize an actor model for peers and trackers~~ Finished -- #94 The current implementation of the Peer struct essentially acts as an Actor, and there are many other use cases for an Actor-like model in this project. Using a crate for an Actor model would help to standardize libtortillas. **This should be done before anything else in this milestone.** We have a few choices of what actor crate to utilize. A helpful blog post can be found [here](https://tqwewe.com/blog/comparing-rust-actor-libraries/). ## ~~Receive and handle bitfield and rarest piece~~ This is not being implemented at the moment. If you would like to implement this, please see #117 and #118 At the moment, the engine does not handle received bitfields appropriately. Our proposed solution is to utilize a hashmap to maintain the rarest piece for a given torrent. That hashmap will be updated when a peer sends a bitfield, and the general process will look something like this: - Peer sends bitfield - We send a `bitxor()` version of the bitfield, using old bitfield and new bitfield as inputs, back to the engine. The `bitxor()` function will act as expected because there is no rational scenario where a peer will "lose" a piece (AKA remove a piece from its bitfield) - We store a `bitand_assign()` version of the bitfield in the `handle_peer()` function. This function is running until either the torrent function ends or the peer disconnects. - Repeat for new bitfields With the xor'ed version of the bitfield being sent to the engine, we can use that to update and increment a hashmap accordingly (we may want to utilize [Dashmap](https://crates.io/crates/dashMap)). For every piece that the peer "has" (every piece that is in the bitfield), we will increment that part of the hashmap. However, we also be able to simply use a vector for this. As for the rarest piece, getting this could be costly seeing that there may be a loose upper bound of 10 thousand pieces. In other words, we generally expect most torrents to have a maximum of _about_ 10 thousand pieces. If that is the case, getting the minimum value of a hashmap or vector that is 10 thousand bytes long could be extremely costly, especially if we have to constantly recalculate this value. There are a few different ways to handle this: 1. Cache the rarest piece every x seconds. If the cache has the rarest piece, don't recalculate the rarest piece. If it doesn't, do recalculate. 2. Every x bitfields we receive, recalculate the rarest piece. Both methods have their pros and cons, and we will make the choice of what to implement somewhere in the near future. ### BEPS - [BEP 0001](https://www.bittorrent.org/beps/bep_0001.html) (_Defines the general BitTorrent protocol_) ## Handle piece and request flow Request pieces, rarest piece first. The primary challenge here is actually storing the piece on disk, specifically _not_ in memory. That would be a major problem for a number of reasons. The means of storing a given piece on disk is outlined in the next section. This should be extensible enough that selecting a piece based on the `Allowed Fast` attribute of an Fast Extension (?) message could be implemented in the future (see [BEP 0006](https://bittorrent.org/beps/bep_0006.html)) A bitvec should be maintained that tells us which pieces we already have, for the sake of seeding. Seeding is outlined a few sections down. ### BEPS - [BEP 0001](https://www.bittorrent.org/beps/bep_0001.html) (_Defines message types_) - [BEP 0006](https://bittorrent.org/beps/bep_0006.html) ## Creating, appending, and closing a file. Currently, the general flow of this logic should go something like this: - Open a file with a certain length, determined from the info dictionary, with [set_len()](https://doc.rust-lang.org/std/fs/struct.File.html#method.set_len) - Append & move around that blank file with the [Seek](https://doc.rust-lang.org/beta/std/io/trait.Seek.html) trait. - Save and close the file ### BEPS None ## Handling peer disconnects If a peer disconnects, we need to know what pieces to remove from the hashmap so that we don't miscalculate the rarest piece. Peer disconnects should be handled appropriately and the current bitfield of the peer should be sent to the engine on disconnect in a special `PeerResponse` enum variant designated for peer disconnects. ### BEPS None ## Handling torrents with multiple files There are two main challenges in handling a torrent with multiple files. Firstly is the multiple _entirely separate_ files themselves. The files are provided in sequential order in the torrent file. If we are torrenting off of a Magnet URI, we will not know what files there are until we get the info dict via BEP 0009/BEP 0010. Managing separate files is a challenge due to the fact that, as far as we are aware (according to [BitTorrent Theory](https://wiki.theory.org/BitTorrentSpecification#Notes)), the end of one file in a `Piece` message is the same as the start of another file. An example is shown below: _A "diagram" of a piece message around a given byte, X bytes through the piece._ ..., byte X - 1, byte X, byte X + 1, byte X + 2, ... Byte X may mark the end of a file, and byte X + 1 may mark the start of a new file. Below is a "pseudocode" example of this process in Python. - Figure out where each file ends - Figure out where each file starts - "Split" the file at that point, and carry on with downloading the next file. ```py tmp_piece = [] pieces_seen = 0 for file in files: # Every piece except the "odd one out" (the last piece, where the file ends before the piece ends) num_pieces = file.length // piece_length # The # of bytes for this file in the odd piece remainder_bytes = file.length - (num_pieces * piece_length) # If we have the last part of a piece as the start of a file, "mark" this piece as seen. if len(tmp_piece) > 0: pieces_seen += 1 # Get all pieces including the odd one out pieces = tmp_piece + get_piece_bytes(num_pieces, start_at=pieces_seen, end_at=num_pieces + 1) # The index (the end) of the pieces for the file plus the remainder bytes first_half_index = (num_pieces * piece_length) + remainder_bytes # Appends the portion of the odd piece file.bytes = pieces[0:first_half_index] pieces_seen += num_pieces tmp_piece = pieces[first_half_index:] ``` More suggestions are welcome. The second main issue lies in handling multi file torrents with subfolders. This should be manageable, as once managing separate files is solved, we can simply check to see if the given file should be placed in a subfolder, and if so, create a subfolder if it does not exist. ### BEPS - [BEP 0001](https://www.bittorrent.org/beps/bep_0001.html) (_Defines message types_) - [BEP 0009](https://www.bittorrent.org/beps/bep_0009.html) - [BEP 0010](https://www.bittorrent.org/beps/bep_0010.html) ## Seeding The engine currently has no implementation for seeding. Seeding in itself is simple, however managing the choking algorithm correctly could be tricky. To handle the basics of seeding, the following should be implemented: - `Request` PeerMessages should be passed onto the engine from the peer thread that the message was recieved on. - If we have the piece, we should send the piece back to the peer thread in a `PeerCommand` enum. - The peer should then send the piece back to the peer As for managing the choking algorithm, BEP 0052 should be referenced. ### BEPS - [BEP 0003](https://www.bittorrent.org/beps/bep_0003.html) _Gives an "overview" on seeding_ - [BEP 0052](https://www.bittorrent.org/beps/bep_0052.html) _Defines BitTorrent V2, but also the currently implemented choking algorithm_ # Engine implementation ```rs /// This enum should impl Default struct ListenerPorts(u16, u16); /// Creates a new Engine instance with either no torrents, or specified torrents via metainfo or a torrent file/url. new() -> Engine; from_metainfo(metainfo: &MetaInfo) -> Engine; from_torrent(metainfo_file_or_url: &str) -> Engine; /// Config operations set_output_dir(&mut self, path: impl AsRef<Path>) -> Engine; set_max_peers(&self, max_peers: usize) -> Engine; /// If this function is specified, the engine will still write to the disk set_output_stream_for(torrent: InfoHash, stream: impl Write) -> Engine; /// Adds to an existing engine instance /// Store torrents in a `Dashmap` in self using the info hash as a key add_torrent(&self, metainfo_file_or_url: &str) -> InfoHash; /// Gives the hashmap of all added torrents torrents() -> DashMap<InfoHash, ?>; /// Starts the peer listeners for utp and tcp async fn start_listeners(config: ListenerConfig); fn events() -> broadcast::UnboundedReceiver<Event>; async command(torrent: InfoHash, command: EngineCommand) -> Result<Option<Event>>; ```
No due date•17/17 issues closed