Description
The implementation for this issue was released as multi-hyperbee.
Problem
Hypercore is a single-writer system. In a multi-device scenario (e.g. a mobile and a PC) we need to find a way to present one [virtual] Hypercore across all devices. We should also include personal Cloud peers among these devices, to take advantage of their reliability.
Existing approaches in Hypercore community
1. Materialize merged Hypercores into one (Kappa-DB does that). Drawbacks:
1.1. duplication of data (confirm the extent of it?)
1.2. loses sparse access
1.3. loses the authenticity of individual records (confirm?)
1.4. loses performance
2. Master feed + deltas. It was developed by @RangerMauve for Multi-Hyperdrive. The algorithm is specific to Hyperdrive, but can be extended to a degree to Hyperbee and Hypercore. Extending it to Hypertrie is easy, as Hyperdrive is built on Hypertrie and has the same access semantics (minus the file contents).
Evaluation of Multi-hyperdrive
peer1: me r2 r3
peer2: r1 me r3
peer3: r1 r2 me
(where "me" is the peer's own writable hyperdrive and rN = hyperdrive(peerN-key, sparse) is a sparse replica of peer N's drive)
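As a rough illustration of the table above, here is how peer1 could assemble its virtual drive. This is a sketch only: it assumes hyperdrive's (storage, key, { sparse }) constructor and multi-hyperdrive's addDrive() API, the keys and paths are placeholders, and swarm/replication setup is omitted.

const hyperdrive = require('hyperdrive')
const multiHyperdrive = require('multi-hyperdrive')

// placeholders for the other peers' public keys
const peer2Key = Buffer.alloc(32)
const peer3Key = Buffer.alloc(32)

// peer1's own writable drive ("me" in the table above)
const me = hyperdrive('./storage/me')

// sparse, read-only replicas of peer2 and peer3 ("r2", "r3")
const r2 = hyperdrive('./storage/r2', peer2Key, { sparse: true })
const r3 = hyperdrive('./storage/r3', peer3Key, { sparse: true })

// present the three feeds as one virtual drive
const drive = multiHyperdrive(me)
drive.addDrive(r2)
drive.addDrive(r3)

// reads resolve to the newest version found across me/r2/r3
drive.readFile('/notes/todo.md', 'utf8', (err, latest) => {
  if (err) throw err
  console.log(latest)
})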
How it works
- Each peer creates its own hyperdrive and sparsely replicates the other peers' hyperdrives
- Each peer sets top-level watches on the replicas
- Upon a file change on another peer, a notification is pushed to the watching peers
- The received notification carries only the metadata of the changed file
- The watcher decides what to do with this info. The data must be fetched from the remote peer IF we want to uphold offline-first (see the sketch after this list). Alternatively, if we are willing to sacrifice offline-first somewhat, we can fetch the data from the remote peer on demand, when the file is requested locally.
- The next request for the fetched file will look through its own hyperdrive and all the replicas and return the latest version of the file. The same applies to readdir, but with a bit more work.
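A minimal sketch of the watch/pre-fetch loop described above. It assumes watch(path, onchange) and download(path, cb) methods on the replica; the exact hyperdrive signatures differ between versions, so treat the calls as assumptions.

// pre-fetch on change to stay offline-first; skipping the download() call here
// gives the on-demand variant described above
function trackReplica (replica) {
  // top-level watch: fires whenever the remote peer changes anything under '/'
  replica.watch('/', function onChange () {
    // the notification itself carries only metadata, no file contents
    replica.download('/', (err) => {
      if (err) return console.error('pre-fetch failed', err)
      // blocks are now stored locally; the multi-hyperdrive wrapper will later
      // resolve readFile()/readdir() to the newest version across all feeds
    })
  })
}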
Pro
- Replication model: Leaderless multi-master, all peers are equal
- Offline-first: each peer has the full set of data locally, if it pre-fetches on watch()
- Storage: no duplication of data for new files created on different peers
- Network: fan-out on metadata change, but largely insignificant
- Backup is straightforward, as multi-hyperdrive automates walking through the data changes stored in the different feeds
Con
- Network: painful fan-out on a data change. If pre-fetch on the watch event is used, the upload cost is roughly (number of peers in the swarm) * (data size); see the worked example after this list. If pre-fetch is not done, the file can still be requested by more than one peer, and it will be uploaded more than once. Besides, the mobile may not be online to provide the data (waking up a mobile app in background mode is possible, but it has limitations).
- Storage: a file edited on a peer duplicates the storage for this file on every peer. But it is the same cost as in a single-writer setup, since Hyperdrive today does not yet support block-level file modification. When Hyperdrive fixes this issue, the multi-hyperdrive design will need to be adjusted.
- Performance: N file stats per get(), probably not significant
- Consistency: peers can end up with a different state on each master (needs CRDTs and clocks)
- Collaboration: coarse conflict resolution, but can be improved with CRDT and file metadata
- TODO: Is tracking changes reliable? Can watch events get lost, e.g. when a peer dies?
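A back-of-the-envelope worked example of the fan-out cost mentioned above; all numbers are made up.

const peersInSwarm = 5     // e.g. phone, laptop, iPad plus two Cloud peers
const dataSize = 20e6      // a 20 MB file edited on the phone

// with pre-fetch on every watch event, the originator may upload to everyone
const prefetchUploadBytes = (peersInSwarm - 1) * dataSize   // 80 MB over a metered link

// without pre-fetch the upload happens on demand, possibly still to several
// peers, and only while the phone happens to be online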
Approach with Personal Cloud nodes
Upload strategy and topology
To optimize replication for Multi-hyperdrive we can distinguish between the capabilities of the peers, taking the following into account (a sketch of such a capability record follows the list):
- Cost. The peer could be on a metered network; this is typical for cellular networks.
- Speed. The peer could be on a slow network, unlimited but slow, like DSL.
- Latency. The peer could be on a high-latency network, like satellite.
- Availability. The peer is not always on, like the mobile and the web app.
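A hypothetical shape for such a per-peer capability record; the field names are illustrative, not an existing API.

const capabilities = {
  peerId: 'ipad-1',        // stable identifier of the device
  metered: false,          // Cost: cellular / pay-per-GB network?
  bandwidthMbps: 20,       // Speed: rough sustained throughput
  latencyMs: 40,           // Latency: satellite links are in the hundreds of ms
  alwaysOn: false          // Availability: mobiles and web apps go to sleep
}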
Discovery of topology
Multi-hyperdrive topology is any-to-any. We need something different here.
The originator of the change can discover the capabilities of the peers in the swarm (a separate project that would utilize the small DHT storage already used by Bitfinex), and adjust the replication strategy in the following ways (a target-selection sketch follows the list):
- A change is always uploaded in chunks to multiple peers, but for a mobile this is not good. It is better to upload to Cloud peers, as opposed to, say, an iPad, as the iPad may, and will, go to sleep at any moment and won't support stable content propagation.
- Even more importantly, each chunk should be uploaded just once (not what happens today), to preserve bandwidth and costs while on cellular networks. In other words, Cloud peers should prefer to replicate from each other, not from mobiles. See the discussion on that.
- The exception is AirDrop-like network scenarios, where devices replicate directly on the local network and want to avoid the cloud altogether.
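A sketch of how the originator could pick upload targets from such capability records; the selection logic is purely illustrative, not a proposed API.

function pickUploadTargets (self, peers) {
  // on a metered link, hand the chunks to one always-on Cloud peer and let the
  // Cloud peers replicate among themselves
  if (self.metered) {
    const cloud = peers.filter(p => p.alwaysOn && !p.metered)
    if (cloud.length > 0) return cloud.slice(0, 1)
  }
  // otherwise (e.g. the AirDrop-like local-network case) replicate directly
  return peers
}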
Merge strategy
Hyperbee and Hypercore need a different merge strategy from Hyperdrive. Multi-hyperdrive does not materialize merges, but for Hyperbee especially this could be unavoidable. Data can be fetched on watch and immediately copied to the local feed, thus allowing searches (sketched below). Hypertrie may continue to use the same strategy as Hyperdrive.
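A minimal sketch of such a materializing merge, assuming hyperbee's put()/createReadStream() API and some notification (e.g. the watch mechanism above) that the replica changed. A real merge would copy only the diff and run CRDT conflict resolution, as discussed in the sections below.

// merge a remote Hyperbee replica into the local writable bee so that local
// range queries and searches see the remote data; both arguments are Hyperbee
// instances, e.g. new Hyperbee(feed, { keyEncoding: 'utf-8', valueEncoding: 'json' })
async function mergeReplica (localBee, replicaBee) {
  for await (const { key, value } of replicaBee.createReadStream()) {
    // naive last-write-wins copy; a real merge would diff and resolve conflicts
    await localBee.put(key, value)
  }
}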
Now, how do Cloud peers achieve consensus?
Consensus research
1. Simple and fast consensus on Cloud peers. Because Cloud peers are always available and on a fast network, the consensus algorithm can be simpler and greatly reduce the probability of inconsistency. Time in the Cloud can be managed well, further simplifying the consensus algorithm. We can start with Cloud peers in the same data center, but on different machines, even different racks, and maybe different zones within the same data center for power isolation. Then we can develop an equivalent of AWS availability zones with a more complex consensus algorithm.
2. Which consensus algorithm? Consensus research for databases has been supercharged by the advent of blockchains. The EOS blockchain demonstrated that if we assume all peers are on a fast, reliable network with low latency, a much simpler consensus algorithm becomes possible, and it converges 2-3 orders of magnitude faster (EOS can do 3000 transactions per second).
2.1. Non-Byzantine algorithms used in databases are Paxos and Raft.
2.2. PBFT is quite mature and supports Byzantine faults, but tolerates only (n-1)/3 faulty nodes (so minimum 7 nodes?) and has difficult leader selection.
2.3. Tendermint improves on this by rotating the leader every round and skipping a non-responding leader automatically (how many peers minimum?).
Leaderless non-Byzantine consensus
We set out to support multi-device and team-collaboration scenarios. Most changes are expected to come from personal devices. Later, the Cloud App will also generate new data. We will limit, by data type, which changes Cloud peers can initiate, so that they do not clash with the single-writer model.
If we design for non-Byzantine faults, we can make use of a new approach to leaderless multi-master replication, used by AWS DynamoDB and Azure Cosmos. It is based on the CRDT innovation that has occurred in the last 5 years.
CRDT merge
Merge changes into the master with the help of CRDTs, as used in Redis as well as the cloud-scale databases AWS DynamoDB and Azure Cosmos. Use yjs or https://github.com/automerge/automerge.
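For illustration, here is the kind of merge step intended, using Automerge's classic API; the document shape and field names are made up.

const Automerge = require('automerge')

// two writers that diverged while offline
let onMobile = Automerge.change(Automerge.init(), doc => { doc.title = 'Plan' })
let onCloud = Automerge.merge(Automerge.init(), onMobile)

onMobile = Automerge.change(onMobile, doc => { doc.status = 'draft' })
onCloud = Automerge.change(onCloud, doc => { doc.owner = 'alice' })

// deterministic merge: both replicas converge to the same state,
// regardless of the order in which the changes arrive
const merged = Automerge.merge(onMobile, onCloud)
console.log(merged) // { title: 'Plan', status: 'draft', owner: 'alice' }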
Secure Clock
Use vector / bloom / HLC clocks to resolve conflicts, so that all nodes eventually converge to 100% the same state :-).
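A toy sketch of a hybrid logical clock (HLC), just to illustrate how such timestamps could order conflicting updates deterministically across nodes; it is not a vetted implementation, and a real HLC also folds in remote timestamps on receive and breaks ties by node id.

// advance the local clock for a new local event
function hlcTick (clock, wallNow = Date.now()) {
  // take the max of wall-clock time and the last logical time,
  // bumping the counter when the wall clock has not moved forward
  if (wallNow > clock.millis) return { millis: wallNow, counter: 0 }
  return { millis: clock.millis, counter: clock.counter + 1 }
}

// total order over timestamps: wall-clock part first, counter breaks ties
function hlcCompare (a, b) {
  if (a.millis !== b.millis) return a.millis - b.millis
  return a.counter - b.counter
}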