
Verification

Tim Jungnickel edited this page May 8, 2017 · 11 revisions

Challenge

The main challenge of the verification is to show that the application state converges. By convergence we mean that whenever pluto workers modify their local state, the states of all workers are identical once all messages have been exchanged.

We make several assumptions about the system model and the underlying network layer, which we discuss on this page.

System Model

Modeling a system is tough. In general, a model abstracts away real-world details and implementation specifics. At the same time, we do not want to lose the connection to the working system by over-abstracting all interesting features. On the other hand, we need to be able to express our implementation and its environment precisely, but without rewriting the source code. The major challenge is to find the right level of abstraction for this purpose.

For pluto, we identified the implemented CRDT layer as the most interesting part to reason about. A suitable system model therefore should cover how the IMAP state is stored and how IMAP commands are processed. Because pluto promises convergence of diverged replicas, the surrounding components and the network architecture must be modeled as well. This leads to a classical model of a distributed system. Luckily, we are able to build on decades of distributed systems research.

For now, we build our model with the essentials provided by Nancy Lynch. Hint: her book Distributed Algorithms is one of the most fundamental books on distributed systems. We begin by defining pluto's environment before we discuss the implementation of the IMAP state.

Our system consists of:

  • A set of processes P, which can be seen as multiple instances of pluto.
  • A network of communication channels between all processes.
  • Each process can send and receive messages; we distinguish between the following types of messages:
    • IMAP messages, such as CREATE, DELETE, APPEND, STORE, etc. with their parameters.
    • Internal messages, such as downstream operations from the CRDT layer.
  • The system is asynchronous, i.e. there are no bounds on the relative speeds of processes or on message delays.
  • For now, we assume causal delivery of the messages.
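The causal-delivery assumption is commonly enforced with vector clocks: a message is only delivered once all messages it causally depends on have been delivered. As a rough illustration of that rule (the names and clock representation here are our own, not pluto's):

```go
package main

import "fmt"

// VClock is a vector clock: one logical counter per process.
type VClock map[string]int

// CausallyReady reports whether a message stamped with the sender's
// clock msgClock may be delivered at a replica whose clock is local.
// The classic rule: the message must be the next one from its sender,
// and the replica must already have delivered everything the sender
// had delivered when it sent.
func CausallyReady(local, msgClock VClock, sender string) bool {
	for p, c := range msgClock {
		if p == sender {
			if c != local[p]+1 {
				return false
			}
		} else if c > local[p] {
			return false
		}
	}
	return true
}

func main() {
	local := VClock{"p1": 2, "p2": 0}
	// The next message from p1 is deliverable...
	fmt.Println(CausallyReady(local, VClock{"p1": 3, "p2": 0}, "p1")) // true
	// ...but a message depending on an unseen p2 event must be buffered.
	fmt.Println(CausallyReady(local, VClock{"p1": 3, "p2": 1}, "p1")) // false
}
```

Messages that are not yet ready would be buffered and retried after each delivery.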

Having this relatively traditional model of a distributed system, we can start to implement the processes. For now, we do not want to go into details. The basic intuition of the implementation is that a process stores the application state in a CRDT. Every incoming IMAP message is translated into an operation on the CRDT. According to the specification of the CRDT, downstream operations are sent to all replicas in order to achieve convergence.

TODO: add a description of the failures we are tolerating.

Convergence

In order to actually verify something, we need to express which properties are desirable for our system. In our case, the leading property is convergence. Now, what do we mean by that?

Eventual consistency essentially captures the notion of convergence. In its simplest definition, a system provides eventual consistency when all replicas reach an identical state once all messages have been exchanged. We note that there are several definitions and multiple levels of granularity.

In our system, we need to express this property and prove that it holds as long as the assumptions are fulfilled. A formal description of this property could look like this:
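One common way to phrase it, in the spirit of the strong eventual consistency literature (the notation below is illustrative, not pluto's actual formalization):

```latex
% Convergence: any two replicas that have delivered the same
% set of messages are in the same state.
\forall p_i, p_j \in P:\;
  \mathit{delivered}(p_i) = \mathit{delivered}(p_j)
  \;\Longrightarrow\; \mathit{state}(p_i) = \mathit{state}(p_j)
```

Together with eventual delivery of all messages, this implies that all replicas eventually agree.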

