| 1 | +# Compression algorithm |
| 2 | + |
| 3 | +## What is state? |
| 4 | +State is information such as who is in a room, what the room topic or name is, and |
| 5 | +who has which privilege levels. Synapse keeps track of it for various reasons, such as |
| 6 | +spotting invalid events (e.g. ones sent by banned users) and providing room membership |
| 7 | +information to clients. |
| 8 | + |
| 9 | +## What is a state group? |
| 10 | + |
| 11 | +Synapse needs to keep track of the state at the moment of each event. A state group |
| 12 | +corresponds to a unique state. The database table `event_to_state_groups` keeps track |
| 13 | +of the mapping from event ids to state group ids. |
| 14 | + |
| 15 | +Consider the following simplified example: |
| 16 | +``` |
| 17 | +State group id | State |
| 18 | +_____________________________________________ |
| 19 | + 1 | Alice in room |
| 20 | + 2 | Alice in room, Bob in room |
| 21 | + 3 | Bob in room |
| 22 | + |
| 23 | + |
| 24 | +Event id | What the event was |
| 25 | +______________________________________ |
| 26 | + 1 | Alice sends a message |
| 27 | + 3 | Bob joins the room |
| 28 | + 4 | Bob sends a message |
| 29 | + 5 | Alice leaves the room |
| 30 | + 6 | Bob sends a message |
| 31 | + |
| 32 | + |
| 33 | +Event id | State group id |
| 34 | +_________________________ |
| 35 | + 1 | 1 |
| 36 | + 2 | 1 |
| 37 | + 3 | 2 |
| 38 | + 4 | 2 |
| 39 | + 5 | 3 |
| 40 | + 6 | 3 |
| 41 | +``` |
| 42 | + |
| 43 | +## What are deltas and predecessors? |
| 44 | +When a new state event happens (e.g. Bob joins the room), a new state group is created. |
| 45 | +Instead of copying all of the state from the previous state group, we store only |
| 46 | +the change from the previous group, which saves a lot of storage space. The difference |
| 47 | +from the previous state group is called the "delta". |
| 48 | + |
| 49 | +So for the previous example, we would have the following (note that only rows 1 and 2 |
| 50 | +will make sense at this point): |
| 51 | + |
| 52 | +``` |
| 53 | +State group id | Previous state group id | Delta |
| 54 | +____________________________________________________________ |
| 55 | + 1 | NONE | Alice in room |
| 56 | + 2 | 1 | Bob in room |
| 57 | + 3 | NONE | Bob in room |
| 58 | +``` |
| 59 | + |
| 60 | +So why is state group 3's previous state group NONE and not 2? Well, the way that deltas |
| 61 | +work in Synapse is that they can only add in new state or overwrite old state, but they |
| 62 | +cannot remove it. (So if the room topic is changed then that is just overwriting state, |
| 63 | +but removing Alice from the room is neither an addition nor an overwriting). If it is |
| 64 | +impossible to find a delta, then you just start from scratch again with a "snapshot" of |
| 65 | +the entire state. |
| 66 | + |
| 67 | +(NOTE: this is not documentation of how Synapse handles leaving rooms; it is purely |
| 68 | +illustrative.) |
| 69 | + |
| 70 | +The state of a state group is worked out by following the chain of previous state groups and |
| 71 | +adding together all of the deltas (with the most recent taking precedence). |
| 72 | + |
| 73 | +The mapping from state group to previous state group is stored in `state_group_edges`, |
| 74 | +and the deltas are stored in `state_groups_state`. |
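
The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries are hypothetical in-memory stand-ins for `state_group_edges` and `state_groups_state`, the `("member", name)` keys are made up for the example, and `resolve_state` is not a Synapse function.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key for one piece of state, e.g. ("member", "Alice").
StateKey = Tuple[str, str]


def resolve_state(
    group: int,
    edges: Dict[int, Optional[int]],
    deltas: Dict[int, Dict[StateKey, str]],
) -> Dict[StateKey, str]:
    """Rebuild the full state of `group` from its chain of deltas."""
    # Walk back through the previous state groups until we reach a snapshot
    # (a group with no previous state group).
    chain = []
    current: Optional[int] = group
    while current is not None:
        chain.append(current)
        current = edges.get(current)

    # Apply the deltas oldest first, so that the most recent one takes
    # precedence when the same key appears more than once.
    state: Dict[StateKey, str] = {}
    for sg in reversed(chain):
        state.update(deltas[sg])
    return state


# The example from the table above (None stands for "NONE").
edges = {1: None, 2: 1, 3: None}
deltas = {
    1: {("member", "Alice"): "in room"},
    2: {("member", "Bob"): "in room"},
    3: {("member", "Bob"): "in room"},
}

print(resolve_state(2, edges, deltas))
print(resolve_state(3, edges, deltas))
```

Because a delta can only add or overwrite entries, state group 3 has to be a snapshot here: no sequence of `update` calls on top of state group 2 could drop Alice.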
| 75 | + |
| 76 | +## What are we compressing then? |
| 77 | +To speed up the conversion from state group id to state, Synapse sets a limit of 100 |
| 78 | +hops (that is, we will only ever have to look up the deltas for a maximum of |
| 79 | +100 state groups). It enforces this limit by taking another "snapshot" every 100 state groups. |
| 80 | + |
| 81 | +However, it is these snapshots that take up the bulk of the storage in a Synapse database, |
| 82 | +so we want to find a way to reduce the number of them without dramatically increasing the |
| 83 | +maximum number of hops needed to do lookups. |
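
To make the trade-off concrete, here is a rough sketch of the snapshot-every-100 scheme described above. It is an assumption-laden illustration, not Synapse's code: `linear_predecessors` and `max_chain` are hypothetical names, and the sketch ignores snapshots forced by state removal.

```python
from typing import Dict, List, Optional


def linear_predecessors(
    groups: List[int], max_chain: int = 100
) -> Dict[int, Optional[int]]:
    """Chain each state group to the one before it, starting a new
    snapshot whenever the current chain already holds `max_chain` groups."""
    predecessors: Dict[int, Optional[int]] = {}
    previous: Optional[int] = None
    chain_length = 0
    for sg in groups:
        if previous is None or chain_length >= max_chain:
            predecessors[sg] = None      # snapshot: store the full state
            chain_length = 1
        else:
            predecessors[sg] = previous  # delta against the previous group
            chain_length += 1
        previous = sg
    return predecessors


preds = linear_predecessors(list(range(1, 251)))
snapshots = [sg for sg, prev in preds.items() if prev is None]
print(snapshots)  # [1, 101, 201]
```

In this sketch 250 state groups need three full snapshots. Fewer snapshots would save storage, but in a single chain that can only be achieved by making the chains longer.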
| 84 | + |
| 85 | + |
| 86 | +## Compression Algorithm |
| 87 | + |
| 88 | +The algorithm works by attempting to create a *tree* of deltas, produced by |
| 89 | +appending state groups to different "levels". Each level has a maximum size, and |
| 90 | +each state group is appended to the lowest level that is not full. This tool calls a |
| 91 | +state group "compressed" once it has been added to |
| 92 | +one of these levels. |
| 93 | + |
| 94 | +This produces a graph that looks approximately like the following, in the case |
| 95 | +of having two levels with the bottom level (L1) having a maximum size of 3: |
| 96 | + |
| 97 | +``` |
| 98 | +L2 <-------------------- L2 <---------- ... |
| 99 | +^--- L1 <--- L1 <--- L1 ^--- L1 <--- L1 <--- L1 |
| 100 | + |
| 101 | +NOTE: A <--- B means that state group B's predecessor is A |
| 102 | +``` |
| 103 | +The structure that Synapse creates by default would be equivalent to having one level with |
| 104 | +a maximum length of 100. |
| 105 | + |
| 106 | +**Note**: Increasing the sum of the sizes of levels will increase the time it |
| 107 | +takes to query the full state of a given state group. |
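
One possible reading of the level scheme above, written as a small Python sketch. The names `Level`, `assign_predecessors` and `chain_length` are hypothetical, and the reset behaviour (emptying the lower levels whenever a higher level receives a group, and starting from a fresh snapshot when every level is full) is an assumption chosen to reproduce the diagram, not a description of the tool's actual implementation.

```python
from typing import Dict, List, Optional


class Level:
    """One level of the tree (hypothetical helper)."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.size = 0                    # groups appended since the last reset
        self.head: Optional[int] = None  # group the next append will point at

    def has_space(self) -> bool:
        return self.size < self.max_size


def assign_predecessors(
    groups: List[int], level_sizes: List[int]
) -> Dict[int, Optional[int]]:
    """Pick a predecessor for each state group; level_sizes[0] is L1."""
    levels = [Level(size) for size in level_sizes]
    predecessors: Dict[int, Optional[int]] = {}
    for sg in groups:
        # Find the lowest level that is not full.
        target = next((i for i, lvl in enumerate(levels) if lvl.has_space()), None)
        if target is None:
            # Assumption: when every level is full, start again from a
            # snapshot that becomes the first entry of the top level.
            for lvl in levels:
                lvl.size = 0
                lvl.head = None
            target = len(levels) - 1
        level = levels[target]
        predecessors[sg] = level.head    # None means "store a snapshot"
        level.head = sg
        level.size += 1
        # Lower levels start new, empty chains hanging off this group.
        for lower in levels[:target]:
            lower.head = sg
            lower.size = 0
    return predecessors


def chain_length(sg: int, predecessors: Dict[int, Optional[int]]) -> int:
    """Number of state groups visited when resolving `sg`."""
    count = 0
    current: Optional[int] = sg
    while current is not None:
        count += 1
        current = predecessors[current]
    return count


preds = assign_predecessors(list(range(1, 14)), level_sizes=[3, 2])
print(preds)
print(max(chain_length(sg, preds) for sg in preds))  # 5
```

With levels of size 3 and 2, state groups 1, 4 and 12 become snapshots in this run, and the longest lookup visits 5 state groups, which equals the sum of the level sizes.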