| 1 | +# Room DAG concepts |
| 2 | + |
| 3 | +## Edges |
| 4 | + |
| 5 | +The word "edge" comes from graph theory lingo. An edge is just a connection |
| 6 | +between two events. In Synapse, we connect events by specifying their |
| 7 | +`prev_events`. A subsequent event points back at a previous event. |
| 8 | + |
| 9 | +``` |
| 10 | +A (oldest) <---- B <---- C (most recent) |
| 11 | +``` |
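As a rough illustration, the chain above can be modelled with a small, hypothetical `Event` class (a simplified stand-in, not Synapse's real event type) in which each event lists the IDs of the events it points back at:

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical, simplified stand-in for a room event."""

    event_id: str
    # IDs of the events this event points back at.
    prev_events: list = field(default_factory=list)


a = Event("A")
b = Event("B", prev_events=["A"])
c = Event("C", prev_events=["B"])

# Each (event, prev_event) pair is one edge in the DAG.
edges = [(e.event_id, p) for e in (a, b, c) for p in e.prev_events]
```

Here `edges` holds one pair per arrow in the diagram, each pointing from the newer event to the older one.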
| 12 | + |
| 13 | + |
| 14 | +## Depth and stream ordering |
| 15 | + |
| 16 | +Events are normally sorted by `(topological_ordering, stream_ordering)` where |
| 17 | +`topological_ordering` is just `depth`. In other words, we first sort by `depth` |
| 18 | +and then tie-break based on `stream_ordering`. `depth` is incremented as new |
| 19 | +messages are added to the DAG. Normally, `stream_ordering` is an |
| 20 | +auto-incrementing integer, but backfilled events start with `stream_ordering=-1` and decrement. |
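The sort described above can be sketched in a few lines. The event IDs and values here are made up for illustration, and events are represented as plain tuples rather than anything Synapse actually uses:

```python
# Hypothetical events as (event_id, depth, stream_ordering) tuples.
# Backfilled events carry negative stream orderings.
events = [
    ("$live2", 3, 2),
    ("$backfilled1", 1, -1),
    ("$live1", 2, 1),
    ("$backfilled2", 2, -2),
]


def topological_key(event):
    """Sort by depth first, then tie-break on stream_ordering."""
    _event_id, depth, stream_ordering = event
    return (depth, stream_ordering)


by_topology = [e[0] for e in sorted(events, key=topological_key)]
by_stream = [e[0] for e in sorted(events, key=lambda e: e[2])]
```

Note that `$backfilled2` and `$live1` share `depth=2`, so only the `stream_ordering` tie-break decides their relative order in `by_topology`.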
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | + - `/sync` returns things in the order they arrive at the server (`stream_ordering`). |
| 25 | + - `/messages` (and `/backfill` in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`. |
| 26 | + |
| 27 | +The general idea is that, if you're following a room in real-time (i.e. |
| 28 | +`/sync`), you probably want to see the messages as they arrive at your server, |
| 29 | +rather than skipping any that arrived late; whereas if you're looking at a |
| 30 | +historical section of timeline (i.e. `/messages`), you want to see the best |
| 31 | +representation of the state of the room as others were seeing it at the time. |
| 32 | + |
| 33 | + |
| 34 | +## Forward extremity |
| 35 | + |
| 36 | +The most-recent-in-time events in the DAG, i.e. those not yet referenced by any other event's `prev_events`. |
| 37 | + |
| 38 | +The forward extremities of a room are used as the `prev_events` when the next event is sent. |
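A minimal sketch of that definition, assuming the DAG is available as a plain `event_id -> prev_events` dict (a hypothetical in-memory shape, not how Synapse stores it):

```python
def forward_extremities(prev_events_by_id):
    """Return the IDs of events that no other event lists in its prev_events."""
    referenced = {p for prevs in prev_events_by_id.values() for p in prevs}
    return {event_id for event_id in prev_events_by_id if event_id not in referenced}


# A <- B, then both C and D point back at B (a fork in the DAG).
dag = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

extremities = forward_extremities(dag)

# The next event sent into the room would use these as its prev_events,
# which merges the fork back together.
next_event_prev_events = sorted(extremities)
```

In this example both `C` and `D` are forward extremities, so the next event would reference both.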
| 39 | + |
| 40 | + |
| 41 | +## Backwards extremity |
| 42 | + |
| 43 | +The current marker of where we have backfilled up to. These will generally be the |
| 44 | +oldest-in-time events we know of in the DAG. |
| 45 | + |
| 46 | +A backwards extremity is an event for which we haven't fetched all of the `prev_events`. |
| 47 | + |
| 48 | +Once we have fetched all of its `prev_events`, it's unmarked as a backwards |
| 49 | +extremity (although we may have formed new backwards extremities from the prev |
| 50 | +events during the backfilling process). |
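The following sketch follows the definition given in this section, again assuming a hypothetical in-memory `event_id -> prev_events` dict rather than Synapse's actual tables. It shows an extremity being cleared by a backfill while a new one forms:

```python
def backwards_extremities(prev_events_by_id):
    """Return the IDs of known events with at least one unfetched prev_event."""
    known = set(prev_events_by_id)
    return {
        event_id
        for event_id, prevs in prev_events_by_id.items()
        if any(p not in known for p in prevs)
    }


# We know about C and D, but have not fetched B yet.
dag = {"C": ["B"], "D": ["C"]}
before_backfill = backwards_extremities(dag)

# Backfilling fetches B, which itself points at the still-unknown A.
dag["B"] = ["A"]
after_backfill = backwards_extremities(dag)
```

Before the backfill `C` is the backwards extremity; afterwards `C` is no longer one, but `B` has become one because `A` is still missing.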
| 51 | + |
| 52 | + |
| 53 | +## Outliers |
| 54 | + |
| 55 | +We mark an event as an `outlier` when we haven't figured out the state for the |
| 56 | +room at that point in the DAG yet. |
| 57 | + |
| 58 | +We won't *necessarily* have the `prev_events` of an `outlier` in the database, |
| 59 | +but it's entirely possible that we *might*. Whether we have all of the |
| 60 | +`prev_events` is tracked separately, by marking the event as a [backwards extremity](#backwards-extremity). |
| 61 | + |
| 62 | +For example, when we fetch the event auth chain or state for a given event, we |
| 63 | +mark all of those claimed auth events as outliers because we haven't done the |
| 64 | +state calculation ourselves. |
| 65 | + |
| 66 | + |
| 67 | +## State groups |
| 68 | + |
| 69 | +For every non-outlier event we need to know the state at that event. Instead of |
| 70 | +storing the full state for each event in the DB (i.e. an `event_id -> state` |
| 71 | +mapping), which is *very* space-inefficient when state doesn't change, we |
| 72 | +instead assign each different set of state a "state group" and then have |
| 73 | +mappings of `event_id -> state_group` and `state_group -> state`. |
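The two mappings can be sketched with plain dicts. This is only an illustration of the idea: the function name, the group numbering, and the `_group_by_state` lookup helper are all assumptions made for this example, not Synapse's schema or API.

```python
state_groups = {}     # state_group -> state
event_to_group = {}   # event_id -> state_group
_group_by_state = {}  # frozen state -> state_group (hypothetical lookup helper)


def assign_state_group(event_id, state):
    """Reuse the existing group for identical state, else allocate a new one."""
    frozen = frozenset(state.items())
    group = _group_by_state.get(frozen)
    if group is None:
        group = len(state_groups) + 1
        state_groups[group] = dict(state)
        _group_by_state[frozen] = group
    event_to_group[event_id] = group
    return group


# State is keyed by (event type, state key) and maps to an event ID.
state = {
    ("m.room.create", ""): "$create",
    ("m.room.member", "@alice:example.org"): "$alice_join",
}
g1 = assign_state_group("$msg1", state)
g2 = assign_state_group("$msg2", state)  # state unchanged: same group

new_state = dict(state)
new_state[("m.room.name", "")] = "$name"
g3 = assign_state_group("$msg3", new_state)  # state changed: new group
```

Two messages sent while the state is unchanged share one stored copy of that state; only the event that follows a state change costs a new entry.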
| 74 | + |
| 75 | + |
| 76 | +### State group edges |
| 77 | + |
| 78 | +TODO: `state_group_edges` is a further optimization... |
| 79 | + notes from @Azrenbeth, https://pastebin.com/seUGVGeT |