This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Commit 2bae2c6

Add developer documentation to explain room DAG concepts like outliers and state_groups (#10464)
1 parent a6ea32a commit 2bae2c6


3 files changed, 81 additions, 0 deletions


changelog.d/10464.doc

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
Add some developer docs to explain room DAG concepts like `outliers`, `state_groups`, `depth`, etc.

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -79,6 +79,7 @@
- [Single Sign-On]()
- [SAML](development/saml.md)
- [CAS](development/cas.md)
+ - [Room DAG concepts](development/room-dag-concepts.md)
- [State Resolution]()
- [The Auth Chain Difference Algorithm](auth_chain_difference_algorithm.md)
- [Media Repository](media_repository.md)
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# Room DAG concepts

## Edges

The word "edge" comes from graph theory lingo. An edge is just a connection
between two events. In Synapse, we connect events by specifying their
`prev_events`. A subsequent event points back at a previous event.

```
A (oldest) <---- B <---- C (most recent)
```
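
To make the `prev_events` links concrete, here is a minimal sketch of the three-event chain above as plain Python objects. The `Event` class and the `edges` helper are hypothetical illustrations, not Synapse's real event classes; each event simply lists the IDs of the events it points back at.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Event:
    """Hypothetical, stripped-down event: just an ID and its DAG edges."""
    event_id: str
    prev_events: List[str] = field(default_factory=list)


def edges(events: Dict[str, Event]) -> List[Tuple[str, str]]:
    """Return every edge as (newer_event_id, older_event_id)."""
    return [
        (event.event_id, prev_id)
        for event in events.values()
        for prev_id in event.prev_events
    ]


# A (oldest) <---- B <---- C (most recent)
dag = {
    "A": Event("A"),                      # oldest: no prev_events
    "B": Event("B", prev_events=["A"]),   # B points back at A
    "C": Event("C", prev_events=["B"]),   # C points back at B
}

print(edges(dag))  # [('B', 'A'), ('C', 'B')]
```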

## Depth and stream ordering

Events are normally sorted by `(topological_ordering, stream_ordering)` where
`topological_ordering` is just `depth`. In other words, we first sort by `depth`
and then tie-break based on `stream_ordering`. `depth` is incremented as new
messages are added to the DAG. Normally, `stream_ordering` is an
auto-incrementing integer, but backfilled events start with `stream_ordering=-1`
and decrement from there.
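
The sort key described above can be sketched as follows. This is a minimal illustration, not Synapse's storage code: `Event` and `StreamOrderingAllocator` are hypothetical names, depths are hard-coded rather than computed, and the allocator only models "live events count up, backfilled events count down from -1".

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class Event:
    event_id: str
    depth: int            # used as the topological_ordering
    stream_ordering: int  # position in which our server persisted the event


class StreamOrderingAllocator:
    """Hypothetical allocator: live events count up from 1, backfilled events down from -1."""

    def __init__(self) -> None:
        self._live: Iterator[int] = itertools.count(1)             # 1, 2, 3, ...
        self._backfilled: Iterator[int] = itertools.count(-1, -1)  # -1, -2, -3, ...

    def allocate(self, backfilled: bool) -> int:
        return next(self._backfilled) if backfilled else next(self._live)


def sort_key(event: Event) -> Tuple[int, int]:
    # Sort by depth first, then tie-break on stream_ordering.
    return (event.depth, event.stream_ordering)


alloc = StreamOrderingAllocator()
# B, C and D arrive live; C and D are siblings at the same depth.
b = Event("B", depth=2, stream_ordering=alloc.allocate(backfilled=False))  # 1
c = Event("C", depth=3, stream_ordering=alloc.allocate(backfilled=False))  # 2
d = Event("D", depth=3, stream_ordering=alloc.allocate(backfilled=False))  # 3
# A is fetched later via backfill, so it gets a negative stream_ordering.
a = Event("A", depth=1, stream_ordering=alloc.allocate(backfilled=True))   # -1

print([e.event_id for e in sorted([d, a, c, b], key=sort_key)])  # ['A', 'B', 'C', 'D']
```

C and D share a depth, so `stream_ordering` decides between them; A sorts first because of its depth, even though it was persisted last.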

---

- `/sync` returns things in the order they arrive at the server (`stream_ordering`).
- `/messages` (and `/backfill` in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`.

The general idea is that, if you're following a room in real time (i.e.
`/sync`), you probably want to see the messages as they arrive at your server,
rather than skipping any that arrived late; whereas if you're looking at a
historical section of the timeline (i.e. `/messages`), you want to see the best
representation of the state of the room as others were seeing it at the time.
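
The difference between the two orderings shows up when an event arrives late. The sketch below sorts the same four events both ways; `sync_order` and `messages_order` are hypothetical helpers that model only the sort keys, not the real endpoints (which also paginate, filter, and so on).

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    event_id: str
    depth: int
    stream_ordering: int


def sync_order(events: List[Event]) -> List[str]:
    """Order in which the events reached our server (stream_ordering)."""
    return [e.event_id for e in sorted(events, key=lambda e: e.stream_ordering)]


def messages_order(events: List[Event]) -> List[str]:
    """Order implied by the event graph: (topological_ordering, stream_ordering)."""
    return [e.event_id for e in sorted(events, key=lambda e: (e.depth, e.stream_ordering))]


events = [
    Event("A", depth=1, stream_ordering=1),
    Event("B", depth=2, stream_ordering=2),
    Event("C", depth=3, stream_ordering=3),
    # "late" sits at depth 2 in the graph but only reached our server after C.
    Event("late", depth=2, stream_ordering=4),
]

print(sync_order(events))      # ['A', 'B', 'C', 'late']
print(messages_order(events))  # ['A', 'B', 'late', 'C']
```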

## Forward extremity

The most recent events in the DAG: those which are not yet referenced by any
other event's `prev_events`.

The forward extremities of a room are used as the `prev_events` when the next event is sent.
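
A minimal sketch of the definition, assuming the room's DAG is held in memory as a hypothetical `event_id -> prev_events` dict. It recomputes the extremities from scratch purely for illustration; a real server would track them incrementally rather than rescanning the whole room.

```python
from typing import Dict, List, Set


def forward_extremities(prev_events: Dict[str, List[str]]) -> Set[str]:
    """Events in the DAG that no other event lists in its prev_events."""
    referenced = {p for prevs in prev_events.values() for p in prevs}
    return set(prev_events) - referenced


# B and C were sent concurrently, both pointing at A, so there are two forward extremities.
dag = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
}
extremities = forward_extremities(dag)
print(sorted(extremities))  # ['B', 'C']

# The next event uses the current forward extremities as its prev_events,
# which merges the fork back into a single extremity.
dag["D"] = sorted(extremities)
print(sorted(forward_extremities(dag)))  # ['D']
```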

## Backwards extremity

The current marker of where we have backfilled up to. Backwards extremities will
generally be the oldest-in-time events we know of in the DAG.

A backwards extremity is an event for which we haven't yet fetched all of the
`prev_events`.

Once we have fetched all of its `prev_events`, it is no longer marked as a
backwards extremity (although we may have formed new backwards extremities from
the prev events during the backfilling process).
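
The rule above can be sketched with a hypothetical in-memory `event_id -> prev_events` dict standing in for the events we hold: an event is a backwards extremity while any of its `prev_events` is missing. The example walks through one backfill step and shows the extremity moving further back in time.

```python
from typing import Dict, List, Set


def backwards_extremities(known: Dict[str, List[str]]) -> Set[str]:
    """Events we hold whose prev_events we have not all fetched yet."""
    return {
        event_id
        for event_id, prevs in known.items()
        if any(p not in known for p in prevs)
    }


# We joined the room at J: we hold J and K, but not J's parent I.
known = {"J": ["I"], "K": ["J"]}
print(backwards_extremities(known))  # {'J'}

# Backfilling fetches I, whose own parent H we still lack: J is no longer
# a backwards extremity, but I has become one.
known["I"] = ["H"]
print(backwards_extremities(known))  # {'I'}
```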

## Outliers

We mark an event as an `outlier` when we haven't figured out the state for the
room at that point in the DAG yet.

We won't *necessarily* have the `prev_events` of an `outlier` in the database,
but it's entirely possible that we *might*. Whether we have all of the
`prev_events` is tracked separately: an event whose `prev_events` we have not
all fetched is marked as a [backwards extremity](#backwards-extremity).

For example, when we fetch the event auth chain or state for a given event, we
mark all of those claimed auth events as outliers because we haven't done the
state calculation ourselves.
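
A toy illustration of this bookkeeping, using a hypothetical in-memory `EventStore` rather than Synapse's real storage layer: events received only as another server's claimed auth chain are flagged as outliers and get no state, while an event whose state we have calculated is stored as a non-outlier with a state group.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoredEvent:
    event_id: str
    outlier: bool
    # Hypothetical: only non-outliers get a state group, since we only know
    # the room state at events we have done the state calculation for.
    state_group: Optional[int] = None


class EventStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self) -> None:
        self.events: Dict[str, StoredEvent] = {}

    def persist_claimed_auth_events(self, event_ids: List[str]) -> None:
        # We haven't calculated the state at these ourselves, so they are outliers.
        # setdefault: don't overwrite an event we already hold.
        for event_id in event_ids:
            self.events.setdefault(event_id, StoredEvent(event_id, outlier=True))

    def persist_with_state(self, event_id: str, state_group: int) -> None:
        # An event whose state we have calculated is stored as a non-outlier.
        self.events[event_id] = StoredEvent(event_id, outlier=False, state_group=state_group)

    def outliers(self) -> List[str]:
        return sorted(e.event_id for e in self.events.values() if e.outlier)


store = EventStore()
store.persist_claimed_auth_events(["$create", "$power_levels"])
store.persist_with_state("$message", state_group=1)
print(store.outliers())  # ['$create', '$power_levels']
```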

## State groups

For every non-outlier event we need to know the state at that event. Instead of
storing the full state for each event in the DB (i.e. an `event_id -> state`
mapping), which is *very* space inefficient when state doesn't change, we
instead assign each different set of state a "state group" and then have
mappings of `event_id -> state_group` and `state_group -> state`.
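
The two mappings can be sketched with a hypothetical in-memory `StateGroupStore`. For simplicity it reuses a group whenever the exact same state map is seen again (keyed on a `frozenset` of the map's items); that dedup rule is our simplification, and how Synapse actually decides to reuse a group is not covered here.

```python
from typing import Dict, FrozenSet, Tuple

StateKey = Tuple[str, str]      # (event type, state_key)
StateMap = Dict[StateKey, str]  # -> event_id of the current state event


class StateGroupStore:
    """Hypothetical in-memory version of the two mappings described above."""

    def __init__(self) -> None:
        self.event_to_group: Dict[str, int] = {}       # event_id -> state_group
        self.group_to_state: Dict[int, StateMap] = {}  # state_group -> state
        self._group_for_state: Dict[FrozenSet[Tuple[StateKey, str]], int] = {}
        self._next_group = 1

    def store_state_for_event(self, event_id: str, state: StateMap) -> int:
        key = frozenset(state.items())
        group = self._group_for_state.get(key)
        if group is None:
            # A set of state we haven't seen before gets a fresh state group.
            group = self._next_group
            self._next_group += 1
            self._group_for_state[key] = group
            self.group_to_state[group] = dict(state)
        self.event_to_group[event_id] = group
        return group

    def state_at(self, event_id: str) -> StateMap:
        return self.group_to_state[self.event_to_group[event_id]]


store = StateGroupStore()
state = {("m.room.create", ""): "$create", ("m.room.member", "@alice:example.org"): "$join"}
g1 = store.store_state_for_event("$msg1", state)
g2 = store.store_state_for_event("$msg2", state)  # state unchanged: same group
g3 = store.store_state_for_event("$msg3", {**state, ("m.room.name", ""): "$name"})
print(g1, g2, g3)                 # 1 1 2
print(len(store.group_to_state))  # 2
```

Three events share only two stored copies of state, which is the saving the `event_id -> state_group` indirection buys.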

### State group edges

TODO: `state_group_edges` is a further optimization...
notes from @Azrenbeth, https://pastebin.com/seUGVGeT
