Add developer documentation to explain room DAG concepts like outliers and state_groups
#10464
Changes from 4 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Add developer FAQ to explain `outliers` and `state_groups`. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # Developer FAQ |
| |
| ## What is an `outlier`? |
| |
| An `outlier` is an arbitrary floating event in the DAG (as opposed to being |
| inline with the current DAG). It also means that we don't have the state events |
| backfilled on the homeserver and we trust the event's *claimed* auth events rather |
| than those we calculate and verify to be correct. |
| |
| An event can be unmarked as an `outlier` once we fetch all of its `prev_events` (you will see some `ex_outlier` code around this). |
| |
| Normally, our calculated auth_events based on the state of the room | |
| at the event's position in the DAG, though occasionally (eg if the | |
| event is an outlier), may be the auth events claimed by the remote | |
| server. |
Thanks for all the info @richvdh 😀
These seem so closely coupled and aren't mutually exclusive in real life situations so it's hard for me to make a clear description 🤯
I've updated with something more inline with your description but feel free to rewrite.
I've mainly been working with floating outliers so I don't think I had the complete picture.
I feel like I don't understand where an outlier becomes an ex_outlier.
Say we fetch a missing auth event for an event, we mark that auth event as an outlier. Then at what point later, do we get the event again as a non-outlier? From your statement, it's not when we have fetched all of the prev_events as those could already be all persisted in the db.
There is _update_outliers_txn which handles ex_outlier stuff but it's confusing where/when we will come across the same outlier event again while persisting.
Based on the DAG below, as a remote federating server, my assumption is that we process event D which fetches the state_event as an outlier. Then as we backfill more, and process state_event directly, it becomes an ex_outlier.
Mermaid live editor playground link
I'm still a bit wary on a situation where we could have all of the prev_events for the state_event but still mark it as an outlier. Perhaps with gaps in the DAG and the event hits it just right 🤷
> I've mainly been working with floating outliers so I don't think I had the complete picture.

I think I might have misled you here. From Synapse's point of view, all outliers are considered to be floating, and in the majority of cases we indeed won't have their prev events. However, it's possible for the stars to align in a way that we do actually have their prev events in the database - at least as outliers themselves.
To take a simple example:
Two of the auth events for a join event in an invite-only room are the join_rules event and an invite event for the joiner. So let's suppose we fetch both of those events as outliers. Let's also suppose that the invite was issued immediately after changing the join rules - so the join_rules is the only prev-event of the invite event.

    join_rules <- invite
         ^           ^
         ............................. join

In that case, it so happens that we have all the prev-events of the invite event - but we're still considering it an outlier.
So I don't want you to imagine there is a distinction between "floating outliers" and "regular outliers" - they are all just outliers in terms of how we handle them in Synapse.
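
The invite example can be sketched in a few lines of Python. This is a toy model, not Synapse code: the `Event` dataclass, the `store` dict, the `$earlier` event ID and `has_all_prev_events` are hypothetical names made up for illustration. It only shows that having every prev event in the database says nothing about the outlier flag.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy stand-in for a stored event; not Synapse's real event class."""

    event_id: str
    prev_event_ids: list = field(default_factory=list)
    outlier: bool = False


# Hypothetical in-memory "database", keyed by event ID.
store = {}


def persist(event):
    store[event.event_id] = event


def has_all_prev_events(event):
    """True if every prev event is in the store (outlier or not)."""
    return all(pid in store for pid in event.prev_event_ids)


# Both auth events of the join are fetched as outliers.
# "$earlier" is an assumed older event that we never fetched.
persist(Event("$join_rules", prev_event_ids=["$earlier"], outlier=True))
persist(Event("$invite", prev_event_ids=["$join_rules"], outlier=True))

invite = store["$invite"]
# The invite has all of its prev events, yet it is still flagged as an outlier.
print(has_all_prev_events(invite), invite.outlier)
```

In this sketch the outlier flag is set purely by how the event was fetched, which is the point of the example: checking prev events is not a way to decide whether something is an outlier.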
> Say we fetch a missing auth event for an event, we mark that auth event as an outlier. Then at what point later, do we get the event again as a non-outlier? From your statement, it's not when we have fetched all of the prev_events as those could already be all persisted in the db.

right. It becomes a non-outlier in the situation where we process it as a regular event as part of the timeline - normally via backfill.
> Based on the DAG below, as a remote federating server, my assumption is that we process event D which fetches the state_event as an outlier. Then as we backfill more, and process state_event directly, it becomes an ex_outlier.

exactly so, yes.
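
As a rough sketch of that transition: an event first stored as an outlier loses the flag only when the same event is later persisted as a regular timeline event, for example during backfill. Everything here (`events`, `ex_outliers`, `persist_event`) is a hypothetical toy model, not Synapse's schema or API; the real handling around `_update_outliers_txn` is considerably more involved.

```python
# Toy in-memory store: event_id -> {"outlier": bool}. Hypothetical, not Synapse's schema.
events = {}
# IDs of events that were once outliers and have since been de-outliered.
ex_outliers = set()


def persist_event(event_id, outlier):
    """Store an event; de-outlier it if we now see it as a regular event."""
    existing = events.get(event_id)
    if existing is None:
        events[event_id] = {"outlier": outlier}
    elif existing["outlier"] and not outlier:
        # Seen before as an outlier, now processed as part of the timeline.
        existing["outlier"] = False
        ex_outliers.add(event_id)
    # Otherwise the event is already stored and nothing changes. In this
    # sketch a regular event never turns back into an outlier.


# Processing D pulls in state_event as a missing auth event.
persist_event("$state_event", outlier=True)
persist_event("$D", outlier=False)

# Later, backfill reaches state_event and processes it as a regular event.
persist_event("$state_event", outlier=False)
```

After the last call, `$state_event` is no longer an outlier and appears in `ex_outliers`, while `$D` was never an outlier to begin with.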
> I'm still a bit wary on a situation where we could have all of the `prev_events` for the `state_event` but still mark it as an `outlier`. Perhaps with gaps in the DAG and the event hits it just right 🤷

yup. Imagine there is another fork in the DAG pointing to A:
When we receive Y, we'll backfill A. So then we have state_event's prev_events - but state_event is still an outlier at this point.
Thanks for the clarification here. I've added a paragraph explaining that there is no distinction.
I like these examples and edge cases. I'm tempted to add them into the docs but I think I'll save that for another iteration.
