-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Split out mutable event content from event cache into new caches that are keyed by room IDย #13916
Description
In the hopes of fixing #11521 and paving the way towards an immutable external event cache (#2123), a new architecture for the event cache is proposed.
The current state
Currently there are a couple separate data structures related to caching event contents in memory in Synapse (see code for fetching an event from cache/db here):
EventsWorkerStore._get_event_cache- An instance of AsyncLruCache implemented as a map ofEventID-> (EventBase, redactedEventBase| None).- This cache is populated after fetching an event from the database.
- Entries in this cache are invalidated when an event is deleted from the database (in most cases, c.f. completely broken room after purge_room/rejoinย #11521), redacted or marked as rejected.
- Entries in this cache are invalidated when the size limit of the LruCache are reached.
EventsWorkerStore._event_ref- A WeakValueDictionary which serves as a single point of reference for EventBase's in memory, ensuring that we don't end up with multiple, unnecessary copies of a single EventBase in memory.- This data structure is populated after fetched an event from the database.
- Because this is a WeakValueDictionary, entries in this cache are invalidated when all other references to the EventBase in an entry are gone.
- Entries in this cache are invalidated when an event is deleted from the database (in most cases, c.f. completely broken room after purge_room/rejoinย #11521), redacted or marked as rejected.
- Entries in this cache are not invalidated when an entry is evicted from
EventsWorkerStore._get_event_cache, as something else may still be processing the event, even if it's been removed from that cache.
What's the problem?
See #11521; because each of these caches are keyed by EventID alone, it becomes tricky to invalidate them when all you have is a RoomID (i.e. when purging a room completely). We could query all known events for a room from the database, but that may result in millions of events. Ideally we'd have some map of RoomID -> EventID which only covers the events that are actually currently held in memory. We could then use that to invalidate all three of these caches.
Additionally, as get_event_cache contains mutable EventCacheEntrys (comprised of EventBase, redacted EventBase | None), invalidating them is necessary when an event is both redacted or marked as rejected. These can differ per-homeserver, so removing this component from the cache entries opens up avenues for multiple homeservers sharing the same, immutable event cache.
Proposal
After speaking with @erikjohnston we've (mostly Erik :) came up with the following idea:
EventsWorkerStore._get_event_cachewould simply become a map ofEventID->EventBase.- We add a separate cache which is a nested map of
RoomID->EventID->{rejected_status: bool, redacted_event_content: Optional[EventBase]}.- Entries are added to this map when an event is pulled from the database. We know the RoomID at this point.
- Entries are not invalidated from this map when an entry is
EventsWorkerStore._get_event_cacheis invalidated due to hitting the cache size. - This does mean that we'll need to know the RoomID when querying for rejected/redacted status though... But we can get that from the event cache?
The beauty of this is that we no longer need to invalidate the _get_event_cache at all (unless the size limit is hit)! Even in the room purge use case! How? Here are some examples of using this system:
Fetch EventID A which is not in-memory
- Some calling function asks for EventID A.
- This does not exist in
_get_event_cache(nor other caches) so we query from the database. The event and related metadata is fetched from the DB (event_json,redactions,rejections) and both the_get_event_cacheand event metadata cache are populated. - Return information from the database.
Fetch EventID A which is in-memory
- Some calling function asks for EventID A.
- This already exists in
_get_event_cache, and presumably the metadata cache. We take the RoomID from the EventBase in the_get_event_cacheand query the event metadata cache. - Return information from both caches.
EventID A is not in-memory but the event has been purged
- Some calling function asks for EventID A.
- This already exists in
_get_event_cache, and presumably the metadata cache. We take the RoomID from the EventBase in the cache and query the event metadata cache - but uh oh, there's no matching entry in the metadata cache! The event must have been purged. - We invalidate the entry in the event cache as well and return None.
Thus when purging a room, we only need to purge entries in the metadata cache (which we can easily do by RoomID due to the metadata cache's structure). Entries in the get_event_cache and event_ref will be invalidated as they are fetched.
I'm curious for thoughts on whether this sounds reasonable from other members of the Synapse team + cc @Fizzadar.