Fix unbounded memory growth in queue event tracking #13156

moktamd wants to merge 1 commit into gradio-app:main from
event_ids_to_events, pending_event_ids_session, and event_analytics grew without bound because completed events were never removed.

- Convert all three dicts to LRUCache with bounded size
- Explicitly remove event entries when processing completes
- Clean up empty session keys from pending_event_ids_session
- Remove event mappings in clean_events when events are cancelled
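The commit pairs a size bound with explicit cleanup. Here is a minimal sketch of the bounding half, assuming an OrderedDict-based LRU. The class BoundedLRU and its max_size parameter are hypothetical stand-ins and not Gradio's own LRUCache, whose implementation may differ. Once the mapping grows past its limit, it evicts the least recently used key:

```python
from collections import OrderedDict


class BoundedLRU(OrderedDict):
    """Hypothetical LRU-bounded dict (illustration only, not Gradio's LRUCache).

    Reading or writing a key marks it as most recently used; when the size
    exceeds max_size, the least recently used key is deleted.
    """

    def __init__(self, max_size: int = 100):
        # Set the limit before OrderedDict.__init__ in case it calls __setitem__.
        self.max_size = max_size
        super().__init__()

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)  # a read counts as a "use"
        return value

    def __setitem__(self, key, value):
        if key in self:
            self.move_to_end(key)
        super().__setitem__(key, value)
        if len(self) > self.max_size:
            oldest = next(iter(self))  # front of the order = least recently used
            del self[oldest]
```

Eviction only triggers when an insert pushes the size past max_size, so the bound acts as a backstop. The explicit removals listed above are what keep the mappings small in normal operation.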
mahdirajaee left a comment:
This is a well-targeted fix for unbounded memory growth. Converting pending_event_ids_session, event_ids_to_events, and event_analytics from plain dicts to LRUCache with bounded sizes (2000, 2000, and 10000 respectively) is a clean way to cap memory usage on long-running servers. The cleanup logic in clean_events and process_events is also good. Actively removing entries from event_ids_to_events and pending_event_ids_session when events complete or are cancelled keeps the caches lean on their own, so they do not depend on LRU eviction to drop stale data.

One edge case to consider: with LRUCache(2000) for pending_event_ids_session, a server handling more than 2000 concurrent sessions will evict the least recently used session's pending event IDs. Those events could then be orphaned and never cleaned up from the event queue. For high-traffic deployments, this limit might need to be configurable.

Also, the set subtraction self.pending_event_ids_session[session_hash] -= removed_ids in clean_events looks correct. Calling discard in a loop would be an alternative if some removed IDs might not be in the set. However, set subtraction already ignores missing elements without raising KeyError, so this should be fine.
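The reviewer's closing point about set subtraction is easy to confirm. Here is a minimal sketch using illustrative session and event IDs; pending and removed_ids are hypothetical names, not code from the PR. In-place difference and set.discard silently skip missing elements, while set.remove raises KeyError:

```python
# Illustrative per-session pending-ID mapping; names and IDs are hypothetical.
pending = {"session-1": {"evt-a", "evt-b"}}
removed_ids = {"evt-b", "evt-missing"}  # "evt-missing" was never pending

# In-place set difference ignores IDs that are not present: no KeyError.
pending["session-1"] -= removed_ids
print(pending["session-1"])  # {'evt-a'}

# set.discard is also a no-op for a missing element.
pending["session-1"].discard("evt-missing")

# Only set.remove raises KeyError when the element is absent.
try:
    pending["session-1"].remove("evt-missing")
    remove_raised = False
except KeyError:
    remove_raised = True
print(remove_raised)  # True
```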
Thanks for the review. Good point about the 2000 session limit; it matches the existing
Will take a look at this. I think setting a limit on performance metrics would be a breaking change. Some of these changes are also not needed, because entries are already popped from those data structures in the server.
Fixes #13154
event_ids_to_events, pending_event_ids_session, and event_analytics in Queue grow without bound because completed events are never removed from these dicts. Under sustained load, memory grows continuously until the process is killed.

Changes:
- Convert all three dicts to LRUCache (already used for pending_messages_per_session) so they are bounded even if explicit cleanup is missed
- Remove entries from event_ids_to_events and pending_event_ids_session when event processing completes, in the finally block of process_events
- Clean up event_ids_to_events and empty session keys in clean_events when events are cancelled from the queue
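The two explicit cleanup paths can be sketched with a toy, synchronous queue. This is an illustration under stated assumptions, not Gradio's Queue:

- ToyQueue, add, _forget, process_event, and the fail flag are all hypothetical.
- Here event_ids_to_events maps an event ID to its session hash rather than to an event object.
- The real process_events is asynchronous and does far more.

Cleanup on completion sits in a finally block so it runs even when processing raises. Both paths delete a session key once its pending set is empty:

```python
class ToyQueue:
    """Hypothetical, synchronous stand-in for the queue's bookkeeping.

    Only the two mappings named in the PR are modelled; async processing,
    analytics, and messaging are omitted.
    """

    def __init__(self):
        # Simplification: event_id -> session_hash (the real map holds event objects).
        self.event_ids_to_events: dict[str, str] = {}
        self.pending_event_ids_session: dict[str, set[str]] = {}
        self.queue: list[str] = []  # event IDs still waiting to run

    def add(self, session_hash: str, event_id: str) -> None:
        self.event_ids_to_events[event_id] = session_hash
        self.pending_event_ids_session.setdefault(session_hash, set()).add(event_id)
        self.queue.append(event_id)

    def _forget(self, event_id: str) -> None:
        # Drop both mappings for one event, then delete the session key if empty.
        session_hash = self.event_ids_to_events.pop(event_id, None)
        if session_hash is None:
            return
        ids = self.pending_event_ids_session.get(session_hash)
        if ids is not None:
            ids.discard(event_id)
            if not ids:
                del self.pending_event_ids_session[session_hash]

    def process_event(self, event_id: str, fail: bool = False) -> None:
        # Hypothetical analogue of process_events for a single event.
        self.queue.remove(event_id)
        try:
            if fail:
                raise RuntimeError("simulated failure")
        finally:
            # Runs on success and on error alike, so finished events never linger.
            self._forget(event_id)

    def clean_events(self, session_hash: str) -> None:
        # Hypothetical analogue of clean_events: cancel a session's queued events.
        cancelled = [
            e for e in self.queue if self.event_ids_to_events.get(e) == session_hash
        ]
        for event_id in cancelled:
            self.queue.remove(event_id)
            self._forget(event_id)
```

Routing both paths through one helper keeps the two mappings consistent with each other. In the PR the mappings are also LRU-bounded, so an entry that escapes explicit removal is eventually evicted rather than kept forever.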