Description
When very large response data structures (e.g. vectors) are returned, their memory is not immediately freed back to the OS. Retaining it is useful because the memory may soon be needed again, but nothing accounts for it when the RSS metric is compared against the db size.
To make clear what portion of memory is held by response elements and not yet returned to the OS, we need to take time into account: although the vector may have been destroyed, its pages are still in use.
A metric should be introduced that adds up the memory used by response data structures (or other similar places) and expires that accounted memory on a fixed cadence. One possible approach is a count-min sketch with sliding-window expiry, e.g. https://arxiv.org/pdf/2406.07953v1
When a result is returned, its size is rounded to a fixed bucket size, and the frequency of that bucket is incremented in the sketch.
When the metric is evaluated, it iterates over the fixed bucket sizes, looks up the frequency of each, and adds up bucket size times frequency.
Older allocations are discarded using a sliding window.
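
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the scheme from the linked paper: it approximates the sliding window with one small count-min sketch per sub-window and drops whole sub-windows as they age out. Every name here (`bucket_for`, `CountMinSketch`, `RetainedResponseMemory`), the power-of-two bucketing, and the default window and sketch sizes are hypothetical choices made for the example.

```python
import hashlib
from collections import deque


def bucket_for(size: int) -> int:
    """Round a response size up to the next power of two (assumed bucketing)."""
    return 1 if size <= 1 else 1 << (size - 1).bit_length()


class CountMinSketch:
    """Plain count-min sketch: depth rows of width counters each."""

    def __init__(self, width: int = 64, depth: int = 4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: int) -> int:
        # Derive one hash per row by mixing the row number into the input.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: int, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key: int) -> int:
        # Collisions only inflate counters, so the minimum never undercounts.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


class RetainedResponseMemory:
    """Hypothetical metric: bytes of recent responses, expired by sub-window."""

    def __init__(self, window_seconds: float = 60.0, slots: int = 6,
                 max_bucket_bits: int = 40):
        self.slot_seconds = window_seconds / slots
        self.slots = slots
        self.buckets = [1 << b for b in range(max_bucket_bits + 1)]
        self.window = deque()  # (slot_id, CountMinSketch), oldest first

    def _slot_id(self, now: float) -> int:
        return int(now // self.slot_seconds)

    def _expire(self, now: float) -> None:
        # Drop every sub-window that has slid out of the window.
        oldest_live = self._slot_id(now) - self.slots + 1
        while self.window and self.window[0][0] < oldest_live:
            self.window.popleft()

    def record(self, size: int, now: float) -> None:
        """Called when a result is returned: bump its bucket's frequency."""
        self._expire(now)
        slot = self._slot_id(now)
        if not self.window or self.window[-1][0] != slot:
            self.window.append((slot, CountMinSketch()))
        # Sizes above the largest bucket are clamped into it.
        bucket = min(bucket_for(size), self.buckets[-1])
        self.window[-1][1].add(bucket)

    def estimate_bytes(self, now: float) -> int:
        """Sum bucket_size * estimated frequency over all live sub-windows."""
        self._expire(now)
        return sum(b * sketch.estimate(b)
                   for _, sketch in self.window
                   for b in self.buckets)
```

In use, `record(size, now)` would be called when a response is returned and `estimate_bytes(now)` when the metric is scraped. Because sizes are rounded up and count-min collisions only add to counters, the result is an upper-bound style estimate of recently used response memory, and it says nothing about whether the allocator has actually returned the pages.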