Periodically check the available memory when fetching search hits source #121920
Conversation
This adds byte accounting for the source loading of search hits to the request circuit breaker.
...enrich/src/main/java/org/elasticsearch/xpack/enrich/action/EnrichShardMultiSearchAction.java (outdated, resolved)
Restarting CI due to #121927
@elasticmachine run elasticsearch-ci/part-2
Current failure is #122153
@elasticmachine update branch
Thanks for the iterations here @original-brownbear. Added unit tests and updated the PR description. I think this is now ready if you'd like to have another look.
if (context.isCancelled()) {
    throw new TaskCancelledException("cancelled");
}
++processedDocs;
We could potentially not check the breaker when finishing collecting from a segment, and rely on other parallel fetch operations running in the system to check the real memory breaker. However, not checking at the end of a segment leaves us exposed to scenarios where we fetch the source from 600 segments and collect 900KiB for every segment (in which case no segment will ever check the breaker).
If we decide to carry on as it is now and check at the end of a segment, this has the potentially unwanted (or perhaps it is wanted?) side effect of checking once per segment even if the source is never requested.
edit: Thinking about it some more, we can have a better compromise where we check at the end of every segment only if we requested the source. I think this minimizes the impact of this feature to only what we're looking for (targeting the loading of the source),
e.g.
(requiresSource && processedDocs == docsInLeaf)
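For illustration, here is a minimal sketch of how that condition could sit in the per-document fetch loop. It is not the PR's actual code: loadSourceBytes, checkRealMemoryBreaker, and the local byte counter are hypothetical names used only to show the shape of the check.

import org.elasticsearch.search.internal.SearchContext;
import org.elasticsearch.tasks.TaskCancelledException;

// Hypothetical sketch, not the PR's actual code: shows where the
// (requiresSource && processedDocs == docsInLeaf) check could sit.
class FetchLoopSketch {
    void fetchFromLeaf(SearchContext context, int docsInLeaf, boolean requiresSource, long bufferSize) {
        long localAccountedBytes = 0;
        int processedDocs = 0;
        for (int i = 0; i < docsInLeaf; i++) {
            if (context.isCancelled()) {
                throw new TaskCancelledException("cancelled");
            }
            ++processedDocs;
            localAccountedBytes += loadSourceBytes(i); // bytes of _source loaded for this hit, if any
            // Check the real-memory breaker when the local buffer fills up, or at the end of the
            // segment -- but only if the source was requested, so source-less fetches skip the
            // per-segment check.
            if (localAccountedBytes >= bufferSize || (requiresSource && processedDocs == docsInLeaf)) {
                checkRealMemoryBreaker(localAccountedBytes);
                localAccountedBytes = 0;
            }
        }
    }

    // Stand-ins so the sketch is self-contained; the real logic lives elsewhere.
    private long loadSourceBytes(int doc) { return 0; }
    private void checkRealMemoryBreaker(long bytes) { /* ask the parent breaker */ }
}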
Made this change in abacdb1, but let me know what you think @original-brownbear.
Hmm, as I pointed out above, any check at a higher rate than the TLAB region size is probably not a good idea.
But now that you're asking :) Why do we even bother with the per-leaf cost and reset the counts on a per-leaf basis? If we just don't reset on a new segment, this problem goes away and the behavior is generally less affected by index geometry as well, which is always nice.
As discussed, moved to per-fetch-phase accounting, since inter-segment concurrency is not available in the fetch phase.
Added a minimum size of 1MiB for the buffer and no upper limit (configuring a very large value effectively lets you disable the feature completely).
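As a rough sketch of what per-fetch-phase accounting can look like: one accumulator is created for the whole fetch phase and shared across all segments (and inner hits), so the running total is never reset at leaf boundaries. The class below is illustrative only, not the PR's actual code, and the release-after-check step is an assumption of this sketch.

import org.elasticsearch.common.breaker.CircuitBreaker;

// Illustrative only -- not the class added by this PR.
final class FetchMemoryAccumulator {
    private final CircuitBreaker breaker; // the request breaker, parented by the real-memory breaker
    private final long bufferSize;        // search.memory_accounting_buffer_size
    private long unaccountedBytes = 0;

    FetchMemoryAccumulator(CircuitBreaker breaker, long bufferSize) {
        this.breaker = breaker;
        this.bufferSize = bufferSize;
    }

    void add(long bytes) {
        unaccountedBytes += bytes;
        if (unaccountedBytes >= bufferSize) {
            // This is what gives the real-memory breaker a chance to interrupt the fetch.
            breaker.addEstimateBytesAndMaybeBreak(unaccountedBytes, "fetch_source");
            // Assumption for this sketch: nothing stays reserved once the check has passed.
            breaker.addWithoutBreaking(-unaccountedBytes);
            unaccountedBytes = 0;
        }
    }
}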
private final RefCounted refCounted;

// used only in tests
This is wrong :)
}

public static SearchHit unpooled(int nestedTopDocId, String id, NestedIdentity nestedIdentity) {
    // always referenced search hits do NOT call #deallocate
This might seem like noise, but it helped me when making sense of the pool/unpool stuff, so I thought it might be useful to future spelunkers.
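In the same spirit, a tiny standalone illustration of the pooled vs. always-referenced distinction (plain Java, not the SearchHit code itself): a pooled instance releases its resources when the last reference is dropped, while an always-referenced (unpooled) one treats decRef as a no-op, so its deallocate hook is never called.

// Standalone illustration of pooled vs. always-referenced lifecycles -- not SearchHit itself.
final class PooledThing {
    private final boolean pooled;
    private int refCount = 1;

    private PooledThing(boolean pooled) {
        this.pooled = pooled;
    }

    static PooledThing pooled() {
        return new PooledThing(true);
    }

    static PooledThing unpooled() {
        // always referenced: the count never reaches zero, so deallocate() is never called
        return new PooledThing(false);
    }

    void decRef() {
        if (pooled && --refCount == 0) {
            deallocate();
        }
    }

    private void deallocate() {
        // return pooled resources (e.g. buffers) here
    }
}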
Looks just fine, thanks Andrei! Maybe let's have a short chat on the setting bounds and the per-segment logic, but other than that I think we're good to go here :)
 * This buffer is used to locally track the memory accumulated during the execution of
 * a search request before submitting the accumulated value to the circuit breaker.
 */
public static final Setting<ByteSizeValue> MEMORY_ACCOUNTING_BUFFER_SIZE = Setting.byteSizeSetting(
I think we should actually give this a lower bound of 1M, maybe an upper bound of 32M, and default it to the G1GC region size. If you think about it, checking at a finer granularity than the TLAB size really makes no sense whatsoever. Likewise, going way above that granularity makes little sense as well.
We can't really default the setting to the TLAB size cleanly if we make it a cluster setting, so I think 1M might be an ok default since that's often our region size.
But since OpenJDK doesn't allow a lower value for the region size, we probably shouldn't either.
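For reference, a hedged sketch of what a bounded registration could look like, using the Setting.byteSizeSetting overload that takes minimum and maximum values. The 32MiB cap only reflects the suggestion above; as noted elsewhere in the thread, the change ended up keeping the 1MiB minimum with no upper bound.

import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.ByteSizeValue;

// Sketch only: bounds follow the suggestion in this thread, not necessarily the merged code.
public static final Setting<ByteSizeValue> MEMORY_ACCOUNTING_BUFFER_SIZE = Setting.byteSizeSetting(
    "search.memory_accounting_buffer_size",
    ByteSizeValue.ofMb(1),   // default: roughly a G1GC region on most of our heaps
    ByteSizeValue.ofMb(1),   // lower bound: checking finer than the TLAB/region size buys nothing
    ByteSizeValue.ofMb(32),  // upper bound: suggested cap, dropped in the final version
    Setting.Property.Dynamic,
    Setting.Property.NodeScope
);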
LGTM :)
server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java (outdated, resolved)
Left a couple of nits; they can also be resolved as a follow-up, not urgent.
server/src/main/java/org/elasticsearch/search/SearchService.java (outdated, resolved)
server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java (outdated, resolved)
/**
 * Return the amount of memory to buffer locally before accounting for it in the breaker.
 */
public abstract long memAccountingBufferSize();
Is it necessary to add these two new methods to SearchContext? I am not quite following where they are used; are they needed mostly for testing purposes, or is there more to it?
I guess what I am wondering is whether we could perhaps make them arguments of the following method instead, or something along those lines, just to decrease the blast radius of this change.
We had a version where the parameters were exposed as part of SearchService#executeFetchPhase, but we then needed to add them to the AggregationContext (for top hits) and somehow trickle them down to InnerHitsPhase.
This seemed like the cleanest way to do it. Is there a better way?
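To make the trade-off concrete, here is a rough, simplified sketch of the SearchContext-based approach (illustrative, not the PR's actual classes): because delegating contexts forward the accessor, the value reaches top hits and inner hits without threading extra parameters through AggregationContext or InnerHitsPhase.

// Illustrative sketch of why the accessor lives on SearchContext: wrappers forward it for free.
abstract class SearchContext {
    /** Amount of memory to buffer locally before accounting for it in the breaker. */
    public abstract long memAccountingBufferSize();
}

// Delegating contexts (e.g. the ones wrapping inner hits / top hits) just forward the call,
// so no extra parameters need to trickle through the aggregation and inner-hits code paths.
class FilteredSearchContext extends SearchContext {
    private final SearchContext in;

    FilteredSearchContext(SearchContext in) {
        this.in = in;
    }

    @Override
    public long memAccountingBufferSize() {
        return in.memAccountingBufferSize();
    }
}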
When fetching documents, sometimes we need to load the
entire source of search hits. Document sources can be large,
and with support for up to 10k hits per search request, this creates
a significant untracked memory load on Elasticsearch that can
potentially cause out-of-memory errors.
This PR adds memory checking for hits source in the fetch phase.
We check with the parent (the real memory) circuit breaker every
1MiB of loaded source and when fetching the last document of every
segment. This gives the real memory breaker a chance to interrupt
running operations when we're running low on memory, and prevent
potential OOMs.
The amount of memory to buffer locally before checking the breaker is controlled by the search.memory_accounting_buffer_size dynamic setting and defaults to 1MiB.

Fixes #89656