Optimize memory usage in ShardBulkInferenceActionFilter #124313
Conversation
This refactor improves memory efficiency by processing inference requests in batches, capped by a max input length. Changes include:

- A new dynamic operator setting to control the maximum batch size in bytes.
- Dropping input data from inference responses when the legacy semantic text format isn't used, saving memory.
- Clearing inference results dynamically after each bulk item to free up memory sooner.

This is a step toward enabling circuit breakers to better handle memory usage when dealing with large inputs.
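For illustration only, here is a minimal sketch of the byte-capped batching idea described above. This is not the actual filter code; the class and method names (`InferenceInputBatcher`, `batchByByteSize`, `maxBatchSizeInBytes`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the actual filter code): group inference inputs into
 * consecutive batches whose cumulative UTF-8 size stays under a byte cap.
 */
final class InferenceInputBatcher {

    // Assumed to come from a dynamic operator setting expressed in bytes.
    private final long maxBatchSizeInBytes;

    InferenceInputBatcher(long maxBatchSizeInBytes) {
        this.maxBatchSizeInBytes = maxBatchSizeInBytes;
    }

    /**
     * Splits inputs into consecutive batches capped by maxBatchSizeInBytes.
     * A single input larger than the cap still forms its own one-element batch.
     */
    List<List<String>> batchByByteSize(List<String> inputs) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String input : inputs) {
            long inputBytes = input.getBytes(StandardCharsets.UTF_8).length;
            // Start a new batch when adding this input would exceed the cap.
            if (currentBytes + inputBytes > maxBatchSizeInBytes && current.isEmpty() == false) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(input);
            currentBytes += inputBytes;
        }
        if (current.isEmpty() == false) {
            batches.add(current);
        }
        return batches;
    }
}
```

Processing one batch at a time and releasing its results before moving to the next keeps peak memory roughly proportional to the byte cap rather than to the size of the whole bulk request, which is the point of the change.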
Pinging @elastic/search-inference-team (Team:Search - Inference)

Pinging @elastic/search-eng (Team:SearchOrg)

Pinging @elastic/search-relevance (Team:Search - Relevance)

Hi @jimczi, I've created a changelog YAML for you.

Pinging @elastic/es-search-relevance (Team:Search Relevance)
Overall changes LGTM
LGTM. A few small code comments, and I think there's still an inefficiency (though that also predates this PR).
The review thread quotes this excerpt from ShardBulkInferenceActionFilter.java:

```java
// ignore delete request
continue;
// ...
if (useLegacyFormat) {
    var newDocMap = indexRequest.sourceAsMap();
```
I'll close my PR; it conflicts badly with this.
I'll check whether this resolves the inefficiency I spotted and let you know.
Looks good overall; I left a few non-blocking comments.
Force-pushed from 7cb386f to 06a96e9
💚 Backport successful
…24863)

This refactor improves memory efficiency by processing inference requests in batches, capped by a max input length. Changes include:

- A new dynamic operator setting to control the maximum batch size in bytes.
- Dropping input data from inference responses when the legacy semantic text format isn't used, saving memory.
- Clearing inference results dynamically after each bulk item to free up memory sooner.

This is a step toward enabling circuit breakers to better handle memory usage when dealing with large inputs.
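For context, here is a minimal sketch of how a dynamic, operator-only byte-size setting is usually declared in Elasticsearch. The setting key, class name, and default shown here are assumptions for illustration, not necessarily the values introduced by this PR.

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.ByteSizeValue;

// Hypothetical declaration: the setting key and default below are illustrative.
public class InferenceFilterSettings {
    public static final Setting<ByteSizeValue> INFERENCE_BATCH_SIZE_BYTES = Setting.byteSizeSetting(
        "indices.inference.batch_size",   // assumed setting key
        ByteSizeValue.ofMb(1),            // assumed default
        Setting.Property.NodeScope,
        Setting.Property.OperatorDynamic  // operator-only, updatable at runtime
    );
}
```

A setting declared this way is typically returned from the plugin's getSettings() and wired up with ClusterSettings#addSettingsUpdateConsumer so the filter picks up changes without a node restart.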