Conversation

@jimczi (Contributor) commented on Mar 7, 2025:

This refactor improves memory efficiency by processing inference requests in batches, capped by a maximum total input size in bytes.

Changes include:

  • A new dynamic operator setting to control the maximum batch size in bytes.
  • Dropping input data from inference responses when the legacy semantic text format isn’t used, saving memory.
  • Clearing inference results dynamically after each bulk item to free up memory sooner.

This is a step toward enabling circuit breakers to better handle memory usage when dealing with large inputs.
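
To make the batching idea concrete, here is a minimal sketch of byte-capped batching. The class and method names are illustrative assumptions, not the actual implementation from this PR:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ByteCappedBatcher {

    /**
     * Splits inputs into batches whose cumulative UTF-8 size stays under the cap.
     * A single input larger than the cap still gets a batch of its own.
     */
    static List<List<String>> batch(List<String> inputs, long maxBatchSizeInBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String input : inputs) {
            long size = input.getBytes(StandardCharsets.UTF_8).length;
            // Flush the current batch if adding this input would exceed the cap.
            if (currentBytes + size > maxBatchSizeInBytes && current.isEmpty() == false) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(input);
            currentBytes += size;
        }
        if (current.isEmpty() == false) {
            batches.add(current);
        }
        return batches;
    }
}
```

A dynamic operator setting in Elasticsearch is typically declared through the `Setting` API; the key and default below are hypothetical, chosen only to illustrate the shape of such a declaration:

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.ByteSizeValue;

// Hypothetical setting key and default value; OperatorDynamic makes the
// setting updatable at runtime, but only by operator users.
public static final Setting<ByteSizeValue> MAX_INFERENCE_BATCH_SIZE_BYTES =
    Setting.byteSizeSetting(
        "indices.inference.max_batch_size_bytes",
        ByteSizeValue.ofMb(1),
        Setting.Property.NodeScope,
        Setting.Property.OperatorDynamic
    );
```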

@jimczi added the >enhancement, :SearchOrg/Relevance, :SearchOrg/Inference, v8.19.0, and v9.1.0 labels on Mar 7, 2025.
@jimczi requested review from Mikep86 and jan-elastic on March 7, 2025, 12:34.
@elasticsearchmachine (Collaborator):
Pinging @elastic/search-inference-team (Team:Search - Inference)

@elasticsearchmachine (Collaborator):
Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine (Collaborator):
Pinging @elastic/search-relevance (Team:Search - Relevance)

@elasticsearchmachine (Collaborator):
Hi @jimczi, I've created a changelog YAML for you.

@elasticsearchmachine (Collaborator):
Pinging @elastic/es-search-relevance (Team:Search Relevance)

@kderusso (Member) left a comment:
Overall changes LGTM

@jan-elastic (Contributor) left a comment:
LGTM. A few small code comments, and I think there's still an inefficiency (though that predates this PR).

```java
// ignore delete request
continue;
```
```java
if (useLegacyFormat) {
    var newDocMap = indexRequest.sourceAsMap();
```
Contributor:
I'll close my PR; it conflicts badly with this.

I'll check whether this resolves the inefficiency I spotted and let you know.

@Mikep86 (Contributor) left a comment:
Looks good overall, I left a few non-blocking comments.

@jimczi force-pushed the shard_bulk_inference_filter_memory branch from 7cb386f to 06a96e9 on March 7, 2025, 16:55.
@jimczi added the auto-backport label on Mar 14, 2025.
@elasticsearchmachine (Collaborator):
Hi @jimczi, I've created a changelog YAML for you.

@jimczi merged commit 361b51d into elastic:main on Mar 14, 2025 (17 checks passed).
@jimczi deleted the shard_bulk_inference_filter_memory branch on March 14, 2025, 09:51.
@elasticsearchmachine (Collaborator):
💚 Backport successful

Branch: 8.x

elasticsearchmachine pushed a commit that referenced this pull request on Mar 14, 2025:
…24863)