Conversation

DonalEvans
Contributor

Prior to this change, if only one of the .inference or .secrets-inference indices was updated when creating an inference endpoint, endpoint creation failed but the successfully written doc was not removed, leaving the two indices with inconsistent document counts.

This commit removes any documents that were written when a partial failure occurs. The behaviour is unchanged when neither index was updated.

  • Invoke a cleanup listener if a partial failure occurred when storing inference endpoint information in the .inference and .secrets-inference indices
  • Refactor ModelRegistry to use BulkRequestBuilder.add(IndexRequestBuilder) instead of the deprecated BulkRequestBuilder.add(IndexRequest)
  • Include cause when logging bulk failure during inference endpoint creation
  • Add integration tests for the new behaviour

Closes #123726
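
The partial-failure handling described above can be sketched roughly as follows. This is a simplified illustration, not the actual ModelRegistry code: the class PartialFailureCleanupSketch, the ItemResult record, the Outcome enum, and the deleteEndpointDocs callback are all hypothetical stand-ins for the bulk response items and the cleanup listener, and the sketch assumes the bulk request contained at least one write.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these types are Elasticsearch APIs. */
final class PartialFailureCleanupSketch {

    /** Result of one write in the bulk request (one per index). */
    record ItemResult(String index, boolean failed) {}

    /** Possible outcomes of storing an inference endpoint. */
    enum Outcome { STORED, FAILED_NOTHING_WRITTEN, FAILED_AFTER_CLEANUP }

    /**
     * Decides what to do after the bulk write to .inference and
     * .secrets-inference. The cleanup callback runs only when some
     * writes succeeded and others failed.
     */
    static Outcome handleBulkResult(
        String endpointId,
        List<ItemResult> items,
        Consumer<String> deleteEndpointDocs
    ) {
        boolean anySuccess = items.stream().anyMatch(item -> !item.failed());
        boolean anyFailure = items.stream().anyMatch(ItemResult::failed);

        if (!anyFailure) {
            // Both documents were written: nothing to clean up.
            return Outcome.STORED;
        }
        if (!anySuccess) {
            // Nothing was written: fail as before, without cleanup.
            return Outcome.FAILED_NOTHING_WRITTEN;
        }
        // Partial failure: remove whatever was written, then fail.
        deleteEndpointDocs.accept(endpointId);
        return Outcome.FAILED_AFTER_CLEANUP;
    }
}
```

In this sketch, a caller would pass one ItemResult per index together with a callback that deletes the endpoint's documents from both indices, so that the document counts stay consistent after a failed creation.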

@DonalEvans added the >bug, :ml (Machine learning), Team:ML (Meta label for the ML team), and v9.3.0 labels on Oct 14, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @DonalEvans, I've created a changelog YAML for you.

}

logBulkFailures(model.getConfigurations().getInferenceEntityId(), bulkItemResponses);
boolean anySuccess = false;
Contributor

nit: It might be slightly cleaner to move this boolean above the cleanupListener. The cleanupListener can then check anySuccess and only do the delete if there was a success.

Then we don't need to track listenerToInvoke.
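
The shape of that suggestion might look something like the sketch below. It is only an illustration under assumptions: CleanupListenerSketch, failWithCleanup, and the deleteDocs and reportFailure callbacks are hypothetical names, not the ModelRegistry code under review. The point is that anySuccess is computed before the listener is defined, so a single listener can decide for itself whether to delete.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of the reviewer's suggestion; not Elasticsearch code. */
final class CleanupListenerSketch {

    /**
     * Handles a failed bulk write. writeSucceeded holds one flag per
     * index write; deleteDocs and reportFailure are stand-in callbacks.
     */
    static void failWithCleanup(
        String endpointId,
        List<Boolean> writeSucceeded,
        Consumer<String> deleteDocs,
        Consumer<String> reportFailure
    ) {
        // Declared above the listener so the lambda can capture it.
        final boolean anySuccess = writeSucceeded.contains(Boolean.TRUE);

        // One listener for every failure case: it deletes only if
        // something was written, then reports the failure.
        Runnable cleanupListener = () -> {
            if (anySuccess) {
                deleteDocs.accept(endpointId);
            }
            reportFailure.accept(endpointId);
        };

        cleanupListener.run();
    }
}
```

Because the check lives inside the listener, the calling code does not have to choose between two listeners, which is what makes a separate listenerToInvoke variable unnecessary in this sketch.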

Contributor Author

Good idea, done

@DonalEvans merged commit 9abc0bd into elastic:main on Oct 15, 2025
34 checks passed
@DonalEvans added the auto-backport (Automatically create backport pull requests when merged), v8.19.6, v9.1.6, and v9.2.1 labels on Oct 16, 2025
@DonalEvans deleted the clean-up-inference-indices-on-failure branch on October 16, 2025 00:15
DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Oct 16, 2025
…136577)

Prior to this change, if only one of the .inference or
.secrets-inference indices was updated when creating an inference
endpoint, the endpoint creation would fail, but the successfully written
doc was not removed, leading to inconsistent document counts between the
two indices.

This commit removes any documents that were written in the case that a
partial failure occurred, but does not change the behaviour in the case
where no updates to the indices were made.

- Invoke a cleanup listener if a partial failure occurred when storing
  inference endpoint information in the .inference and
  .secrets-inference indices
- Refactor ModelRegistry to use
  BulkRequestBuilder.add(IndexRequestBuilder) instead of the deprecated
  BulkRequestBuilder.add(IndexRequest)
- Include cause when logging bulk failure during inference endpoint
  creation
- Add integration tests for the new behaviour
- Update docs/changelog/136577.yaml

Closes elastic#123726

(cherry picked from commit 9abc0bd)
@DonalEvans
Contributor Author

💚 All backports created successfully

Branches: 9.2, 9.1, 8.19

DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Oct 16, 2025
…136577)

Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Oct 16, 2025
…136577)

Development

Successfully merging this pull request may close these issues.

[ML] Inconsistent document count in .inference and .secrets-inference

3 participants