Skip to content

Conversation

@edsavage
Copy link
Contributor

@edsavage edsavage commented Oct 7, 2025

Add a rollover check for the AD results indices to the nightly ML maintenance task.

The concrete AD results indices now have a six digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series.

Depends on #136458

Add a rollover check for the AD results indices to the nightly ML maintenance task.

The concrete AD reults indices now have a six digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series.

WIP
@elasticsearchmachine
Copy link
Collaborator

Hi @edsavage, I've created a changelog YAML for you.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2025

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2025

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Comment on lines 1370 to 1372
} else if ((resultsIndexName.startsWith(AnomalyDetectorsIndexFields.RESULTS_INDEX_SHARED)
&& MlIndexAndAlias.has6DigitSuffix(resultsIndexName)
&& resultsIndexName.length() == AnomalyDetectorsIndexFields.RESULTS_INDEX_DEFAULT.length()) == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the name starts with shared and has a 6 digit suffix NNNNNN and is the same length as shared-000001 then don't enter the if block. It looks like we want to test if resultsIndexName == shared-000001 but we would also accept another character instead of the - e.g. shared+00000`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot!

`xpack.ml.nightly_maintenance_requests_per_second`
: ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The rate at which the nightly maintenance task deletes expired model snapshots and results. The setting is a proxy to the [`requests_per_second`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-delete-by-query) parameter used in the delete by query requests and controls throttling. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than `0.0` or equal to `-1.0`, where `-1.0` means a default value is used. Defaults to `-1.0`

`xpack.ml.nightly_maintenance_rollover_max_size`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`xpack.ml.nightly_maintenance_rollover_max_size`
`xpack.ml.results_index_rollover_max_size`

resultsIndexName = resultsIndexName.startsWith("custom-") ? resultsIndexName : "custom-" + resultsIndexName;
}

resultsIndexName = MlIndexAndAlias.has6DigitSuffix(resultsIndexName) ? resultsIndexName : resultsIndexName + "-000001";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resultsIndexName is part of the job configuration returned in GET _ml/anomaly_detectors. If we are prescriptive about the index version (-000001) then when the index is rolled over that name is out of date.

Should resultsIndexName just be the root part without the 6 digit suffix and make sure the aliases point to the right index?

The results index is created in JobResultsProvider. Would it make more sense to do the logic to figure out the latest index name there

public void createJobResultIndex(Job job, ClusterState state, final ActionListener<Boolean> finalListener) {

This build() method is called both when creating the job and when it is parsed from the stored document. For that reason I think it would be safer to move the results index logic to where the index is created.

Copy link
Contributor

@valeriy42 valeriy42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It LGTM in general. Great work on refactoring, removing code duplication and making your code easy to read.  🚀

My main concern is about the configuration parameter nightly_maintenance_rollover_max_size.

  • Does "0B" mean effectively that all result indices will be rolled over every time the nightly maintenance task runs, aka every night?
  • Is there a way to turn off the rollover now? Do we want the user to be able to turn off the rollover?

WDYT?

`xpack.ml.nightly_maintenance_rollover_max_size`
: ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The maximum size the anomaly detection results indices can reach before being rolled over by the nightly maintenance task. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than `0B` or equal to `-1B`. Defaults to `50GB`.
`xpack.ml.results_index_rollover_max_size`
: ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The maximum size the anomaly detection results indices can reach before being rolled over by the nightly maintenance task. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than or equal to `0B`. A value of `0B` means the indices will always be rolled over. Defaults to `50GB`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the indices will always be rolled over

What does this mean? And how can you turn off the rollover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? And how can you turn off the rollover?

It means that regardless of size of the results indices, they will always be rolled over. I can update the docs to be more specific about this and/or we can choose a more suitable minimum value for xpack.ml.results_index_rollover_max_size

Currently no, rollover can't be turned off. I think it should be able to be though, I'll add that change.

private static final Logger logger = LogManager.getLogger(MlDailyMaintenanceService.class);
private static final org.elasticsearch.logging.Logger logger = LogManager.getLogger(MlDailyMaintenanceService.class);

private static final int MAX_TIME_OFFSET_MINUTES = 120;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a misunderstanding there. This comment is to document MAX_TIME_OFFSET_MINUTES. You don't need to document the logger.

@edsavage
Copy link
Contributor Author

  • Does "0B" mean effectively that all result indices will be rolled over every time the nightly maintenance task runs, aka every night?

Yes, if the user changed the default max rollover size value from 50GB to 0B that would cause the results indices to be rolled over regardless of size.

@edsavage
Copy link
Contributor Author

  • Is there a way to turn off the rollover now? Do we want the user to be able to turn off the rollover?

Currently no, there is no way of turning off the rollover - other than setting the max rollover size to some huge value.

We probably should give the users some flexibility in being able to turn off rollover all together. Maybe a special value of -1B for the max rollover size would be sufficient for this purpose?

* Added the ability to set results_index_rollover_max_size to -1B to configure the nightly maintenance task not to trigger indices rollover
@edsavage edsavage requested a review from valeriy42 October 30, 2025 03:32
edsavage added a commit to edsavage/elasticsearch that referenced this pull request Nov 5, 2025
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB).

This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless.

This builds on the work done in PR elastic#136065 which provides similar functionality for results indices.
Copy link
Contributor

@valeriy42 valeriy42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Good work!

@edsavage edsavage merged commit d554d15 into elastic:main Nov 5, 2025
34 checks passed
edsavage added a commit to edsavage/elasticsearch that referenced this pull request Nov 5, 2025
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB).

This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless.

This builds on the work done in PR elastic#136065 which provides similar functionality for results indices.

WIP
edsavage added a commit to edsavage/elasticsearch that referenced this pull request Nov 5, 2025
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB).

This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless.

This builds on the work done in PR elastic#136065 which provides similar functionality for results indices.

WIP
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Nov 6, 2025
Add a rollover check for the AD results indices to the nightly ML maintenance task.

The results indices will be rolled over when they are of a size equal or greater to 50GB (This value can be adjusted in the cluster config)

The concrete AD reults indices now have a six digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series.

We choose not to use ILM to manage the rollover as that is not available for Serverless.
szybia added a commit to szybia/elasticsearch that referenced this pull request Nov 6, 2025
…-json

* upstream/main:
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
  [LTR] Fix feature display order when using explain. (elastic#137671)
  Remove extra RemoteClusterService instances in unit test (elastic#137647)
  Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
  Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
  Update bundled JDK to 25.0.1 (elastic#137640)
  resolve indices for prefixed _all expressions (elastic#137330)
  ESQL: Add TopN support for exponential histograms (elastic#137313)
  allows field caps to be cross project (elastic#137530)
  ESQL: Add exponential histogram percentile function (elastic#137553)
  Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
  Tighten on when THROTTLE decision can be returned (elastic#136794)
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
  Add a test for two little known conditional processor paths (elastic#137645)
  Extract a common ORIGIN constant (elastic#137612)
  Remove early phase failure in batched (elastic#136889)
  Returning correct index mode from get data streams api (elastic#137646)
  [ML] Manage AD results indices (elastic#136065)
szybia added a commit to szybia/elasticsearch that referenced this pull request Nov 6, 2025
…-json

* upstream/main:
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
  [LTR] Fix feature display order when using explain. (elastic#137671)
  Remove extra RemoteClusterService instances in unit test (elastic#137647)
  Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
  Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
  Update bundled JDK to 25.0.1 (elastic#137640)
  resolve indices for prefixed _all expressions (elastic#137330)
  ESQL: Add TopN support for exponential histograms (elastic#137313)
  allows field caps to be cross project (elastic#137530)
  ESQL: Add exponential histogram percentile function (elastic#137553)
  Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
  Tighten on when THROTTLE decision can be returned (elastic#136794)
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
  Add a test for two little known conditional processor paths (elastic#137645)
  Extract a common ORIGIN constant (elastic#137612)
  Remove early phase failure in batched (elastic#136889)
  Returning correct index mode from get data streams api (elastic#137646)
  [ML] Manage AD results indices (elastic#136065)
Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Nov 10, 2025
Add a rollover check for the AD results indices to the nightly ML maintenance task.

The results indices will be rolled over when they are of a size equal or greater to 50GB (This value can be adjusted in the cluster config)

The concrete AD reults indices now have a six digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series.

We choose not to use ILM to manage the rollover as that is not available for Serverless.
edsavage added a commit that referenced this pull request Nov 14, 2025
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB).

This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless.

This builds on the work done in PR #136065 which provides similar functionality for results indices.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :ml Machine learning Team:ML Meta label for the ML team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants