[ML] Manage AD results indices #136065

edsavage · 2025-10-07T01:37:58Z

Add a rollover check for the AD results indices to the nightly ML maintenance task.

The concrete AD results indices now have a six digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series.
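To illustrate why a fixed-width numeric suffix makes the series easy to track, here is a minimal sketch of advancing a name to the next index in the series. `RolloverNameSketch` and `nextIndexName` are hypothetical names and not code from this PR, and the example index names are illustrative; the zero-padded increment follows the naming convention used by the Elasticsearch rollover API.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of this PR.
final class RolloverNameSketch {
    // Matches names ending in "-" followed by exactly six digits, e.g. "shared-000001".
    private static final Pattern SUFFIXED = Pattern.compile("^(.*)-(\\d{6})$");

    /** Returns the next name in the series, or appends "-000001" if the name has no suffix yet. */
    static String nextIndexName(String indexName) {
        Matcher m = SUFFIXED.matcher(indexName);
        if (m.matches() == false) {
            return indexName + "-000001";
        }
        int next = Integer.parseInt(m.group(2)) + 1;
        // Zero-pad back to six digits so names keep sorting in rollover order.
        return m.group(1) + "-" + String.format(Locale.ROOT, "%06d", next);
    }

    public static void main(String[] args) {
        System.out.println(nextIndexName(".ml-anomalies-shared-000001")); // .ml-anomalies-shared-000002
        System.out.println(nextIndexName(".ml-anomalies-custom-foo"));    // .ml-anomalies-custom-foo-000001
    }
}
```

The sketch does not handle overflow past `999999`, which would produce a seven-digit suffix.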

Depends on #136458

elasticsearchmachine · 2025-10-07T01:38:23Z

Hi @edsavage, I've created a changelog YAML for you.

github-actions · 2025-10-08T03:12:25Z

🔍 Preview links for changed docs

docs/reference/elasticsearch/configuration-reference/machine-learning-settings.md

github-actions · 2025-10-08T03:12:26Z

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

davidkyle · 2025-10-21T13:42:57Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/Job.java

+            } else if ((resultsIndexName.startsWith(AnomalyDetectorsIndexFields.RESULTS_INDEX_SHARED)
+                && MlIndexAndAlias.has6DigitSuffix(resultsIndexName)
+                && resultsIndexName.length() == AnomalyDetectorsIndexFields.RESULTS_INDEX_DEFAULT.length()) == false) {


If the name starts with `shared`, has a 6-digit suffix `NNNNNN`, and is the same length as `shared-000001`, then we don't enter the if block. It looks like we want to test whether `resultsIndexName == shared-000001`, but we would also accept another character in place of the `-`, e.g. `shared+000001`.
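A minimal sketch of the concern, under stated assumptions: `looseCheck` models the condition as read above, assuming the suffix test looks only at the last six characters, and `strictCheck` shows one way to pin the separator as well. The class, both method names, and the constant values are hypothetical stand-ins, not the PR's code.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the review comment; names and constant values are assumptions.
final class SharedIndexNameCheckSketch {
    static final String RESULTS_INDEX_SHARED = "shared";
    static final String RESULTS_INDEX_DEFAULT = "shared-000001";

    // "shared", a literal hyphen, then exactly six digits.
    private static final Pattern SHARED_WITH_SUFFIX =
        Pattern.compile(Pattern.quote(RESULTS_INDEX_SHARED) + "-\\d{6}");

    /** Checks prefix, total length and trailing six digits; the separator is never inspected. */
    static boolean looseCheck(String name) {
        return name.startsWith(RESULTS_INDEX_SHARED)
            && name.length() == RESULTS_INDEX_DEFAULT.length()
            && name.substring(name.length() - 6).chars().allMatch(c -> c >= '0' && c <= '9');
    }

    /** Also requires the character before the digits to be a hyphen. */
    static boolean strictCheck(String name) {
        return SHARED_WITH_SUFFIX.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(looseCheck("shared+000001"));  // true
        System.out.println(strictCheck("shared+000001")); // false
        System.out.println(strictCheck("shared-000001")); // true
    }
}
```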

davidkyle · 2025-10-21T13:46:07Z

docs/reference/elasticsearch/configuration-reference/machine-learning-settings.md

 `xpack.ml.nightly_maintenance_requests_per_second`
 :   ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The rate at which the nightly maintenance task deletes expired model snapshots and results. The setting is a proxy to the [`requests_per_second`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-delete-by-query) parameter used in the delete by query requests and controls throttling. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than `0.0` or equal to `-1.0`, where `-1.0` means a default value is used. Defaults to `-1.0`

+`xpack.ml.nightly_maintenance_rollover_max_size`


Suggested change

-`xpack.ml.nightly_maintenance_rollover_max_size`
+`xpack.ml.results_index_rollover_max_size`

davidkyle · 2025-10-21T14:00:35Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/Job.java

+                    resultsIndexName = resultsIndexName.startsWith("custom-") ? resultsIndexName : "custom-" + resultsIndexName;
+                }
+
+            resultsIndexName = MlIndexAndAlias.has6DigitSuffix(resultsIndexName) ? resultsIndexName : resultsIndexName + "-000001";


resultsIndexName is part of the job configuration returned in GET _ml/anomaly_detectors. If we are prescriptive about the index version (-000001) then when the index is rolled over that name is out of date.

Should resultsIndexName just be the root part without the 6 digit suffix and make sure the aliases point to the right index?

The results index is created in JobResultsProvider. Would it make more sense to do the logic to figure out the latest index name there?

elasticsearch/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/persistence/JobResultsProvider.java

Line 304 in 6b62bd8

public void createJobResultIndex(Job job, ClusterState state, final ActionListener<Boolean> finalListener) {

This build() method is called both when creating the job and when it is parsed from the stored document. For that reason I think it would be safer to move the results index logic to where the index is created.
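As a rough sketch of the alternative suggested here (store only the root name, resolve the concrete index when it is needed), something along these lines could work. `LatestIndexSketch`, `baseName` and `latestIndex` are hypothetical names; how the candidate index names would be obtained from cluster state is not shown and is left as an assumption.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical helper, for illustration only; not part of this PR.
final class LatestIndexSketch {
    /** True if the name ends in "-" followed by exactly six digits. */
    static boolean hasSixDigitSuffix(String name) {
        return name.matches(".*-\\d{6}");
    }

    /** Strips a trailing "-NNNNNN" (7 characters) if present, leaving the root name. */
    static String baseName(String name) {
        return hasSixDigitSuffix(name) ? name.substring(0, name.length() - 7) : name;
    }

    /**
     * Picks the most recent index in the series for the given root name.
     * Suffixes are fixed-width and zero-padded, so lexicographic order equals numeric order.
     */
    static Optional<String> latestIndex(String base, Collection<String> concreteIndices) {
        return concreteIndices.stream()
            .filter(LatestIndexSketch::hasSixDigitSuffix)
            .filter(name -> baseName(name).equals(base))
            .max(Comparator.naturalOrder());
    }
}
```

With this shape the job configuration would keep reporting the root name, so it would not go stale after a rollover.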

valeriy42

It LGTM in general. Great work on refactoring, removing code duplication and making your code easy to read. 🚀

My main concern is about the configuration parameter nightly_maintenance_rollover_max_size.

Does "0B" mean effectively that all result indices will be rolled over every time the nightly maintenance task runs, aka every night?
Is there a way to turn off the rollover now? Do we want the user to be able to turn off the rollover?

WDYT?

valeriy42 · 2025-10-29T08:56:54Z

docs/reference/elasticsearch/configuration-reference/machine-learning-settings.md

-`xpack.ml.nightly_maintenance_rollover_max_size`
-:   ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The maximum size the anomaly detection results indices can reach before being rolled over by the nightly maintenance task. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than `0B` or equal to `-1B`. Defaults to `50GB`.
+`xpack.ml.results_index_rollover_max_size`
+:   ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The maximum size the anomaly detection results indices can reach before being rolled over by the nightly maintenance task. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than or equal to `0B`. A value of `0B` means the indices will always be rolled over. Defaults to `50GB`.


> the indices will always be rolled over

What does this mean? And how can you turn off the rollover?

> What does this mean? And how can you turn off the rollover?

It means that, regardless of the size of the results indices, they will always be rolled over. I can update the docs to be more specific about this, and/or we can choose a more suitable minimum value for xpack.ml.results_index_rollover_max_size.

Currently no, rollover can't be turned off. I think it should be possible, though; I'll add that change.

valeriy42 · 2025-10-29T09:09:52Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MlDailyMaintenanceService.java

-    private static final Logger logger = LogManager.getLogger(MlDailyMaintenanceService.class);
+    private static final org.elasticsearch.logging.Logger logger = LogManager.getLogger(MlDailyMaintenanceService.class);

    private static final int MAX_TIME_OFFSET_MINUTES = 120;


There was a misunderstanding there. This comment is to document MAX_TIME_OFFSET_MINUTES. You don't need to document the logger.

edsavage · 2025-10-29T20:21:21Z

> Does "0B" mean effectively that all result indices will be rolled over every time the nightly maintenance task runs, aka every night?

Yes, if the user changed the default max rollover size value from 50GB to 0B that would cause the results indices to be rolled over regardless of size.

edsavage · 2025-10-29T20:27:35Z

> Is there a way to turn off the rollover now? Do we want the user to be able to turn off the rollover?

Currently no, there is no way of turning off the rollover, other than setting the max rollover size to some huge value.

We probably should give users some flexibility in being able to turn off rollover altogether. Maybe a special value of -1B for the max rollover size would be sufficient for this purpose?
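The semantics discussed in this thread (`-1B` disables rollover, `0B` always rolls over, otherwise roll over once the index reaches the threshold) can be sketched as follows. This is a hedged illustration only: `RolloverDecisionSketch` and `shouldRollover` are hypothetical names, sizes are plain byte counts rather than the Elasticsearch byte-size type, and the real maintenance task may be structured differently.

```java
// Hypothetical decision helper, for illustration only; not part of this PR.
final class RolloverDecisionSketch {
    /** Sentinel meaning "never roll over", standing in for a setting value of -1B. */
    static final long DISABLED = -1L;
    /** 50GB expressed in bytes (binary units), matching the documented default. */
    static final long DEFAULT_MAX_SIZE_BYTES = 50L * 1024 * 1024 * 1024;

    static boolean shouldRollover(long indexSizeBytes, long maxSizeBytes) {
        if (maxSizeBytes == DISABLED) {
            return false; // rollover switched off
        }
        if (maxSizeBytes < 0) {
            throw new IllegalArgumentException("max size must be >= 0B or equal to -1B");
        }
        // With a threshold of 0B this is always true, so every index rolls over each run.
        return indexSizeBytes >= maxSizeBytes;
    }

    public static void main(String[] args) {
        System.out.println(shouldRollover(10L, 0L));                      // true
        System.out.println(shouldRollover(10L, DISABLED));                // false
        System.out.println(shouldRollover(10L, DEFAULT_MAX_SIZE_BYTES));  // false
    }
}
```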

Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB). This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless. This builds on the work done in PR elastic#136065 which provides similar functionality for results indices.

valeriy42

LGTM. Good work!

Add a rollover check for the AD results indices to the nightly ML maintenance task. The results indices will be rolled over when their size is greater than or equal to 50GB (this value can be adjusted in the cluster config). The concrete AD results indices now have a six-digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series. We choose not to use ILM to manage the rollover as that is not available for Serverless.

…-json

* upstream/main:
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
  [LTR] Fix feature display order when using explain. (elastic#137671)
  Remove extra RemoteClusterService instances in unit test (elastic#137647)
  Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
  Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
  Update bundled JDK to 25.0.1 (elastic#137640)
  resolve indices for prefixed _all expressions (elastic#137330)
  ESQL: Add TopN support for exponential histograms (elastic#137313)
  allows field caps to be cross project (elastic#137530)
  ESQL: Add exponential histogram percentile function (elastic#137553)
  Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
  Tighten on when THROTTLE decision can be returned (elastic#136794)
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
  Add a test for two little known conditional processor paths (elastic#137645)
  Extract a common ORIGIN constant (elastic#137612)
  Remove early phase failure in batched (elastic#136889)
  Returning correct index mode from get data streams api (elastic#137646)
  [ML] Manage AD results indices (elastic#136065)

edsavage added 9 commits October 6, 2025 15:20

Merge branch 'main' of github.com:elasticsearch/elasticsearch into ma…

78d0d8e

…nage_ad_results_indices

Spotless Apply

0ec99b6

Spotless Apply

8405ff3

Bit of a tidy up

d10d09d

Slight refactor

dfb1939

Another tidy

ffc2842

Merge branch 'main' of github.com:elasticsearch/elasticsearch into ma…

9863169

…nage_ad_results_indices

Remove unused accessor

8f17540

edsavage added >enhancement :ml Machine learning v9.3.0 labels Oct 7, 2025

Update docs/changelog/136065.yaml

d60a811

edsavage and others added 8 commits October 7, 2025 14:38

Merge branch 'main' into manage_ad_results_indices

2729a86

[CI] Auto commit changes from spotless

c5f58e1

Address some test failures

7b6caf0

Merge remote-tracking branch 'origin/manage_ad_results_indices' into …

52ab642

…manage_ad_results_indices

Typos

66d7268

[CI] Auto commit changes from spotless

f9296fd

Make the max results index size for rollover user configurable.

07ddbaa

Merge remote-tracking branch 'origin/manage_ad_results_indices' into …

e7eb106

…manage_ad_results_indices

github-actions bot deployed to docs-preview October 8, 2025 03:11 View deployment

Fix bad merge

cb85a49

github-actions bot deployed to docs-preview October 8, 2025 03:21 View deployment

[CI] Auto commit changes from spotless

123c8cb

github-actions bot deployed to docs-preview October 8, 2025 03:30 View deployment

Merge branch 'main' into manage_ad_results_indices

1a5d0da

davidkyle reviewed Oct 21, 2025

Attend to code review comments

326eedd

edsavage requested review from davidkyle and valeriy42 October 23, 2025 03:39

edsavage and others added 7 commits October 23, 2025 16:43

Remove unneeded variable

21ea400

[CI] Auto commit changes from spotless

f90e4b8

Merge branch 'main' of github.com:elasticsearch/elasticsearch into ma…

5d06353

…nage_ad_results_indices

Merge remote-tracking branch 'origin/manage_ad_results_indices' into …

988eeeb

…manage_ad_results_indices

Bugfix and typo

000e1b3

Merge branch 'main' of github.com:elasticsearch/elasticsearch into ma…

cc157c8

…nage_ad_results_indices

More tests and fixes

58540ae

valeriy42 reviewed Oct 29, 2025

edsavage added 2 commits October 30, 2025 10:47

* Clarified documentation regarding results_index_rollover_max_size

74d92ae

* Added the ability to set results_index_rollover_max_size to -1B to configure the nightly maintenance task not to trigger indices rollover

Merge branch 'main' of github.com:elasticsearch/elasticsearch into ma…

0c2fbe9

…nage_ad_results_indices

edsavage requested a review from valeriy42 October 30, 2025 03:32

edsavage mentioned this pull request Nov 5, 2025

[ML] Add daily task to manage .ml-state indices #137603

Closed

valeriy42 approved these changes Nov 5, 2025

edsavage merged commit d554d15 into elastic:main Nov 5, 2025
34 checks passed

edsavage mentioned this pull request Nov 5, 2025

[ML] Add daily task to manage .ml-state indices #137653

Merged

edsavage mentioned this pull request Nov 24, 2025

[ML] Roll-over .ml-anomalies-* indices when they are at 50GB #131014

Closed
