-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[ML] Manage AD results indices #136065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Manage AD results indices #136065
Conversation
Add a rollover check for the AD results indices to the nightly ML maintenance task. The concrete AD reults indices now have a six digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series. WIP
…nage_ad_results_indices
…nage_ad_results_indices
|
Hi @edsavage, I've created a changelog YAML for you. |
…manage_ad_results_indices
…manage_ad_results_indices
🔍 Preview links for changed docs |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
| } else if ((resultsIndexName.startsWith(AnomalyDetectorsIndexFields.RESULTS_INDEX_SHARED) | ||
| && MlIndexAndAlias.has6DigitSuffix(resultsIndexName) | ||
| && resultsIndexName.length() == AnomalyDetectorsIndexFields.RESULTS_INDEX_DEFAULT.length()) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the name starts with shared and has a 6 digit suffix NNNNNN and is the same length as shared-000001 then don't enter the if block. It looks like we want to test if resultsIndexName == shared-000001 but we would also accept another character instead of the - e.g. shared+00000`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot!
| `xpack.ml.nightly_maintenance_requests_per_second` | ||
| : ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The rate at which the nightly maintenance task deletes expired model snapshots and results. The setting is a proxy to the [`requests_per_second`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-delete-by-query) parameter used in the delete by query requests and controls throttling. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than `0.0` or equal to `-1.0`, where `-1.0` means a default value is used. Defaults to `-1.0` | ||
|
|
||
| `xpack.ml.nightly_maintenance_rollover_max_size` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `xpack.ml.nightly_maintenance_rollover_max_size` | |
| `xpack.ml.results_index_rollover_max_size` |
| resultsIndexName = resultsIndexName.startsWith("custom-") ? resultsIndexName : "custom-" + resultsIndexName; | ||
| } | ||
|
|
||
| resultsIndexName = MlIndexAndAlias.has6DigitSuffix(resultsIndexName) ? resultsIndexName : resultsIndexName + "-000001"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resultsIndexName is part of the job configuration returned in GET _ml/anomaly_detectors. If we are prescriptive about the index version (-000001) then when the index is rolled over that name is out of date.
Should resultsIndexName just be the root part without the 6 digit suffix and make sure the aliases point to the right index?
The results index is created in JobResultsProvider. Would it make more sense to do the logic to figure out the latest index name there
Line 304 in 6b62bd8
| public void createJobResultIndex(Job job, ClusterState state, final ActionListener<Boolean> finalListener) { |
This build() method is called both when creating the job and when it is parsed from the stored document. For that reason I think it would be safer to move the results index logic to where the index is created.
…nage_ad_results_indices
…manage_ad_results_indices
…nage_ad_results_indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It LGTM in general. Great work on refactoring, removing code duplication and making your code easy to read. 🚀
My main concern is about the configuration parameter nightly_maintenance_rollover_max_size.
- Does "0B" mean effectively that all result indices will be rolled over every time the nightly maintenance task runs, aka every night?
- Is there a way to turn off the rollover now? Do we want the user to be able to turn off the rollover?
WDYT?
| `xpack.ml.nightly_maintenance_rollover_max_size` | ||
| : ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The maximum size the anomaly detection results indices can reach before being rolled over by the nightly maintenance task. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than `0B` or equal to `-1B`. Defaults to `50GB`. | ||
| `xpack.ml.results_index_rollover_max_size` | ||
| : ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The maximum size the anomaly detection results indices can reach before being rolled over by the nightly maintenance task. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than or equal to `0B`. A value of `0B` means the indices will always be rolled over. Defaults to `50GB`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the indices will always be rolled over
What does this mean? And how can you turn off the rollover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean? And how can you turn off the rollover?
It means that regardless of size of the results indices, they will always be rolled over. I can update the docs to be more specific about this and/or we can choose a more suitable minimum value for xpack.ml.results_index_rollover_max_size
Currently no, rollover can't be turned off. I think it should be able to be though, I'll add that change.
| private static final Logger logger = LogManager.getLogger(MlDailyMaintenanceService.class); | ||
| private static final org.elasticsearch.logging.Logger logger = LogManager.getLogger(MlDailyMaintenanceService.class); | ||
|
|
||
| private static final int MAX_TIME_OFFSET_MINUTES = 120; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a misunderstanding there. This comment is to document MAX_TIME_OFFSET_MINUTES. You don't need to document the logger.
Yes, if the user changed the default max rollover size value from |
Currently no, there is no way of turning off the rollover - other than setting the max rollover size to some huge value. We probably should give the users some flexibility in being able to turn off rollover all together. Maybe a special value of |
* Added the ability to set results_index_rollover_max_size to -1B to configure the nightly maintenance task not to trigger indices rollover
…nage_ad_results_indices
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB). This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless. This builds on the work done in PR elastic#136065 which provides similar functionality for results indices.
valeriy42
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Good work!
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB). This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless. This builds on the work done in PR elastic#136065 which provides similar functionality for results indices. WIP
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB). This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless. This builds on the work done in PR elastic#136065 which provides similar functionality for results indices. WIP
Add a rollover check for the AD results indices to the nightly ML maintenance task. The results indices will be rolled over when they are of a size equal or greater to 50GB (This value can be adjusted in the cluster config) The concrete AD reults indices now have a six digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series. We choose not to use ILM to manage the rollover as that is not available for Serverless.
…-json
* upstream/main:
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
[LTR] Fix feature display order when using explain. (elastic#137671)
Remove extra RemoteClusterService instances in unit test (elastic#137647)
Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
Update bundled JDK to 25.0.1 (elastic#137640)
resolve indices for prefixed _all expressions (elastic#137330)
ESQL: Add TopN support for exponential histograms (elastic#137313)
allows field caps to be cross project (elastic#137530)
ESQL: Add exponential histogram percentile function (elastic#137553)
Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
Tighten on when THROTTLE decision can be returned (elastic#136794)
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
Add a test for two little known conditional processor paths (elastic#137645)
Extract a common ORIGIN constant (elastic#137612)
Remove early phase failure in batched (elastic#136889)
Returning correct index mode from get data streams api (elastic#137646)
[ML] Manage AD results indices (elastic#136065)
…-json
* upstream/main:
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
[LTR] Fix feature display order when using explain. (elastic#137671)
Remove extra RemoteClusterService instances in unit test (elastic#137647)
Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
Update bundled JDK to 25.0.1 (elastic#137640)
resolve indices for prefixed _all expressions (elastic#137330)
ESQL: Add TopN support for exponential histograms (elastic#137313)
allows field caps to be cross project (elastic#137530)
ESQL: Add exponential histogram percentile function (elastic#137553)
Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
Tighten on when THROTTLE decision can be returned (elastic#136794)
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
Add a test for two little known conditional processor paths (elastic#137645)
Extract a common ORIGIN constant (elastic#137612)
Remove early phase failure in batched (elastic#136889)
Returning correct index mode from get data streams api (elastic#137646)
[ML] Manage AD results indices (elastic#136065)
Add a rollover check for the AD results indices to the nightly ML maintenance task. The results indices will be rolled over when they are of a size equal or greater to 50GB (This value can be adjusted in the cluster config) The concrete AD reults indices now have a six digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series. We choose not to use ILM to manage the rollover as that is not available for Serverless.
Add a daily maintenance task to roll over .ml-state indices if the index size exceeds a configurable default size (default 50GB). This replaces the previous method of using ILM to manage the state indices, as that was not a workable solution for serverless. This builds on the work done in PR #136065 which provides similar functionality for results indices.
Add a rollover check for the AD results indices to the nightly ML maintenance task.
The concrete AD results indices now have a six digit suffix. This is necessary to keep track of rollover behaviour and to determine which index is the "latest" in the series.
Depends on #136458