Description
When customers store anomaly detection results in a shared results index for a long time, the index can grow to a considerable size of 100+GB. This leads to multiple problems:
- Old data can only be deleted using delete-by-query instead of removing the complete outdated index.
- Maintenance tasks time out while deleting outdated data by query.
- Reindexing for a major upgrade can take a very long time
To remedy the situation, we need to roll over the alias once the backing index reaches 50 GB, so that we end up with a sequence of small indices instead of a single big index.
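For illustration, a size-based rollover could be triggered through the Elasticsearch rollover API with a `max_size` condition. The sketch below only builds the request and does not send it. The helper name `build_rollover_request` is hypothetical, and it assumes the results alias has a write index, which the rollover API requires.

```python
import json
from typing import Any, Dict, Tuple

# Threshold proposed in this issue.
ROLLOVER_MAX_SIZE = "50gb"


def build_rollover_request(
    alias: str, max_size: str = ROLLOVER_MAX_SIZE
) -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, path, body) for a conditional rollover of `alias`.

    Hypothetical helper: it only assembles the request, so it can be
    inspected or handed to any HTTP client.
    """
    body = {"conditions": {"max_size": max_size}}
    return "POST", f"/{alias}/_rollover", body


if __name__ == "__main__":
    method, path, body = build_rollover_request(".ml-anomalies-shared")
    print(method, path)
    print(json.dumps(body, indent=2))
```

With a conditional request like this, the rollover only happens when the condition is met, so a periodic maintenance task could issue it on every run.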
There are two possible alternatives worth investigating:
- A new maintenance task
- ILM policy similar to the ml state index (see my comment below).
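As a rough sketch of the ILM alternative, the policy body could look like the one built below: a single hot phase with a rollover action at 50 GB. The policy name `ml-anomalies-rollover` and the helper `build_ilm_policy_request` are assumptions made for this example, not existing names. The actual policy used for the ml state index may differ.

```python
import json
from typing import Any, Dict, Tuple


def build_ilm_policy_request(
    policy_name: str = "ml-anomalies-rollover",  # hypothetical policy name
    max_size: str = "50gb",
) -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, path, body) for creating an ILM policy that
    rolls the write index over once it reaches `max_size`."""
    body = {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": max_size},
                    }
                }
            }
        }
    }
    return "PUT", f"/_ilm/policy/{policy_name}", body


if __name__ == "__main__":
    method, path, body = build_ilm_policy_request()
    print(method, path)
    print(json.dumps(body, indent=2))
```

Compared with a dedicated maintenance task, this approach would leave the size check to ILM, but the policy still has to be attached to the results indices through their settings or template.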
Acceptance criteria:
Scenario: Shared results index with multiple jobs rolls over
GIVEN: Several anomaly detection jobs run and are configured to store results into the default alias .ml-anomalies-shared
WHEN: The backing index reaches the size of 50 GB
THEN: Alias rolls over to a new backing index
All jobs successfully continue to work and write new results
Old results can be renormalized even if they are stored in the old backing index
The single metric viewer and Anomaly Explorer show results for the complete running period.
Scenario: Custom results index with a single job rolls over
GIVEN: A single anomaly detection job runs and is configured to store results into the custom index alias .ml-anomalies-custom-job-id
WHEN: The backing index reaches the size of 50 GB
THEN: Alias rolls over to a new backing index
The job continues to work successfully and write new results
Old results can be renormalized even if they are stored in the old backing index
The single metric viewer and Anomaly Explorer show results for the complete running period.
Scenario: Custom results index with multiple jobs pointing to it rolls over
GIVEN: Multiple anomaly detection jobs run and are configured to store results into the custom index alias .ml-anomalies-custom-job-id
WHEN: The backing index reaches the size of 50 GB
THEN: Alias rolls over to a new backing index
The jobs successfully continue to work and write new results
Old results can be renormalized even if they are stored in the old backing index
The single metric viewer and Anomaly Explorer show results for the complete running period.