[ML] Roll-over .ml-anomalies-* indices when they are at 50GB #131014

@valeriy42

When customers store anomaly detection results in a shared results index for a long time, the index can grow to more than 100 GB. This leads to multiple problems:

  • Old data can only be deleted using delete-by-query instead of removing the complete outdated index.
  • Maintenance tasks time out while deleting outdated data by query
  • Reindexing for a major upgrade can take a very long time

To remedy the situation, we need to roll over the alias once the backing index reaches 50 GB, so that we end up with a sequence of smaller indices instead of a single big index.
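As a rough illustration of the size-based rollover described above, the sketch below builds a conditional request for the Elasticsearch rollover API using its `max_size` condition. The helper name `rollover_request` is hypothetical, the alias name is the one used in this issue, and the code only constructs the request without sending it. Whether ML would call the API this way is an assumption, not something this issue decides.

```python
import json

# Threshold proposed in this issue.
ROLLOVER_THRESHOLD = "50gb"


def rollover_request(alias: str, max_size: str = ROLLOVER_THRESHOLD) -> tuple:
    """Build (method, path, body) for a conditional rollover of `alias`.

    Hypothetical helper: it only assembles the request. The rollover
    happens only if the current write index satisfies the condition.
    """
    body = {"conditions": {"max_size": max_size}}
    return "POST", f"/{alias}/_rollover", body


method, path, body = rollover_request(".ml-anomalies-shared")
print(method, path)
print(json.dumps(body))
```

A conditional rollover is a no-op while the index is below the threshold, so a periodic task could issue it repeatedly without creating empty indices.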

There are two possible alternatives worth investigating:

  • A new maintenance task
  • ILM policy similar to the ml state index (see my comment below).
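For the ILM alternative, a minimal sketch of what a size-based policy body and the matching index settings could look like is shown here. The policy name `ml-anomalies-size-based-policy` and both helper functions are hypothetical; the `rollover` action in the hot phase and the `index.lifecycle.name` / `index.lifecycle.rollover_alias` settings are standard ILM, but the policy actually used for the ml state index is not reproduced here.

```python
import json


def size_based_ilm_policy(max_size: str = "50gb") -> dict:
    """Body for PUT _ilm/policy/<name>: roll over in the hot phase by size."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": max_size}
                    }
                }
            }
        }
    }


def lifecycle_settings(policy_name: str, rollover_alias: str) -> dict:
    """Index settings that attach the policy and name the alias to roll over."""
    return {
        "index.lifecycle.name": policy_name,
        "index.lifecycle.rollover_alias": rollover_alias,
    }


# Hypothetical policy name; the alias is the one named in this issue.
print(json.dumps(size_based_ilm_policy(), indent=2))
print(lifecycle_settings("ml-anomalies-size-based-policy", ".ml-anomalies-shared"))
```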

Acceptance criteria:

Scenario: Shared results index with multiple jobs rolls over

GIVEN: Several anomaly detection jobs run and are configured to store results into the default alias .ml-anomalies-shared
WHEN: The backing index reaches the size of 50 GB
THEN: Alias rolls over to a new backing index
All jobs successfully continue to work and write new results
Old results can be renormalized even if they are stored in the old backing index
The single metric viewer and Anomaly Explorer show results for the complete running period.

Scenario: Custom results index with a single job rolls over

GIVEN: A single anomaly detection job runs and is configured to store results into the custom index alias .ml-anomalies-custom-job-id
WHEN: The backing index reaches the size of 50 GB
THEN: Alias rolls over to a new backing index
The job continues to work successfully and write new results
Old results can be renormalized even if they are stored in the old backing index
The single metric viewer and Anomaly Explorer show results for the complete running period.

Scenario: Custom results index with multiple jobs pointing to it rolls over

GIVEN: Multiple anomaly detection jobs run and are configured to store results into the custom index alias .ml-anomalies-custom-job-id
WHEN: The backing index reaches the size of 50 GB
THEN: Alias rolls over to a new backing index
The jobs successfully continue to work and write new results
Old results can be renormalized even if they are stored in the old backing index
The single metric viewer and Anomaly Explorer show results for the complete running period.
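The behaviour these scenarios expect can be modelled in a few lines. The following is a toy in-memory sketch, not Elasticsearch code: the classes `BackingIndex` and `RolloverAlias` are hypothetical, and the `-000001` style suffix is an assumption about naming. It shows the two properties the criteria rely on: writes go to the newest backing index after rollover, while reads through the alias still span every backing index.

```python
from dataclasses import dataclass, field

GB = 1024 ** 3


@dataclass
class BackingIndex:
    name: str
    size_bytes: int = 0
    docs: list = field(default_factory=list)  # job id of each stored result


@dataclass
class RolloverAlias:
    prefix: str
    max_size_bytes: int = 50 * GB
    indices: list = field(default_factory=list)

    def __post_init__(self):
        # Start with a single backing index (suffix format is assumed).
        self.indices.append(BackingIndex(f"{self.prefix}-000001"))

    @property
    def write_index(self) -> BackingIndex:
        # New results always go to the most recent backing index.
        return self.indices[-1]

    def write(self, job_id: str, doc_bytes: int) -> None:
        self.write_index.docs.append(job_id)
        self.write_index.size_bytes += doc_bytes

    def maybe_rollover(self) -> bool:
        # Roll over only once the write index has reached the threshold.
        if self.write_index.size_bytes < self.max_size_bytes:
            return False
        next_name = f"{self.prefix}-{len(self.indices) + 1:06d}"
        self.indices.append(BackingIndex(next_name))
        return True

    def count_results(self, job_id: str) -> int:
        # Reads through the alias cover old and new backing indices.
        return sum(idx.docs.count(job_id) for idx in self.indices)


alias = RolloverAlias(".ml-anomalies-shared")
alias.write("job-a", 30 * GB)
alias.write("job-b", 20 * GB)
rolled = alias.maybe_rollover()   # threshold reached, new backing index
alias.write("job-a", 1 * GB)      # job keeps writing after rollover
print(rolled, alias.write_index.name, alias.count_results("job-a"))
```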

Metadata

Labels: :ml (Machine learning), Team:ML (Meta label for the ML team)
