Commit 6329ba3

[E&A] Refine anomaly detection docs set part 1. (#304)
1 parent e4cf9ac commit 6329ba3

26 files changed, +115 -360 lines changed

explore-analyze/machine-learning.md

Lines changed: 7 additions & 18 deletions
@@ -8,27 +8,21 @@ mapped_urls:
 
 # What is Elastic Machine Learning? [machine-learning-intro]
 
-{{ml-cap}} features analyze your data and generate models for its patterns of behavior.
-The type of analysis that you choose depends on the questions or problems you want to address and the type of data you have available.
+{{ml-cap}} features analyze your data and generate models for its patterns of behavior. The type of analysis that you choose depends on the questions or problems you want to address and the type of data you have available.
 
 ## Unsupervised {{ml}} [machine-learning-unsupervised]
 
 There are two types of analysis that can deduce the patterns and relationships within your data without training or intervention: *{{anomaly-detect}}* and *{{oldetection}}*.
 
-[{{anomaly-detect-cap}}](machine-learning/anomaly-detection.md) requires time series data.
-It constructs a probability model and can run continuously to identify unusual events as they occur. The model evolves over time; you can use its insights to forecast future behavior.
+[{{anomaly-detect-cap}}](machine-learning/anomaly-detection.md) requires time series data. It constructs a probability model and can run continuously to identify unusual events as they occur. The model evolves over time; you can use its insights to forecast future behavior.
 
-[{{oldetection-cap}}](machine-learning/data-frame-analytics/ml-dfa-finding-outliers.md) does not require time series data.
-It is a type of {{dfanalytics}} that identifies unusual points in a data set by analyzing how close each data point is to others and the density of the cluster of points around it.
-It does not run continuously; it generates a copy of your data set where each data point is annotated with an {{olscore}}.
-The score indicates the extent to which a data point is an outlier compared to other data points.
+[{{oldetection-cap}}](machine-learning/data-frame-analytics/ml-dfa-finding-outliers.md) does not require time series data. It is a type of {{dfanalytics}} that identifies unusual points in a data set by analyzing how close each data point is to others and the density of the cluster of points around it. It does not run continuously; it generates a copy of your data set where each data point is annotated with an {{olscore}}. The score indicates the extent to which a data point is an outlier compared to other data points.
 
 ## Supervised {{ml}} [machine-learning-supervised]
 
 There are two types of {{dfanalytics}} that require training data sets: *{{classification}}* and *{{regression}}*.
 
-In both cases, the result is a copy of your data set where each data point is annotated with predictions and a trained model, which you can deploy to make predictions for new data.
-For more information, refer to [Introduction to supervised learning](machine-learning/data-frame-analytics/ml-dfa-overview.md#ml-supervised-workflow).
+In both cases, the result is a copy of your data set where each data point is annotated with predictions and a trained model, which you can deploy to make predictions for new data. For more information, refer to [Introduction to supervised learning](machine-learning/data-frame-analytics/ml-dfa-overview.md#ml-supervised-workflow).
 
 [{{classification-cap}}](machine-learning/data-frame-analytics/ml-dfa-classification.md) learns relationships between your data points in order to predict discrete categorical values, such as whether a DNS request originates from a malicious or benign domain.
 
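The {{oldetection}} paragraph in this hunk describes scoring each point by how close it is to other points and how dense its neighborhood is. As a rough illustration only, the Python sketch below uses one simplified stand-in for that idea: the mean distance to the k nearest neighbors, rescaled to the 0 to 1 range. It is not the Elastic implementation, and the function name and sample data are hypothetical.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: List[Sequence[float]], k: int = 2) -> List[float]:
    """Toy outlier score: mean distance to the k nearest neighbors,
    rescaled so that the most isolated point scores 1.0.

    Simplified stand-in for illustration only; it is not the set of
    methods that Elastic outlier detection actually runs.
    """
    if len(points) <= k:
        raise ValueError("need more points than k")
    raw = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Isolated points have large distances even to their nearest neighbors.
        raw.append(sum(dists[:k]) / k)
    top = max(raw)
    # Rescale to [0, 1]; if all points coincide, every score is 0.
    return [r / top if top > 0 else 0.0 for r in raw]


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
    for point, score in zip(data, knn_outlier_scores(data)):
        print(point, round(score, 3))
```

The isolated point `(8.0, 8.0)` receives the highest score, while the four clustered points stay close to zero.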
@@ -44,18 +38,13 @@ The {{ml-features}} that are available vary by project type:
 
 ## Synchronize saved objects [machine-learning-synchronize-saved-objects]
 
-Before you can view your {{ml}} {dfeeds}, jobs, and trained models in {{kib}}, they must have saved objects.
-For example, if you used APIs to create your jobs, wait for automatic synchronization or go to the **{{ml-app}}** page and click **Synchronize saved objects**.
+Before you can view your {{ml}} {dfeeds}, jobs, and trained models in {{kib}}, they must have saved objects. For example, if you used APIs to create your jobs, wait for automatic synchronization or go to the **{{ml-app}}** page and click **Synchronize saved objects**.
 
 ## Export and import jobs [machine-learning-export-and-import-jobs]
 
-You can export and import your {{ml}} job and {{dfeed}} configuration details on the **{{ml-app}}** page.
-For example, you can export jobs from your test environment and import them in your production environment.
+You can export and import your {{ml}} job and {{dfeed}} configuration details on the **{{ml-app}}** page. For example, you can export jobs from your test environment and import them in your production environment.
 
-The exported file contains configuration details; it does not contain the {{ml}} models.
-For {{anomaly-detect}}, you must import and run the job to build a model that is accurate for the new environment.
-For {{dfanalytics}}, trained models are portable; you can import the job then transfer the model to the new cluster.
-Refer to [Exporting and importing {{dfanalytics}} trained models](machine-learning/data-frame-analytics/ml-trained-models.md#export-import).
+The exported file contains configuration details; it does not contain the {{ml}} models. For {{anomaly-detect}}, you must import and run the job to build a model that is accurate for the new environment. For {{dfanalytics}}, trained models are portable; you can import the job then transfer the model to the new cluster. Refer to [Exporting and importing {{dfanalytics}} trained models](machine-learning/data-frame-analytics/ml-trained-models.md#export-import).
 
 There are some additional actions that you must take before you can successfully import and run your jobs:
 
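If you create jobs through the APIs, you can also script the saved-object synchronization described in this hunk. The sketch below assumes that {{kib}} exposes an ML sync endpoint at `GET /api/ml/saved_objects/sync` with an optional `simulate` query parameter, and that basic authentication is enabled. Confirm both against the {{kib}} API reference for your version. The helper `build_sync_request` is hypothetical. It only builds the request, so it runs without a cluster.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_sync_request(kibana_url: str, user: str, password: str,
                       simulate: bool = True) -> urllib.request.Request:
    """Build (but do not send) a request that synchronizes ML saved objects.

    Assumption: the path and the `simulate` flag follow Kibana's ML
    saved-objects sync API; verify them for your Kibana version.
    """
    # simulate=true only reports what would change, without applying it.
    query = urlencode({"simulate": str(simulate).lower()})
    url = f"{kibana_url.rstrip('/')}/api/ml/saved_objects/sync?{query}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="GET", headers={"Authorization": f"Basic {token}"}
    )


if __name__ == "__main__":
    req = build_sync_request("http://localhost:5601/", "elastic", "changeme")
    print(req.get_method(), req.full_url)
    # To send it to a live Kibana instance:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```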
explore-analyze/machine-learning/anomaly-detection.md

Lines changed: 8 additions & 8 deletions
@@ -4,13 +4,13 @@ mapped_urls:
 - https://www.elastic.co/guide/en/kibana/current/xpack-ml-anomalies.html
 ---
 
-# Anomaly detection
+# Anomaly detection [ml-ad-overview]
 
-% What needs to be done: Align serverless/stateful
+You can use {{stack}} {{ml-features}} to analyze time series data and identify anomalous patterns in your data set.
 
-% Scope notes: Colleen McGinnis removed "https://www.elastic.co/guide/en/serverless/current/observability-machine-learning.html" and "All children" because this page is also used below in "AIOps Labs" with "All children" selected. We can't copy all children to two places.
-
-% Use migrated content from existing pages that map to this page:
-
-% - [ ] ./raw-migrated-files/stack-docs/machine-learning/ml-ad-overview.md
-% - [ ] ./raw-migrated-files/kibana/kibana/xpack-ml-anomalies.md
+* [Finding anomalies](../../../explore-analyze/machine-learning/anomaly-detection/ml-ad-finding-anomalies.md)
+* [Tutorial: Getting started with {{anomaly-detect}}](../../../explore-analyze/machine-learning/anomaly-detection/ml-getting-started.md)
+* [*Advanced concepts*](../../../explore-analyze/machine-learning/anomaly-detection/ml-ad-concepts.md)
+* [*API quick reference*](../../../explore-analyze/machine-learning/anomaly-detection/ml-api-quickref.md)
+* [How-tos](../../../explore-analyze/machine-learning/anomaly-detection/anomaly-how-tos.md)
+* [*Resources*](../../../explore-analyze/machine-learning/anomaly-detection/ml-ad-resources.md)

explore-analyze/machine-learning/anomaly-detection/anomaly-detection-scale.md

Lines changed: 15 additions & 31 deletions
@@ -17,8 +17,7 @@ Prerequisites:
 
 The following recommendations are not sequential – the numbers just help to navigate between the list items; you can take action on one or more of them in any order. You can implement some of these changes on existing jobs; others require you to clone an existing job or create a new one.
 
-
-## 1. Consider autoscaling, node sizing, and configuration [node-sizing]
+## 1. Consider autoscaling, node sizing, and configuration [node-sizing]
 
 An {{anomaly-job}} runs on a single node and requires sufficient resources to hold its model in memory. When a job is opened, it will be placed on the node with the most available memory at that time.
 
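This hunk states a placement rule: each job runs on one node, and that node is the one with the most available memory when the job opens. The rule can be modeled in a few lines. The sketch below is a simplified, hypothetical model for reasoning about capacity. It does not reproduce how {{es}} allocates jobs internally, and `MlNode`, `place_job`, and the node sizes are illustrative names and values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MlNode:
    name: str
    free_memory_mb: int  # memory still available for ML on this node


def place_job(nodes: List[MlNode], model_memory_mb: int) -> Optional[str]:
    """Open a job on the node with the most available memory.

    Simplified model: real allocation also weighs other factors, such as
    the limit on open jobs per node. Returns None if no node can hold the
    model, because one job cannot be split across nodes.
    """
    if not nodes:
        return None
    best = max(nodes, key=lambda n: n.free_memory_mb)
    if best.free_memory_mb < model_memory_mb:
        return None
    best.free_memory_mb -= model_memory_mb  # the model now occupies this memory
    return best.name


if __name__ == "__main__":
    cluster = [MlNode("ml-1", 4096), MlNode("ml-2", 6144)]
    print(place_job(cluster, 2048))  # ml-2
    print(place_job(cluster, 2048))  # ml-1 (tie goes to the first node)
    print(place_job(cluster, 5000))  # None
```

The last call shows why node sizing matters. The cluster still has 6144 MB free in total, but no single node has 5000 MB, so the job has nowhere to run.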
@@ -32,20 +31,17 @@ Increasing the number of nodes will allow distribution of job processing as well
 
 In {{ecloud}}, you can enable [autoscaling](../../../deploy-manage/autoscaling.md) so that the {{ml}} nodes in your cluster scale up or down based on current {{ml}} memory and CPU requirements. The {{ecloud}} infrastructure allows you to create {{ml-jobs}} up to the size that fits on the maximum node size that the cluster can scale to (usually somewhere between 58GB and 64GB) rather than what would fit in the current cluster. If you attempt to use autoscaling outside of {{ecloud}}, then set `xpack.ml.max_ml_node_size` to define the maximum possible size of a {{ml}} node. Creating {{ml-jobs}} with model memory limits larger than the maximum node size can support is not allowed, as autoscaling cannot add a node big enough to run the job. On a self-managed deployment, you can set `xpack.ml.max_model_memory_limit` according to the available resources of the {{ml}} node. This prevents you from creating jobs with model memory limits too high to open in your cluster.
 
-
-## 2. Use dedicated results indices [dedicated-results-index]
+## 2. Use dedicated results indices [dedicated-results-index]
 
 For large jobs, use a dedicated results index. This ensures that results from a single large job do not dominate the shared results index. It also ensures that the job and results (if `results_retention_days` is set) can be deleted more efficiently and improves renormalization performance. By default, {{anomaly-job}} results are stored in a shared index. To change to use a dedicated result index, you need to clone or create a new job.
 
-
-## 3. Disable model plot [model-plot]
+## 3. Disable model plot [model-plot]
 
 By default, model plot is enabled when you create jobs in {{kib}}. If you have a large job, however, consider disabling it. You can disable model plot for existing jobs by using the [Update {{anomaly-jobs}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-update-job.html).
 
 Model plot calculates and stores the model bounds for each analyzed entity, including both anomalous and non-anomalous entities. These bounds are used to display the shaded area in the Single Metric Viewer charts. Model plot creates one result document per bucket per split field value. If you have high cardinality fields and/or a short bucket span, disabling model plot reduces processing workload and results stored.
 
-
-## 4. Understand how detector configuration can impact model memory [detector-configuration]
+## 4. Understand how detector configuration can impact model memory [detector-configuration]
 
 The following factors are most significant in increasing the memory required for a job:
 
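The dedicated results index and model plot recommendations in this hunk correspond to job configuration fields. The sketch below builds request bodies as Python dictionaries. It uses `results_index_name` for a new job; as we understand it, results then go to a `.ml-anomalies-custom-<name>` index instead of the shared one. It uses `model_plot_config` for both new and existing jobs. The detector, the field names, and the index name are hypothetical. The bodies are intended for the create job and update job APIs.

```python
import json
from typing import Any, Dict


def new_job_body(bucket_span: str = "15m") -> Dict[str, Any]:
    """Body for PUT _ml/anomaly_detectors/<job_id> with a dedicated results index.

    Field names (responsetime, host) and the index name are hypothetical.
    """
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {"function": "mean", "field_name": "responsetime",
                 "partition_field_name": "host"},
            ],
            "influencers": ["host"],
        },
        "data_description": {"time_field": "@timestamp"},
        # A custom name keeps this job's results out of the shared results index.
        "results_index_name": "my-large-job",
        # Kibana enables model plot by default; switch it off explicitly.
        "model_plot_config": {"enabled": False},
    }


def disable_model_plot_body() -> Dict[str, Any]:
    """Body for POST _ml/anomaly_detectors/<job_id>/_update on an existing job."""
    return {"model_plot_config": {"enabled": False}}


if __name__ == "__main__":
    print(json.dumps(new_job_body(), indent=2))
    print(json.dumps(disable_model_plot_body()))
```

Only the model plot change applies to an existing job. As the hunk notes, switching to a dedicated results index requires cloning the job or creating a new one.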
@@ -59,36 +55,31 @@ If you have high cardinality `by` or `partition` fields, ensure you have suffici
 
 To change partitioning fields, influencers and/or detectors, you need to clone or create a new job.
 
-
-## 5. Optimize the bucket span [optimize-bucket-span]
+## 5. Optimize the bucket span [optimize-bucket-span]
 
 Short bucket spans and high cardinality detectors are resource intensive and require more system resources.
 
 Bucket span is typically between 15m and 1h. The recommended value always depends on the data, the use case, and the latency required for alerting. A job with a longer bucket span uses less resources because fewer buckets require processing and fewer results are written. Bucket spans that are sensible dividers of an hour or day work best as most periodic patterns have a daily cycle.
 
 If your use case is suitable, consider increasing the bucket span to reduce processing workload. To change the bucket span, you need to clone or create a new job.
 
-
-## 6. Set the `scroll_size` of the {{dfeed}} [set-scroll-size]
+## 6. Set the `scroll_size` of the {{dfeed}} [set-scroll-size]
 
 This consideration only applies to {{dfeeds}} that **do not** use aggregations. The `scroll_size` parameter of a {{dfeed}} specifies the number of hits to return from {{es}} searches. The higher the `scroll_size` the more results are returned by a single search. When your {{anomaly-job}} has a high throughput, increasing `scroll_size` may decrease the time the job needs to analyze incoming data, however may also increase the pressure on your cluster. You cannot increase `scroll_size` to more than the value of `index.max_result_window` which is 10,000 by default. If you update the settings of a {{dfeed}}, you must stop and start the {{dfeed}} for the change to be applied.
 
-
-## 7. Set the model memory limit [set-model-memory-limit]
+## 7. Set the model memory limit [set-model-memory-limit]
 
 The `model_memory_limit` job configuration option sets the approximate maximum amount of memory resources required for analytical processing. When you create an {{anomaly-job}} in {{kib}}, it provides an estimate for this limit. The estimate is based on the analysis configuration details for the job and cardinality estimates, which are derived by running aggregations on the source indices as they exist at that specific point in time.
 
 If you change the resources available on your {{ml}} nodes or make significant changes to the characteristics or cardinality of your data, the model memory requirements might also change. You can update the model memory limit for a job while it is closed. If you want to decrease the limit below the current model memory usage, however, you must clone and re-run the job.
 
-::::{tip}
+::::{tip}
 You can view the current model size statistics with the [get {{anomaly-job}} stats](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-get-job-stats.html) and [get model snapshots](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-get-snapshot.html) APIs. You can also obtain a model memory limit estimate at any time by running the [estimate {{anomaly-jobs}} model memory API](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-estimate-model-memory.html). However, you must provide your own cardinality estimates.
 ::::
 
-
 As a job approaches its model memory limit, the memory status is `soft_limit` and older models are more aggressively pruned to free up space. If you have categorization jobs, no further examples are stored. When a job exceeds its limit, the memory status is `hard_limit` and the job no longer models new entities. It is therefore important to have appropriate memory model limits for each job. If you reach the hard limit and are concerned about the missing data, ensure that you have adequate resources then clone and re-run the job with a larger model memory limit.
 
-
-## 8. Pre-aggregate your data [pre-aggregate-data]
+## 8. Pre-aggregate your data [pre-aggregate-data]
 
 You can speed up the analysis by summarizing your data with aggregations.
 
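You can check several of the settings in this hunk on the client side before you send them. The sketch below assumes bucket spans written as `<integer><unit>` (for example `15m`). It uses the default `index.max_result_window` value of 10,000 as the `scroll_size` ceiling, and it reads `memory_status` from one entry of a get {{anomaly-job}} stats response. All helper names are hypothetical, and the messages paraphrase the behavior described in the hunk.

```python
from typing import Any, Dict

# Seconds per unit for simple span strings such as "30s", "15m", "1h", "1d".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def bucket_span_seconds(span: str) -> int:
    """Parse '<integer><unit>' spans; other formats (e.g. 'ms') are not handled."""
    return int(span[:-1]) * _UNIT_SECONDS[span[-1]]


def divides_hour_or_day(span: str) -> bool:
    """True if whole buckets of this span fit exactly into an hour or a day."""
    secs = bucket_span_seconds(span)
    return 3600 % secs == 0 or 86400 % secs == 0


def scroll_size_update(scroll_size: int, max_result_window: int = 10_000) -> Dict[str, Any]:
    """Body for POST _ml/datafeeds/<feed_id>/_update (datafeeds without aggregations).

    The datafeed must be stopped and started again for the change to apply.
    """
    if not 0 < scroll_size <= max_result_window:
        raise ValueError(f"scroll_size must be between 1 and {max_result_window}")
    return {"scroll_size": scroll_size}


def memory_advice(job: Dict[str, Any]) -> str:
    """Interpret one entry of the `jobs` array from GET _ml/anomaly_detectors/<job_id>/_stats."""
    status = job["model_size_stats"]["memory_status"]
    if status == "hard_limit":
        return "not modelling new entities: add resources, then clone and re-run with a larger limit"
    if status == "soft_limit":
        return "approaching the limit: older models are pruned more aggressively"
    return "ok"


if __name__ == "__main__":
    for span in ("15m", "1h", "7m"):
        print(span, divides_hour_or_day(span))
    print(scroll_size_update(5000))
    print(memory_advice({"model_size_stats": {"memory_status": "soft_limit"}}))
```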
@@ -100,22 +91,19 @@ In certain cases, you cannot do aggregations to increase performance. For exampl
 
 Please consult [Aggregating data for faster performance](ml-configuring-aggregation.md) to learn more.
 
-
-## 9. Optimize the results retention [results-retention]
+## 9. Optimize the results retention [results-retention]
 
 Set a results retention window to reduce the amount of results stored.
 
 {{anomaly-detect-cap}} results are retained indefinitely by default. Results build up over time, and your result index may be quite large. A large results index is slow to query and takes up significant space on your cluster. Consider how long you wish to retain the results and set `results_retention_days` accordingly – for example, to 30 or 60 days – to avoid unnecessarily large result indices. Deleting old results does not affect the model behavior. You can change this setting for existing jobs.
 
-
-## 10. Optimize the renormalization window [renormalization-window]
+## 10. Optimize the renormalization window [renormalization-window]
 
 Reduce the renormalization window to reduce processing workload.
 
 When a new anomaly has a much higher score than any anomaly in the past, the anomaly scores are adjusted on a range from 0 to 100 based on the new data. This is called renormalization. It can mean rewriting a large number of documents in the results index. Renormalization happens for results from the last 30 days or 100 bucket spans (depending on which is the longer) by default. When you are working at scale, set `renormalization_window_days` to a lower value, so the workload is reduced. You can change this setting for existing jobs and changes will take effect after the job has been reopened.
 
-
-## 11. Optimize the model snapshot retention [model-snapshot-retention]
+## 11. Optimize the model snapshot retention [model-snapshot-retention]
 
 Model snapshots are taken periodically, to ensure resilience in the event of a system failure and to allow you to manually revert to a specific point in time. These are stored in a compressed format in an internal index and kept according to the configured retention policy. Load is placed on the cluster when indexing a model snapshot and index size is increased as multiple snapshots are retained.
 
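Each retention setting in this hunk is a job option, and you can change it on existing jobs through the update {{anomaly-job}} API. The sketch below builds such a request body. It also computes the default renormalization window, which is the longer of 30 days and 100 bucket spans. The helper names and example values are hypothetical placeholders, not recommended settings.

```python
from datetime import timedelta
from typing import Any, Dict, Optional


def retention_update_body(results_days: int = 30, renorm_days: int = 7,
                          snapshot_days: Optional[int] = None) -> Dict[str, Any]:
    """Body for POST _ml/anomaly_detectors/<job_id>/_update.

    The default numbers are placeholders, not recommendations.
    """
    body: Dict[str, Any] = {
        # Older results are deleted; the model itself is unaffected.
        "results_retention_days": results_days,
        # Takes effect once the job has been reopened.
        "renormalization_window_days": renorm_days,
    }
    if snapshot_days is not None:
        body["model_snapshot_retention_days"] = snapshot_days
    return body


def default_renormalization_window(bucket_span: timedelta) -> timedelta:
    """Default window: the longer of 30 days and 100 bucket spans."""
    return max(timedelta(days=30), 100 * bucket_span)


if __name__ == "__main__":
    print(retention_update_body(results_days=60, renorm_days=14))
    print(default_renormalization_window(timedelta(hours=12)))  # 50 days, 0:00:00
```

With a 15-minute bucket span, 100 buckets cover only 25 hours, so the 30-day default applies. The bucket-based term dominates only when the bucket span exceeds 7.2 hours.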
@@ -125,20 +113,17 @@ Also consider how long you wish to retain snapshots using `model_snapshot_retent
 
 For more information, refer to [Model snapshots](https://www.elastic.co/guide/en/machine-learning/current/ml-model-snapshots.html).
 
-
-## 12. Optimize your search queries [search-queries]
+## 12. Optimize your search queries [search-queries]
 
 If you are operating on a big scale, make sure that your {{dfeed}} query is as efficient as possible. There are different ways to write {{es}} queries and some of them are more efficient than others. Please consult [Tune for search speed](../../../deploy-manage/production-guidance/optimize-performance/search-speed.md) to learn more about {{es}} performance tuning.
 
 You need to clone or recreate an existing job if you want to optimize its search query.
 
-
-## 13. Consider using population analysis [population-analysis]
+## 13. Consider using population analysis [population-analysis]
 
 Population analysis is more memory efficient than individual analysis of each series. It builds a profile of what a "typical" entity does over a specified time period and then identifies when one is behaving abnormally compared to the population. Use population analysis for analyzing high cardinality fields if you expect that the entities of the population generally behave in the same way.
 
-
-## 14. Reduce the cost of forecasting [forecasting]
+## 14. Reduce the cost of forecasting [forecasting]
 
 There are two main performance factors to consider when you create a forecast: indexing load and memory usage. Check the cluster monitoring data to learn the indexing rate and the memory usage.
 
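Population analysis is configured per detector through `over_field_name`. The sketch below builds an `analysis_config` fragment with two population detectors. The fields `client_ip` and `bytes` are hypothetical, as are the helper names. The choice of the `sum` and `count` functions is only an example.

```python
import json
from typing import Any, Dict, List, Optional


def population_detector(function: str, over_field: str,
                        field: Optional[str] = None) -> Dict[str, Any]:
    """Detector that compares each member of `over_field` with the population.

    Setting over_field_name is what makes this population analysis;
    functions such as `count` take no field_name.
    """
    detector: Dict[str, Any] = {"function": function, "over_field_name": over_field}
    if field is not None:
        detector["field_name"] = field
    return detector


def population_analysis_config(detectors: List[Dict[str, Any]],
                               bucket_span: str = "15m") -> Dict[str, Any]:
    """analysis_config fragment for PUT _ml/anomaly_detectors/<job_id>."""
    # Use each population field as an influencer, without duplicates.
    influencers = sorted({d["over_field_name"] for d in detectors})
    return {"bucket_span": bucket_span, "detectors": detectors,
            "influencers": influencers}


if __name__ == "__main__":
    # Hypothetical web-traffic fields: client IPs that transfer unusually many
    # bytes, or make unusually many requests, compared with their peers.
    config = population_analysis_config([
        population_detector("sum", "client_ip", field="bytes"),
        population_detector("count", "client_ip"),
    ])
    print(json.dumps(config, indent=2))
```

These detectors model the population as a whole instead of each `client_ip` series on its own, which is where the memory saving for high-cardinality fields comes from.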
@@ -147,4 +132,3 @@ Forecasting writes a new document to the result index for every forecasted eleme
 To reduce indexing load, consider a shorter forecast duration and/or try to avoid concurrent forecast requests. Further performance gains can be achieved by reviewing the job configuration; for example by using a dedicated results index, increasing the bucket span and/or by having lower cardinality partitioning fields.
 
 The memory usage of a forecast is restricted to 20 MB by default. From 7.9, you can extend this limit by setting `max_model_memory` to a higher value. The maximum value is 40% of the memory limit of the {{anomaly-job}} or 500 MB. If the forecast needs more memory than the provided value, it spools to disk. Forecasts that spool to disk generally run slower. If you need to speed up forecasts, increase the available memory for the forecast. Forecasts that would take more than 500 MB to run won’t start because this is the maximum limit of disk space that a forecast is allowed to use.
-
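You can apply the forecast memory rules in the final hunk before you call the forecast API. The sketch below assumes that the stated maximum means the lower of 500 MB and 40% of the job's model memory limit; how the two bounds combine is our assumption. It builds a request body for `POST _ml/anomaly_detectors/<job_id>/_forecast`. {{es}} enforces its own limits regardless, so treat this only as a pre-check. The helper names are hypothetical.

```python
from typing import Any, Dict

MB = 1024 * 1024
ABSOLUTE_MAX_BYTES = 500 * MB  # upper bound stated in the text


def forecast_memory_cap(job_model_memory_limit: int) -> int:
    """Lower of 500 MB and 40% of the job's model memory limit (bytes)."""
    return min(ABSOLUTE_MAX_BYTES, int(job_model_memory_limit * 0.4))


def forecast_body(duration: str, requested_bytes: int,
                  job_model_memory_limit: int) -> Dict[str, Any]:
    """Body for POST _ml/anomaly_detectors/<job_id>/_forecast.

    max_model_memory is available from 7.9; without it the forecast uses
    the 20 MB default and spools to disk when it needs more.
    """
    allowed = min(requested_bytes, forecast_memory_cap(job_model_memory_limit))
    # Express the value in whole megabytes; never send 0mb.
    return {"duration": duration, "max_model_memory": f"{max(1, allowed // MB)}mb"}


if __name__ == "__main__":
    # A job with a 1 GB model memory limit asking for 600 MB of forecast memory.
    print(forecast_body("7d", 600 * MB, 1024 * MB))
```

For a job with a 1 GB model memory limit, the 40% bound (about 409 MB) is lower than 500 MB, so the sketch trims the 600 MB request to `409mb`.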