---
mapped_pages:
  - https://www.elastic.co/guide/en/machine-learning/current/geographic-anomalies.html
---

If your data includes geographic fields, you can use {{ml-features}} to detect anomalous behavior, such as a credit card transaction that occurs in an unusual location or a web request that has an unusual source location.

## Prerequisites [geographic-anomalies-prereqs]

To run this type of {{anomaly-job}}, you must have [{{ml-features}} set up](../setting-up-machine-learning.md). You must also have time series data that contains spatial data types, such as `geo_point` fields.

The latitude and longitude must be in the range -180 to 180 and represent a point on the surface of the Earth.

This example uses the sample eCommerce orders and sample web logs data sets. For more information, see [Add the sample data](../../overview/kibana-quickstart.md#gs-get-data-into-kibana).
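
If your own data does not yet include a spatial type, a mapping along the following lines makes a field usable for this kind of analysis. This is a minimal sketch; the index and field names are placeholders:

```console
PUT my-transactions
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "transaction_location": { "type": "geo_point" }
    }
  }
}
```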

## Explore your geographic data [geographic-anomalies-visualize]

To get the best results from {{ml}} analytics, you must understand your data. You can use the **{{data-viz}}** in the **{{ml-app}}** app for this purpose. Search for specific fields or field types, such as geo-point fields in the sample data sets. You can see how many documents contain those fields within a specific time period and sample size. You can also see the number of distinct values, a list of example values, and preview them on a map. For example:
*[screenshot: a geo_point field previewed on a map in the **{{data-viz}}**]*

## Create an {{anomaly-job}} [geographic-anomalies-jobs]

There are a few limitations to consider before you create this type of job:

1. You cannot create forecasts for {{anomaly-jobs}} that contain geographic functions.
2. You cannot add rules with conditions to detectors that use geographic functions.

For example, create a job that analyzes the sample eCommerce orders data set to detect orders with unusual coordinates (`geoip.location` values) or unusually high order values (`taxful_total_price` values):

*[screenshot: creating an {{anomaly-job}} for the eCommerce sample data in the advanced wizard]*

::::{dropdown} API example

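One way to create this job and its {{dfeed}} through the APIs is sketched below. The detector, influencer, and date settings here are illustrative rather than the exact configuration the wizard produces; adjust them to your data.
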
```console
PUT _ml/anomaly_detectors/ecommerce-geo <1>
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "Unusual coordinates",
        "function": "lat_long",
        "field_name": "geoip.location",
        "by_field_name": "user"
      },
      {
        "detector_description": "Unusually high order values",
        "function": "high_sum",
        "field_name": "taxful_total_price",
        "over_field_name": "user"
      }
    ],
    "influencers": [ "user", "geoip.country_iso_code" ]
  },
  "data_description": {
    "time_field": "order_date"
  }
}

PUT _ml/datafeeds/datafeed-ecommerce-geo <2>
{
  "job_id": "ecommerce-geo",
  "indices": [ "kibana_sample_data_ecommerce" ]
}

POST _ml/anomaly_detectors/ecommerce-geo/_open <3>

POST _ml/datafeeds/datafeed-ecommerce-geo/_start <4>
{
  "end": "2024-01-01T00:00:00Z"
}
```

1. Create the {{anomaly-job}}.
2. Create the {{dfeed}} that supplies data to the job.
3. Open the job.
4. Start the {{dfeed}}. Since the sample data sets often contain timestamps that are later than the current date, it is a good idea to specify the appropriate end date for the {{dfeed}}.

::::

Alternatively, create a job that analyzes the sample web logs data set to detect events with unusual coordinates (`geo.coordinates` values) or unusually high sums of transferred data (`bytes` values):

:::{image} ../../../images/machine-learning-weblogs-advanced-wizard-geopoint.jpg
:alt: A screenshot of creating an {{anomaly-job}} for the web logs data set in the advanced wizard
:class: screenshot
:::

::::{dropdown} API example

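Again, a sketch of one way to create the job and its {{dfeed}} through the APIs; the detector and date settings are illustrative.
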
```console
PUT _ml/anomaly_detectors/weblogs-geo <1>
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "Unusual coordinates",
        "function": "lat_long",
        "field_name": "geo.coordinates",
        "by_field_name": "clientip"
      },
      {
        "detector_description": "Unusually high sums of transferred data",
        "function": "high_sum",
        "field_name": "bytes",
        "over_field_name": "clientip"
      }
    ],
    "influencers": [ "clientip", "geo.src" ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}

PUT _ml/datafeeds/datafeed-weblogs-geo <2>
{
  "job_id": "weblogs-geo",
  "indices": [ "kibana_sample_data_logs" ]
}

POST _ml/anomaly_detectors/weblogs-geo/_open <3>

POST _ml/datafeeds/datafeed-weblogs-geo/_start <4>
{
  "end": "2024-01-01T00:00:00Z"
}
```

1. Create the {{anomaly-job}}.
2. Create the {{dfeed}} that supplies data to the job.
3. Open the job.
4. Start the {{dfeed}}. Since the sample data sets often contain timestamps that are later than the current date, it is a good idea to specify the appropriate end date for the {{dfeed}}.

::::

## Analyze the results [geographic-anomalies-results]

After the {{anomaly-jobs}} have processed some data, you can view the results in {{kib}}.
::::{tip}
If you used APIs to create the jobs and {{dfeeds}}, you cannot see them in {{kib}} until you follow the prompts to synchronize the necessary saved objects.
::::

When you select a period that contains an anomaly in the **Anomaly Explorer** swim lane results, you can see a map of the typical and actual coordinates. For example, in the eCommerce sample data there is a user with anomalous shopping behavior:

:::{image} ../../../images/machine-learning-ecommerce-anomaly-explorer-geopoint.jpg
:alt: A map of the typical and actual coordinates of an anomaly in Anomaly Explorer
:class: screenshot
:::

When you try this type of {{anomaly-job}} with your own data, it might take some time to find the combination of detectors, influencers, and bucket spans that best captures the behavior you want to detect.
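
If you use the APIs, a sketch of retrieving the highest-scoring anomaly records for the job created earlier (the sort options are illustrative):

```console
GET _ml/anomaly_detectors/ecommerce-geo/results/records
{
  "sort": "record_score",
  "desc": true
}
```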

For more information about {{anomaly-detect}} concepts, see [Concepts](https://www.elastic.co/guide/en/machine-learning/current/ml-concepts.html). For the full list of functions that you can use in {{anomaly-jobs}}, see [*Function reference*](ml-functions.md). For more {{anomaly-detect}} examples, see [Examples](https://www.elastic.co/guide/en/machine-learning/current/anomaly-examples.html).

## Add anomaly layers to your maps [geographic-anomalies-map-layer]

To integrate the results from your {{anomaly-job}} in **Maps**, click **Add layer**, then select **ML Anomalies**. You must then select or create an {{anomaly-job}} that uses the `lat_long` function.
For example, you can extend the map example from the **Maps** getting-started tutorial (*Build a map to compare metrics by country or region*) to include a layer that shows your {{anomaly-detect}} results:

*[screenshot: a map with an **ML Anomalies** layer added]*

## What’s next [geographic-anomalies-next]

* [Learn more about **Maps**](../../visualize/maps.md)

---
mapped_pages:
  - https://www.elastic.co/guide/en/machine-learning/current/mapping-anomalies.html
---

If your data includes vector layers that are defined in the [{{ems}} ({{ems-init}})](../../visualize/maps/maps-connect-to-ems.md), your {{anomaly-jobs}} can generate a map of the anomalies by location.

## Prerequisites [mapping-anomalies-prereqs]

If you want to view choropleth maps in **{{data-viz}}** or {{anomaly-job}} results, you must have fields that contain valid vector layers (such as [country codes](https://maps.elastic.co/#file/world_countries) or [postal codes](https://maps.elastic.co/#file/usa_zip_codes)).

This example uses the sample web logs data set. For more information, see [Add the sample data](../../overview/kibana-quickstart.md#gs-get-data-into-kibana).

## Explore your data [visualize-vector-layers]

If you have fields that contain valid vector layers, you can use the **{{data-viz}}** in the **{{ml-app}}** app to see a choropleth map, in which each area is colored based on its document count. For example:
*[screenshot: a choropleth map of document counts in the **{{data-viz}}**]*

## Create an {{anomaly-job}} [mapping-anomalies-jobs]

To create an {{anomaly-job}} in {{kib}}, click **Create job** on the **{{ml-cap}} > {{anomaly-detect-cap}}** page and select an appropriate job wizard. Alternatively, use the [create {{anomaly-jobs}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-put-job.html).
For example, use the multi-metric job wizard to create a job that analyzes the sample web logs data set to detect anomalous behavior in the sum of the data transferred (`bytes` values) for each destination country (`geo.dest` values):

*[screenshot: creating a multi-metric {{anomaly-job}} split by `geo.dest`]*

::::{dropdown} API example

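A sketch of an equivalent API configuration follows; the detector and date settings are illustrative.
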
```console
PUT _ml/anomaly_detectors/weblogs-vectors <1>
{
  "analysis_config": {
    "bucket_span": "2h",
    "detectors": [
      {
        "detector_description": "Sum of bytes by destination country",
        "function": "sum",
        "field_name": "bytes",
        "partition_field_name": "geo.dest"
      }
    ],
    "influencers": [ "geo.dest", "geo.src" ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}

PUT _ml/datafeeds/datafeed-weblogs-vectors <2>
{
  "job_id": "weblogs-vectors",
  "indices": [ "kibana_sample_data_logs" ]
}

POST _ml/anomaly_detectors/weblogs-vectors/_open <3>

POST _ml/datafeeds/datafeed-weblogs-vectors/_start <4>
{
  "end": "2024-01-01T00:00:00Z"
}
```

1. Create the {{anomaly-job}}.
2. Create the {{dfeed}} that supplies data to the job.
3. Open the job.
4. Start the {{dfeed}}. Since the sample data sets often contain timestamps that are later than the current date, it is a good idea to specify the appropriate end date for the {{dfeed}}.

::::

## Analyze the results [mapping-anomalies-results]

After the {{anomaly-jobs}} have processed some data, you can view the results in {{kib}}.
::::{tip}
If you used APIs to create the jobs and {{dfeeds}}, you cannot see them in {{kib}} until you follow the prompts to synchronize the necessary saved objects.
::::

:::{image} ../../../images/machine-learning-weblogs-anomaly-explorer-vectors.png
:alt: A screenshot of the anomaly count by location in Anomaly Explorer
:class: screenshot
:::

The **Anomaly Explorer** contains a map, which is affected by your swim lane selections. It colors each location to reflect the number of anomalies in that selected time period. Locations that have few anomalies are indicated in blue; locations with many anomalies are red. Thus you can quickly see the locations that are generating the most anomalies. If your vector layers define regions, counties, or postal codes, you can zoom in for fine details.

## What’s next [mapping-anomalies-next]

* [Learn more about **Maps**](../../visualize/maps.md)

This section contains further resources for using {{anomaly-detect}}.
* [*Function reference*](ml-functions.md)
* [Supplied configurations](ootb-ml-jobs.md)
* [Troubleshooting and FAQ](ml-ad-troubleshooting.md)

---
mapped_pages:
- https://www.elastic.co/guide/en/machine-learning/current/ml-ad-troubleshooting.html
---

# Troubleshooting and FAQ [ml-ad-troubleshooting]

Use the information in this section to troubleshoot common problems and find answers for frequently asked questions.

## How to restart failed {{anomaly-jobs}} [ml-ad-restart-failed-jobs]

If an {{anomaly-job}} fails, try to restart the job by following the procedure described below. If the restarted job runs as expected, then the problem that caused the job to fail was transient and no further investigation is needed. If the job quickly fails again after the restart, then the problem is persistent and needs further investigation. In this case, find out which node the failed job was running on by checking the job stats on the **Job management** pane in {{kib}}. Then get the logs for that node and look for exceptions and errors that contain the ID of the {{anomaly-job}} to better understand the issue.
If an {{anomaly-job}} has failed, do the following to recover from the `failed` state:

1. Force stop the corresponding {{dfeed}} by using the stop {{dfeed}} API with the `force` parameter, as shown in the sketch after this list.
2. Force close the {{anomaly-job}} by using the close {{anomaly-job}} API with the `force` parameter.

3. Restart the {{anomaly-job}} on the **Job management** pane in {{kib}}.
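
A minimal sketch of steps 1 and 2, assuming a job named `my-job` with a {{dfeed}} named `datafeed-my-job`:

```console
POST _ml/datafeeds/datafeed-my-job/_stop
{
  "force": "true"
}

POST _ml/anomaly_detectors/my-job/_close?force=true
```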

## What {{ml}} methods are used for {{anomaly-detect}}? [faq-methods]

For detailed information, refer to the paper [Anomaly Detection in Application Performance Monitoring Data](https://www.ijmlc.org/papers/398-LC018.pdf) by Thomas Veasey and Stephen Dodson, as well as our webinars on [The Math behind Elastic Machine Learning](https://www.elastic.co/elasticon/conf/2018/sf/the-math-behind-elastic-machine-learning) and [Machine Learning and Statistical Methods for Time Series Analysis](https://www.elastic.co/elasticon/conf/2017/sf/machine-learning-and-statistical-methods-for-time-series-analysis).
Expand All @@ -47,30 +42,25 @@ Further papers cited in the C++ code:
* [Large-Scale Bayesian Logistic Regression for Text Categorization](http://www.stat.columbia.edu/~madigan/PAPERS/techno.pdf)
* [X-means: Extending K-means with Efficient Estimation of the Number of Clusters](https://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf)

## What are the input features used by the model? [faq-features]

All input features are specified by the user, for example, using [diverse statistical functions](https://www.elastic.co/guide/en/machine-learning/current/ml-functions.html) like count or mean over the data of interest.
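
For example, a job like the following sketch (the job and field names are placeholders) uses the mean of a `responsetime` field as its input feature:

```console
PUT _ml/anomaly_detectors/response-times
{
  "analysis_config": {
    "bucket_span": "20m",
    "detectors": [
      { "function": "mean", "field_name": "responsetime" }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
```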

## Does the data used by the model only include customers' data? [faq-data]

Yes. Only the data specified in the {{anomaly-job}} configuration are used for detection.

## What does the model output score represent? How is it generated and calibrated? [faq-output-score]

The ensemble model generates a probability value, which is then mapped to an anomaly severity score between 0 and 100. The lower the probability of the observed data, the higher the severity score. Refer to this [advanced concept doc](ml-ad-explain.md) for details. Calibration (also called normalization) happens on two levels:

1. Within the same metric/partition, the scores are re-normalized “back in time” within the window specified by the `renormalization_window_days` parameter. This is the reason, for example, that both `record_score` and `initial_record_score` exist.
2. Over multiple partitions, scores are renormalized as described in [this blog post](https://www.elastic.co/blog/changes-to-elastic-machine-learning-anomaly-scoring-in-6-5).
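
The renormalization window is configurable when you create a job; a sketch (the job name and value shown are illustrative):

```console
PUT _ml/anomaly_detectors/scored-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": { "time_field": "timestamp" },
  "renormalization_window_days": 30
}
```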

## Is the model static or updated periodically? [faq-model-update]

It’s an online model that is updated continuously. Old parts of the model are pruned out based on the `model_prune_window` parameter (usually 30 days).

## Is the performance of the model monitored? [faq-model-performance]

There is a set of benchmarks to monitor the performance of the {{anomaly-detect}} algorithms and to ensure no regression occurs as the methods are continuously developed and refined. They are called "data scenarios".
On the customer side, the situation is different. There is no conventional way to measure model performance on your own data, since it is unlabeled. You can, however:
* Use the forecasting feature to predict the development of the metric of interest in the future.
* Use one or a combination of multiple {{anomaly-jobs}} to identify the significant anomaly influencers.

## How to measure the accuracy of the unsupervised {{ml}} model? [faq-model-accuracy]

For each record in a given time series, anomaly detection models provide an anomaly severity score, 95% confidence intervals, and an actual value. This data is stored in an index and can be retrieved using the Get Records API. With this information, you can use standard measures to assess prediction accuracy, interval calibration, and so on. Elasticsearch aggregations can be used to compute these statistics.
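
For example, a sketch of computing summary statistics over the stored record results with an aggregation (`my-job` is a placeholder):

```console
GET .ml-anomalies-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "job_id": "my-job" } },
        { "term": { "result_type": "record" } }
      ]
    }
  },
  "aggs": {
    "avg_record_score": { "avg": { "field": "record_score" } }
  }
}
```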

The purpose of {{anomaly-detect}} is to achieve the best ranking of periods where an anomaly happened. A practical way to evaluate this is to keep track of real incidents and see how well they correlate with the predictions of {{anomaly-detect}}.

## Can the {{anomaly-detect}} model experience model drift? [faq-model-drift]

Elasticsearch’s {{anomaly-detect}} model continuously learns and adapts to changes in the time series. These changes can take the form of slow drifts as well as sudden jumps. Therefore, we take great care to manage the adaptation to changing data characteristics. There is always a fine trade-off between fitting anomalous periods (over-fitting) and not learning new normal behavior. The following are the main approaches Elastic uses to manage this trade-off:
* Running continuous hypothesis tests on time windows of various lengths to test for significant evidence of new or changed periodic patterns, and update the model if the null hypothesis of unchanged features is rejected.
* Accumulating error statistics on calendar days and continuously testing whether predictive calendar features need to be added or removed from the model.

## What is the minimum amount of data for an {{anomaly-job}}? [faq-minimum-data]

Elastic {{ml}} needs a minimum amount of data to be able to build an effective model for {{anomaly-detect}}.

Rules of thumb:
* more than three weeks for periodic data or a few hundred buckets for non-periodic data
* at least as much data as you want to forecast

## Are there any checks or processes to ensure data integrity? [faq-data-integrity]

The Elastic {{ml}} algorithms are programmed to work with missing and noisy data and use denoising and data imputation techniques based on the learned statistical properties.