Merged
Changes from 7 commits
@@ -41,7 +41,7 @@ There are a few limitations to consider before you create this type of job:
1. You cannot create forecasts for {{anomaly-jobs}} that contain geographic functions.
2. You cannot add [custom rules with conditions](/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs.md#ml-ad-rules) to detectors that use geographic functions.

If those limitations are acceptable, try creating an {{anomaly-job}} that uses the [`lat_long` function](/reference/data-analysis/machine-learning/ml-geo-functions.md#ml-lat-long) to analyze your own data or the sample data sets.
If those limitations are acceptable, try creating an {{anomaly-job}} that uses the [`lat_long` function](/reference/machine-learning/ml-geo-functions.md#ml-lat-long) to analyze your own data or the sample data sets.

To create an {{anomaly-job}} that uses the `lat_long` function, navigate to the **Anomaly Detection Jobs** page in the main menu, or use the [global search field](../../find-and-organize/find-apps-and-objects.md). Then click **Create job** and select the appropriate job wizard. Alternatively, use the [create {{anomaly-jobs}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-put-job).

@@ -40,8 +40,8 @@ There are a number of requirements for using aggregations in {{dfeeds}}.

## Recommendations [aggs-recommendations-dfeeds]

* When your detectors use [metric](/reference/data-analysis/machine-learning/ml-metric-functions.md) or [sum](/reference/data-analysis/machine-learning/ml-sum-functions.md) analytical functions, it’s recommended to set the `date_histogram` or `composite` aggregation interval to a tenth of the bucket span. This creates finer, more granular time buckets, which are ideal for this type of analysis.
* When your detectors use [count](/reference/data-analysis/machine-learning/ml-count-functions.md) or [rare](/reference/data-analysis/machine-learning/ml-rare-functions.md) functions, set the interval to the same value as the bucket span.
* When your detectors use [metric](/reference/machine-learning/ml-metric-functions.md) or [sum](/reference/machine-learning/ml-sum-functions.md) analytical functions, it’s recommended to set the `date_histogram` or `composite` aggregation interval to a tenth of the bucket span. This creates finer, more granular time buckets, which are ideal for this type of analysis.
* When your detectors use [count](/reference/machine-learning/ml-count-functions.md) or [rare](/reference/machine-learning/ml-rare-functions.md) functions, set the interval to the same value as the bucket span.
* If you have multiple influencers or partition fields or if your field cardinality is more than 1000, use [composite aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-composite-aggregation.md).

To determine the cardinality of your data, you can run searches such as:
@@ -10,7 +10,7 @@ products:

# Detecting anomalous categories of data [ml-configuring-categories]

Categorization is a {{ml}} process that tokenizes a text field, clusters similar data together, and classifies it into categories. It works best on machine-written messages and application output that typically consist of repeated elements. [Categorization jobs](ml-anomaly-detection-job-types.md#categorization-jobs) enable you to find anomalous behavior in your categorized data. Categorization is not natural language processing (NLP). When you create a categorization {{anomaly-job}}, the {{ml}} model learns what volume and pattern is normal for each category over time. You can then detect anomalies and surface rare events or unusual types of messages by using [count](/reference/data-analysis/machine-learning/ml-count-functions.md) or [rare](/reference/data-analysis/machine-learning/ml-rare-functions.md) functions. Categorization works well on finite set of possible messages, for example:
Categorization is a {{ml}} process that tokenizes a text field, clusters similar data together, and classifies it into categories. It works best on machine-written messages and application output that typically consist of repeated elements. [Categorization jobs](ml-anomaly-detection-job-types.md#categorization-jobs) enable you to find anomalous behavior in your categorized data. Categorization is not natural language processing (NLP). When you create a categorization {{anomaly-job}}, the {{ml}} model learns what volume and pattern is normal for each category over time. You can then detect anomalies and surface rare events or unusual types of messages by using [count](/reference/machine-learning/ml-count-functions.md) or [rare](/reference/machine-learning/ml-rare-functions.md) functions. Categorization works well on finite set of possible messages, for example:

```js
{"@timestamp":1549596476000,
@@ -382,7 +382,7 @@ PUT _ml/anomaly_detectors/test3
GET _ml/datafeeds/datafeed-test3/_preview
```

In {{es}}, location data can be stored in `geo_point` fields but this data type is not supported natively in {{ml}} analytics. This example of a runtime field transforms the data into an appropriate format. For more information, see [Geographic functions](/reference/data-analysis/machine-learning/ml-geo-functions.md).
In {{es}}, location data can be stored in `geo_point` fields but this data type is not supported natively in {{ml}} analytics. This example of a runtime field transforms the data into an appropriate format. For more information, see [Geographic functions](/reference/machine-learning/ml-geo-functions.md).

The preview {{dfeed}} API returns the following results, which show that `41.44` and `90.5` have been combined into "41.44,90.5":

@@ -21,10 +21,10 @@ You can specify a `summary_count_field_name` with any function except `metric`.

If your data is sparse, there may be gaps in the data which means you might have empty buckets. You might want to treat these as anomalies or you might want these gaps to be ignored. Your decision depends on your use case and what is important to you. It also depends on which functions you use. The `sum` and `count` functions are strongly affected by empty buckets. For this reason, there are `non_null_sum` and `non_zero_count` functions, which are tolerant to sparse data. These functions effectively ignore empty buckets.

* [Count functions](/reference/data-analysis/machine-learning/ml-count-functions.md)
* [Geographic functions](/reference/data-analysis/machine-learning/ml-geo-functions.md)
* [Information content functions](/reference/data-analysis/machine-learning/ml-info-functions.md)
* [Metric functions](/reference/data-analysis/machine-learning/ml-metric-functions.md)
* [Rare functions](/reference/data-analysis/machine-learning/ml-rare-functions.md)
* [Sum functions](/reference/data-analysis/machine-learning/ml-sum-functions.md)
* [Time functions](/reference/data-analysis/machine-learning/ml-time-functions.md)
* [Count functions](/reference/machine-learning/ml-count-functions.md)
* [Geographic functions](/reference/machine-learning/ml-geo-functions.md)
* [Information content functions](/reference/machine-learning/ml-info-functions.md)
* [Metric functions](/reference/machine-learning/ml-metric-functions.md)
* [Rare functions](/reference/machine-learning/ml-rare-functions.md)
* [Sum functions](/reference/machine-learning/ml-sum-functions.md)
* [Time functions](/reference/machine-learning/ml-time-functions.md)
@@ -315,6 +315,6 @@ If you’re now thinking about where {{anomaly-detect}} can be most impactful fo

In general, it is a good idea to start with single metric {{anomaly-jobs}} for your key performance indicators. After you examine these simple analysis results, you will have a better idea of what the influencers might be. You can create multi-metric jobs and split the data or create more complex analysis functions as necessary. For examples of more complicated configuration options, see [Examples](/explore-analyze/machine-learning/anomaly-detection/anomaly-how-tos.md).

If you want to find more sample jobs, see [Supplied configurations](ootb-ml-jobs.md). In particular, there are sample jobs for [Apache](/reference/data-analysis/machine-learning/ootb-ml-jobs-apache.md) and [Nginx](/reference/data-analysis/machine-learning/ootb-ml-jobs-nginx.md) that are quite similar to the examples in this tutorial.
If you want to find more sample jobs, see [Supplied configurations](ootb-ml-jobs.md). In particular, there are sample jobs for [Apache](/reference/machine-learning/ootb-ml-jobs-apache.md) and [Nginx](/reference/machine-learning/ootb-ml-jobs-nginx.md) that are quite similar to the examples in this tutorial.

If you encounter problems, we’re here to help. If you are an existing Elastic customer with a support contract, create a ticket in the [Elastic Support portal](http://support.elastic.co). Or post in the [Elastic forum](https://discuss.elastic.co/).
@@ -140,22 +140,22 @@ The charts can also look odd in circumstances where there is very little data to

| Detector functions | Function description | Supported |
| --- | --- | --- |
| count, high_count, low_count, non_zero_count, low_non_zero_count | [Count functions](/reference/data-analysis/machine-learning/ml-count-functions.md) | yes |
| count, high_count, low_count, non_zero_count, low_non_zero_count with summary_count_field_name that is not doc_count (model plot not enabled) | [Count functions](/reference/data-analysis/machine-learning/ml-count-functions.md) | yes |
| non_zero_count with summary_count_field that is not doc_count using cardinality aggregation in datafeed config (model plot not enabled) | [Count functions](/reference/data-analysis/machine-learning/ml-count-functions.md) | yes |
| distinct_count, high_distinct_count, low_distinct_count | [Count functions](/reference/data-analysis/machine-learning/ml-count-functions.md) | yes |
| mean, high_mean, low_mean | [Mean, high_mean, low_mean](/reference/data-analysis/machine-learning/ml-metric-functions.md#ml-metric-mean) | yes |
| min | [Min](/reference/data-analysis/machine-learning/ml-metric-functions.md#ml-metric-min) | yes |
| max | [Max](/reference/data-analysis/machine-learning/ml-metric-functions.md#ml-metric-max) | yes |
| metric | [Metric](/reference/data-analysis/machine-learning/ml-metric-functions.md#ml-metric-metric) | yes |
| median, high_median, low_median | [Median, high_median, low_median](/reference/data-analysis/machine-learning/ml-metric-functions.md#ml-metric-median) | yes |
| sum, high_sum ,low_sum, non_null_sum, high_non_null_sum, low_non_null_sum | [Sum functions](/reference/data-analysis/machine-learning/ml-sum-functions.md) | yes |
| varp, high_varp, low_varp | [Varp, high_varp, low_varp](/reference/data-analysis/machine-learning/ml-metric-functions.md#ml-metric-varp) | yes (only if model plot is enabled) |
| lat_long | [Lat_long](/reference/data-analysis/machine-learning/ml-geo-functions.md#ml-lat-long) | no (but map is displayed in the Anomaly Explorer) |
| info_content, high_info_content, low_info_content | [Info_content, High_info_content, Low_info_content](/reference/data-analysis/machine-learning/ml-info-functions.md#ml-info-content) | yes (only if model plot is enabled) |
| rare | [Rare](/reference/data-analysis/machine-learning/ml-rare-functions.md#ml-rare) | yes |
| freq_rare | [Freq_rare](/reference/data-analysis/machine-learning/ml-rare-functions.md#ml-freq-rare) | no |
| time_of_day, time_of_week | [Time functions](/reference/data-analysis/machine-learning/ml-time-functions.md) | no |
| count, high_count, low_count, non_zero_count, low_non_zero_count | [Count functions](/reference/machine-learning/ml-count-functions.md) | yes |
| count, high_count, low_count, non_zero_count, low_non_zero_count with summary_count_field_name that is not doc_count (model plot not enabled) | [Count functions](/reference/machine-learning/ml-count-functions.md) | yes |
| non_zero_count with summary_count_field that is not doc_count using cardinality aggregation in datafeed config (model plot not enabled) | [Count functions](/reference/machine-learning/ml-count-functions.md) | yes |
| distinct_count, high_distinct_count, low_distinct_count | [Count functions](/reference/machine-learning/ml-count-functions.md) | yes |
| mean, high_mean, low_mean | [Mean, high_mean, low_mean](/reference/machine-learning/ml-metric-functions.md#ml-metric-mean) | yes |
| min | [Min](/reference/machine-learning/ml-metric-functions.md#ml-metric-min) | yes |
| max | [Max](/reference/machine-learning/ml-metric-functions.md#ml-metric-max) | yes |
| metric | [Metric](/reference/machine-learning/ml-metric-functions.md#ml-metric-metric) | yes |
| median, high_median, low_median | [Median, high_median, low_median](/reference/machine-learning/ml-metric-functions.md#ml-metric-median) | yes |
| sum, high_sum ,low_sum, non_null_sum, high_non_null_sum, low_non_null_sum | [Sum functions](/reference/machine-learning/ml-sum-functions.md) | yes |
| varp, high_varp, low_varp | [Varp, high_varp, low_varp](/reference/machine-learning/ml-metric-functions.md#ml-metric-varp) | yes (only if model plot is enabled) |
| lat_long | [Lat_long](/reference/machine-learning/ml-geo-functions.md#ml-lat-long) | no (but map is displayed in the Anomaly Explorer) |
| info_content, high_info_content, low_info_content | [Info_content, High_info_content, Low_info_content](/reference/machine-learning/ml-info-functions.md#ml-info-content) | yes (only if model plot is enabled) |
| rare | [Rare](/reference/machine-learning/ml-rare-functions.md#ml-rare) | yes |
| freq_rare | [Freq_rare](/reference/machine-learning/ml-rare-functions.md#ml-freq-rare) | no |
| time_of_day, time_of_week | [Time functions](/reference/machine-learning/ml-time-functions.md) | no |

### Jobs created in {{kib}} must use {{dfeeds}} [_jobs_created_in_kib_must_use_dfeeds]

@@ -13,15 +13,15 @@ products:

{{anomaly-jobs-cap}} contain the configuration information and metadata necessary to perform an analytics task. {{kib}} can recognize certain types of data and provide specialized wizards for that context. This page lists the categories of the {{anomaly-jobs}} that are ready to use via {{kib}} in **Machine learning**. Refer to [Create {{anomaly-jobs}}](/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs.md#ml-ad-create-job) to learn more about creating a job by using supplied configurations. Logs and Metrics supplied configurations are available and can be created via the related solution UI in {{kib}}.

* [Apache](/reference/data-analysis/machine-learning/ootb-ml-jobs-apache.md)
* [APM](/reference/data-analysis/machine-learning/ootb-ml-jobs-apm.md)
* [{{auditbeat}}](/reference/data-analysis/machine-learning/ootb-ml-jobs-auditbeat.md)
* [Logs](/reference/data-analysis/machine-learning/ootb-ml-jobs-logs-ui.md)
* [{{metricbeat}}](/reference/data-analysis/machine-learning/ootb-ml-jobs-metricbeat.md)
* [Metrics](/reference/data-analysis/machine-learning/ootb-ml-jobs-metrics-ui.md)
* [Nginx](/reference/data-analysis/machine-learning/ootb-ml-jobs-nginx.md)
* [Security](/reference/data-analysis/machine-learning/ootb-ml-jobs-siem.md)
* [Uptime](/reference/data-analysis/machine-learning/ootb-ml-jobs-uptime.md)
* [Apache](/reference/machine-learning/ootb-ml-jobs-apache.md)
* [APM](/reference/machine-learning/ootb-ml-jobs-apm.md)
* [{{auditbeat}}](/reference/machine-learning/ootb-ml-jobs-auditbeat.md)
* [Logs](/reference/machine-learning/ootb-ml-jobs-logs-ui.md)
* [{{metricbeat}}](/reference/machine-learning/ootb-ml-jobs-metricbeat.md)
* [Metrics](/reference/machine-learning/ootb-ml-jobs-metrics-ui.md)
* [Nginx](/reference/machine-learning/ootb-ml-jobs-nginx.md)
* [Security](/reference/machine-learning/ootb-ml-jobs-siem.md)
* [Uptime](/reference/machine-learning/ootb-ml-jobs-uptime.md)

::::{note}
The configurations are only available if data exists that matches the queries specified in the manifest files. These recognizer queries are linked in the descriptions of the individual configurations.
29 changes: 28 additions & 1 deletion redirects.yml
@@ -551,9 +551,36 @@ redirects:
'reference/data-analysis/kibana/canvas-functions.md': 'explore-analyze/visualize/canvas/canvas-function-reference.md'
'reference/data-analysis/kibana/tinymath-functions.md': 'explore-analyze/visualize/canvas/canvas-tinymath-functions.md'

# Related to data-analysis restructure - moved observability metrics to reference/observability
'reference/data-analysis/observability/index.md': 'reference/observability/metrics-reference.md'
'reference/data-analysis/observability/observability-host-metrics.md': 'reference/observability/observability-host-metrics.md'
'reference/data-analysis/observability/observability-container-metrics.md': 'reference/observability/observability-container-metrics.md'
'reference/data-analysis/observability/observability-kubernetes-pod-metrics.md': 'reference/observability/observability-kubernetes-pod-metrics.md'
'reference/data-analysis/observability/observability-aws-metrics.md': 'reference/observability/observability-aws-metrics.md'

# Renamed data-analysis to machine-learning
'reference/data-analysis/index.md': 'reference/machine-learning/index.md'
'reference/data-analysis/machine-learning/supplied-anomaly-detection-configurations.md': 'reference/machine-learning/supplied-anomaly-detection-configurations.md'
'reference/data-analysis/machine-learning/machine-learning-functions.md': 'reference/machine-learning/machine-learning-functions.md'
'reference/data-analysis/machine-learning/ml-count-functions.md': 'reference/machine-learning/ml-count-functions.md'
'reference/data-analysis/machine-learning/ml-geo-functions.md': 'reference/machine-learning/ml-geo-functions.md'
'reference/data-analysis/machine-learning/ml-info-functions.md': 'reference/machine-learning/ml-info-functions.md'
'reference/data-analysis/machine-learning/ml-metric-functions.md': 'reference/machine-learning/ml-metric-functions.md'
'reference/data-analysis/machine-learning/ml-rare-functions.md': 'reference/machine-learning/ml-rare-functions.md'
'reference/data-analysis/machine-learning/ml-sum-functions.md': 'reference/machine-learning/ml-sum-functions.md'
'reference/data-analysis/machine-learning/ml-time-functions.md': 'reference/machine-learning/ml-time-functions.md'
'reference/data-analysis/machine-learning/ootb-ml-jobs-apache.md': 'reference/machine-learning/ootb-ml-jobs-apache.md'
'reference/data-analysis/machine-learning/ootb-ml-jobs-apm.md': 'reference/machine-learning/ootb-ml-jobs-apm.md'
'reference/data-analysis/machine-learning/ootb-ml-jobs-auditbeat.md': 'reference/machine-learning/ootb-ml-jobs-auditbeat.md'
'reference/data-analysis/machine-learning/ootb-ml-jobs-logs-ui.md': 'reference/machine-learning/ootb-ml-jobs-logs-ui.md'
'reference/data-analysis/machine-learning/ootb-ml-jobs-metricbeat.md': 'reference/machine-learning/ootb-ml-jobs-metricbeat.md'
'reference/data-analysis/machine-learning/ootb-ml-jobs-metrics-ui.md': 'reference/machine-learning/ootb-ml-jobs-metrics-ui.md'
'reference/data-analysis/machine-learning/ootb-ml-jobs-nginx.md': 'reference/machine-learning/ootb-ml-jobs-nginx.md'
'reference/data-analysis/machine-learning/ootb-ml-jobs-siem.md': 'reference/machine-learning/ootb-ml-jobs-siem.md'
'reference/data-analysis/machine-learning/ootb-ml-jobs-uptime.md': 'reference/machine-learning/ootb-ml-jobs-uptime.md'

# Remote cluster settings moved to reference: https://github.com/elastic/docs-content/issues/579
'deploy-manage/remote-clusters/remote-clusters-settings.md': 'elasticsearch://reference/elasticsearch/configuration-reference/remote-clusters.md'
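
The rename in this change is mechanical: every link target under `/reference/data-analysis/machine-learning/` becomes `/reference/machine-learning/`, and each moved page gets a matching `redirects.yml` entry. A minimal Python sketch of that transformation follows. The function names `rewrite_links` and `redirect_entry` are hypothetical helpers for illustration only; nothing here is tooling from this repository, and the sketch assumes links use the absolute-path form seen in the hunks above.

```python
import re

# Prefixes taken from the hunks above; the helpers themselves are hypothetical.
OLD_PREFIX = "/reference/data-analysis/machine-learning/"
NEW_PREFIX = "/reference/machine-learning/"


def rewrite_links(markdown: str) -> str:
    """Rewrite the old prefix, but only where it starts a Markdown link target."""
    pattern = re.compile(r"\]\(" + re.escape(OLD_PREFIX))
    return pattern.sub("](" + NEW_PREFIX, markdown)


def redirect_entry(old_path: str) -> str:
    """Build a redirects.yml line for a moved page (paths have no leading slash)."""
    new_path = old_path.replace(OLD_PREFIX.lstrip("/"), NEW_PREFIX.lstrip("/"), 1)
    return f"'{old_path}': '{new_path}'"
```

Restricting the substitution to text that follows `](` leaves prose mentions of the old path untouched, which may or may not be the desired behavior for a real migration.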