[[ml-count-functions]]
= Count functions

Count functions detect anomalies when the number of events in a bucket is
anomalous.

Use `non_zero_count` functions if your data is sparse and you want to ignore
cases where the bucket count is zero.

Use `distinct_count` functions to determine when the number of distinct values
in one field is unusual, as opposed to the total count.

Use high-sided functions if you want to monitor unusually high event rates.
Use low-sided functions if you want to look at drops in event rate.

The {ml-features} include the following count functions:

* xref:ml-count[`count`, `high_count`, `low_count`]
* xref:ml-nonzero-count[`non_zero_count`, `high_non_zero_count`, `low_non_zero_count`]
* xref:ml-distinct-count[`distinct_count`, `high_distinct_count`, `low_distinct_count`]

[discrete]
[[ml-count]]
== Count, high_count, low_count

The `count` function detects anomalies when the number of events in a bucket is
anomalous.

The `high_count` function detects anomalies when the count of events in a bucket
is unusually high.

The `low_count` function detects anomalies when the count of events in a bucket
is unusually low.

These functions support the following properties:

* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

.Example 1: Analyzing events with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example1
{
  "analysis_config": {
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example is probably the simplest possible analysis. It identifies
time buckets during which the overall count of events is higher or lower than
usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate and detects when the event rate is unusual compared to its past
behavior.
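Conceptually, the detector buckets events by time and models the resulting
per-bucket totals. The bucketing step can be sketched in plain Python (an
illustration only, not part of the {ml} APIs):

```python
from collections import Counter

def bucket_counts(timestamps_ms, bucket_span_ms):
    """Count events per time bucket: the series a `count` detector models."""
    counts = Counter(ts // bucket_span_ms for ts in timestamps_ms)
    # Fill empty buckets with zero so drops in event rate stay visible.
    lo, hi = min(counts), max(counts)
    return [counts.get(b, 0) for b in range(lo, hi + 1)]
```

Note that empty buckets are kept as zeros here; that is what distinguishes
`count` from the `non_zero_count` family described below.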

.Example 2: Analyzing errors with the high_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example2
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_count",
      "by_field_name" : "error_code",
      "over_field_name": "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_count` function in a detector in your {anomaly-job}, it
models the event rate for each error code. It detects users that generate an
unusually high count of error codes compared to other users.


.Example 3: Analyzing status codes with the low_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example3
{
  "analysis_config": {
    "detectors": [{
      "function" : "low_count",
      "by_field_name" : "status_code"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

In this example, the function detects when the count of events for a status code
is lower than usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate for each status code and detects when a status code has an unusually
low count compared to its past behavior.

.Example 4: Analyzing aggregated data with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example4
{
  "analysis_config": {
    "summary_count_field_name" : "events_per_min",
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you are analyzing an aggregated `events_per_min` field, do not use a sum
function (for example, `sum(events_per_min)`). Instead, use the count function
and the `summary_count_field_name` property. For more information, see
<<ml-configuring-aggregation>>.

[discrete]
[[ml-nonzero-count]]
== Non_zero_count, high_non_zero_count, low_non_zero_count

The `non_zero_count` function detects anomalies when the number of events in a
bucket is anomalous, but it ignores cases where the bucket count is zero. Use
this function if you know your data is sparse or has gaps and the gaps are not
important.

The `high_non_zero_count` function detects anomalies when the number of events
in a bucket is unusually high and it ignores cases where the bucket count is
zero.

The `low_non_zero_count` function detects anomalies when the number of events in
a bucket is unusually low and it ignores cases where the bucket count is zero.

These functions support the following properties:

* `by_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

For example, if you have the following number of events per bucket:

====

1,22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,43,31,0,0,0,0,0,0,0,0,0,0,0,0,2,1

====

The `non_zero_count` function models only the following data:

====

1,22,2,43,31,2,1

====
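The filtering this family of functions performs can be sketched in plain
Python (an illustration only, not part of the {ml} APIs):

```python
def non_zero_buckets(bucket_counts):
    """Keep only the buckets a `non_zero_count` detector would model."""
    return [c for c in bucket_counts if c != 0]
```

Applied to the series above, this keeps exactly the seven non-zero values.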

.Example 5: Analyzing signatures with the high_non_zero_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example5
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_non_zero_count",
      "by_field_name" : "signaturename"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_non_zero_count` function in a detector in your
{anomaly-job}, it models the count of events for the `signaturename` field. It
ignores any buckets where the count is zero and detects when a `signaturename`
value has an unusually high count of events compared to its past behavior.

NOTE: Population analysis (using an `over_field_name` property value) is not
supported for the `non_zero_count`, `high_non_zero_count`, and
`low_non_zero_count` functions. If you want to do population analysis and your
data is sparse, use the `count` functions instead.


[discrete]
[[ml-distinct-count]]
== Distinct_count, high_distinct_count, low_distinct_count

The `distinct_count` function detects anomalies where the number of distinct
values in one field is unusual.

The `high_distinct_count` function detects unusually high numbers of distinct
values in one field.

The `low_distinct_count` function detects unusually low numbers of distinct
values in one field.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

.Example 6: Analyzing users with the distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example6
{
  "analysis_config": {
    "detectors": [{
      "function" : "distinct_count",
      "field_name" : "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This `distinct_count` function detects when a system has an unusual number of
logged-in users. When you use this function in a detector in your
{anomaly-job}, it models the distinct count of users and detects when the
number of distinct users is unusual compared to the past.

.Example 7: Analyzing ports with the high_distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example7
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_distinct_count",
      "field_name" : "dst_port",
      "over_field_name": "src_ip"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example detects instances of port scanning. When you use this function in a
detector in your {anomaly-job}, it models the distinct count of ports. It
detects `src_ip` values that connect to an unusually high number of different
`dst_port` values compared to other `src_ip` values.
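The per-source quantity this detector compares can be sketched in plain Python
(an illustration only, not part of the {ml} APIs):

```python
from collections import defaultdict

def distinct_ports_per_source(connections):
    """Distinct `dst_port` count per `src_ip`: the quantity a
    `high_distinct_count` detector compares across sources."""
    ports = defaultdict(set)
    for src_ip, dst_port in connections:
        ports[src_ip].add(dst_port)
    return {ip: len(seen) for ip, seen in ports.items()}
```

A source that probes many different ports produces a far larger set than its
peers, which is what makes port scanning stand out in a population analysis.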
[[ml-functions]]
= Function reference

The {ml-features} include analysis functions that provide a wide variety of
flexible ways to analyze data for anomalies.

When you create {anomaly-jobs}, you specify one or more detectors, which define
the type of analysis that needs to be done. If you are creating your job by
using {ml} APIs, you specify the functions in detector configuration objects.
If you are creating your job in {kib}, you specify the functions differently
depending on whether you are creating single metric, multi-metric, or advanced
jobs.
//For a demonstration of creating jobs in {kib}, see <<ml-getting-started>>.

Most functions detect anomalies in both low and high values. In statistical
terminology, they apply a two-sided test. Some functions offer one-sided low
and high variations (for example, `low_count` and `high_count` for `count`).
These variations detect anomalies only when the values are unusually low or
unusually high, depending on which variation you use.

You can specify a `summary_count_field_name` with any function except `metric`.
When you use `summary_count_field_name`, the {ml} features expect the input
data to be pre-aggregated. The value of the `summary_count_field_name` field
must contain the count of raw events that were summarized. In {kib}, use the
**summary_count_field_name** in advanced {anomaly-jobs}. Analyzing aggregated
input data provides a significant boost in performance. For more information, see
<<ml-configuring-aggregation>>.
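As a rough illustration of what pre-aggregated input looks like, the following
Python sketch collapses raw event timestamps into one summary document per
minute (the field names `timestamp` and `events_per_min` are examples, not
requirements; `events_per_min` is what you would name in
`summary_count_field_name`):

```python
from collections import Counter

def summarize_per_minute(timestamps_ms):
    """Collapse raw events into one summary document per minute; the
    per-minute count goes into the `summary_count_field_name` field."""
    counts = Counter(ts // 60_000 for ts in timestamps_ms)
    return [
        {"timestamp": minute * 60_000, "events_per_min": n}
        for minute, n in sorted(counts.items())
    ]
```

In practice this aggregation is usually done with an {es} `date_histogram`
aggregation rather than client-side code.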

If your data is sparse, it may contain gaps, which means some buckets are
empty. You might want to treat these empty buckets as anomalies or you might
want to ignore them; the right choice depends on your use case and on which
functions you use. The `sum` and `count` functions are strongly affected by
empty buckets. For this reason, there are `non_null_sum` and `non_zero_count`
functions, which are tolerant of sparse data and effectively ignore empty
buckets.

* <<ml-count-functions,Count functions>>
* <<ml-geo-functions,Geographic functions>>
* <<ml-info-functions,Information content functions>>
* <<ml-metric-functions,Metric functions>>
* <<ml-rare-functions,Rare functions>>
* <<ml-sum-functions,Sum functions>>
* <<ml-time-functions,Time functions>>