From b458a8683ef99ce4b5e01b62971439c7a47ad7e2 Mon Sep 17 00:00:00 2001 From: Mike Birnstiehl Date: Mon, 25 Aug 2025 17:53:26 -0500 Subject: [PATCH 1/9] Add schema collector docs --- .../create-an-inventory-rule.md | 4 ++-- .../infra-and-hosts/analyze-compare-hosts.md | 16 +++++++++++++--- .../infra-and-hosts/detect-metric-anomalies.md | 13 ++++++------- 3 files changed, 21 insertions(+), 12 deletions(-) diff --git a/solutions/observability/incident-management/create-an-inventory-rule.md b/solutions/observability/incident-management/create-an-inventory-rule.md index 05ba6767c3..4a08604935 100644 --- a/solutions/observability/incident-management/create-an-inventory-rule.md +++ b/solutions/observability/incident-management/create-an-inventory-rule.md @@ -31,12 +31,12 @@ When you select **Create inventory alert**, the parameters you configured on the :::: - - ## Inventory conditions [inventory-conditions] Conditions for each rule can be applied to specific metrics relating to the inventory type you select. You can choose the aggregation type, the metric, and by including a warning threshold value, you can be alerted on multiple threshold values based on severity scores. When creating the rule, you can still get notified if no data is returned for the specific metric or if the rule fails to query {{es}}. +When creating a rule for `Hosts`, you also need to select a data collection schema in the **Schema** field. Select **Elastic System Integration** for host data collected using the Elastic System Integration or **OpenTelemetry** for host data collected using OpenTelemetry. + In this example, Kubernetes Pods is the selected inventory type. The conditions state that you will receive a critical alert for any pods within the `ingress-nginx` namespace with a memory usage of 95% or above and a warning alert if memory usage is 90% or above. The chart shows the results of applying the rule to the last 20 minutes of data. Note that the chart time range is 20 times the value of the look-back window specified in the `FOR THE LAST` field. :::{image} /solutions/images/serverless-inventory-alert.png diff --git a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md index 83010ef1ae..3f6069c0f3 100644 --- a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md +++ b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md @@ -36,9 +36,12 @@ To learn more about the metrics shown on this page, refer to the [Metrics refere ::::{note} **Don’t see any metrics?** -If you haven’t added data yet, click **Add data** to search for and install an Elastic integration. -Need help getting started? Follow the steps in [Get started with system metrics](/solutions/observability/infra-and-hosts/get-started-with-system-metrics.md). +If you haven’t added data yet, click **Add data → Host** and select how you want to monitor your host—Elastic Agent or OpenTelemetry. + +For more on collecting host data, refer to: +* [OpenTelemetry](opentelemetry://reference/edot-collector/config/configure-metrics-collection.md#process-metrics) +* [Elastic System Integration](integration-docs://reference/system.md) :::: @@ -152,6 +155,10 @@ To learn more about creating and managing rules, refer to [Alerting](/solutions/ :::: +## Select data collection schema + +The **Schema** selector shows the available data collection schemas for the current query. If both Elastic System Integration data and OTel data are available, the selector defaults to OTel. 
Select **Elastic System Integration** from the **Schema** selector to see results in your ECS data for the current query. + ## View host details [view-host-details] @@ -402,4 +409,7 @@ When a host is detected by APM, but is not collecting full metrics (for example, This could mean that the APM agent has not been configured to use the correct host name. Instead, the host name might be the container name or the Kubernetes pod name. -To get the correct host name, you need to set some additional configuration options, specifically `system.kubernetes.node.name` as described in [Kubernetes data](/solutions/observability/apm/managed-intake-service-event-api.md#kubernetes-data). \ No newline at end of file +To get the correct host name, you need to set some additional configuration options, specifically `system.kubernetes.node.name` as described in [Kubernetes data](/solutions/observability/apm/managed-intake-service-event-api.md#kubernetes-data). + +### I don't see all of my host data [observability-analyze-hosts-i-dont-see-all-of-my-host-data] +If you have host data from both the Elastic Systems integration and OpenTelemetry (OTel), the selector defaults to OTel. If you want to see Elastic System Integration data for your current query, select **Elastic System Integration** from the **Schema** selector. \ No newline at end of file diff --git a/solutions/observability/infra-and-hosts/detect-metric-anomalies.md b/solutions/observability/infra-and-hosts/detect-metric-anomalies.md index f07cf5c029..bb8bce80b6 100644 --- a/solutions/observability/infra-and-hosts/detect-metric-anomalies.md +++ b/solutions/observability/infra-and-hosts/detect-metric-anomalies.md @@ -12,17 +12,16 @@ products: # Detect metric anomalies [observability-detect-metric-anomalies] -::::{note} - -**For Observability serverless projects**, the **Editor** role or higher is required to create {{ml}} jobs. To learn more, refer to [Assign user roles and privileges](/deploy-manage/users-roles/cloud-organization/user-roles.md#general-assign-user-roles). - -:::: - - You can create {{ml}} jobs to detect and inspect memory usage and network traffic anomalies for hosts and Kubernetes pods. You can model system memory usage, along with inbound and outbound network traffic across hosts or pods. You can detect unusual increases in memory usage and unusually high inbound or outbound traffic across hosts or pods. +## Prerequisites +To use create ML jobs to detect metric anomalies, you need to meet the follow requirements: + +* **For Observability serverless projects**, the **Editor** role or higher is required to create {{ml}} jobs. To learn more, refer to [Assign user roles and privileges](/deploy-manage/users-roles/cloud-organization/user-roles.md#general-assign-user-roles). +* Metric anomaly detection does not work for OpenTelemetry hosts. 
+ ## Enable {{ml}} jobs for hosts or Kubernetes pods [ml-jobs-hosts] From c5e7a8f013d2d8d22093049d6c6291d4a924fb40 Mon Sep 17 00:00:00 2001 From: Mike Birnstiehl Date: Mon, 25 Aug 2025 18:09:14 -0500 Subject: [PATCH 2/9] update wording --- .../observability/infra-and-hosts/analyze-compare-hosts.md | 6 +++--- .../infra-and-hosts/detect-metric-anomalies.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md index 3f6069c0f3..1fe7664e30 100644 --- a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md +++ b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md @@ -37,7 +37,7 @@ To learn more about the metrics shown on this page, refer to the [Metrics refere **Don’t see any metrics?** -If you haven’t added data yet, click **Add data → Host** and select how you want to monitor your host—Elastic Agent or OpenTelemetry. +If you haven’t added data yet, click **Add data → Host** and select how you want to monitor your host—OpenTelemetry or Elastic Agent. For more on collecting host data, refer to: * [OpenTelemetry](opentelemetry://reference/edot-collector/config/configure-metrics-collection.md#process-metrics) @@ -157,7 +157,7 @@ To learn more about creating and managing rules, refer to [Alerting](/solutions/ ## Select data collection schema -The **Schema** selector shows the available data collection schemas for the current query. If both Elastic System Integration data and OTel data are available, the selector defaults to OTel. Select **Elastic System Integration** from the **Schema** selector to see results in your ECS data for the current query. +The **Schema** selector shows the available data collection schemas for the current query. If host data from both the Elastic System Integration and OpenTelemetry are available, the selector defaults to **OpenTelemetry**. Select **Elastic System Integration** to see host data collected using the Elastic System Integration. @@ -412,4 +412,4 @@ This could mean that the APM agent has not been configured to use the correct ho To get the correct host name, you need to set some additional configuration options, specifically `system.kubernetes.node.name` as described in [Kubernetes data](/solutions/observability/apm/managed-intake-service-event-api.md#kubernetes-data). ### I don't see all of my host data [observability-analyze-hosts-i-dont-see-all-of-my-host-data] -If you have host data from both the Elastic Systems integration and OpenTelemetry (OTel), the selector defaults to OTel. If you want to see Elastic System Integration data for your current query, select **Elastic System Integration** from the **Schema** selector. \ No newline at end of file +If you have host data from both the Elastic System Integration and OpenTelemetry (OTel), the selector defaults to OTel. If you want to see Elastic System Integration data for your current query, select **Elastic System Integration** from the **Schema** selector. 
\ No newline at end of file diff --git a/solutions/observability/infra-and-hosts/detect-metric-anomalies.md b/solutions/observability/infra-and-hosts/detect-metric-anomalies.md index bb8bce80b6..bf4ac2bc49 100644 --- a/solutions/observability/infra-and-hosts/detect-metric-anomalies.md +++ b/solutions/observability/infra-and-hosts/detect-metric-anomalies.md @@ -17,7 +17,7 @@ You can create {{ml}} jobs to detect and inspect memory usage and network traffi You can model system memory usage, along with inbound and outbound network traffic across hosts or pods. You can detect unusual increases in memory usage and unusually high inbound or outbound traffic across hosts or pods. ## Prerequisites -To use create ML jobs to detect metric anomalies, you need to meet the follow requirements: +To create ML jobs to detect metric anomalies, you need to meet the following requirements: * **For Observability serverless projects**, the **Editor** role or higher is required to create {{ml}} jobs. To learn more, refer to [Assign user roles and privileges](/deploy-manage/users-roles/cloud-organization/user-roles.md#general-assign-user-roles). * Metric anomaly detection does not work for OpenTelemetry hosts. From 15884d151bdd66b915bf59131ae7b59cfb0d849f Mon Sep 17 00:00:00 2001 From: Mike Birnstiehl Date: Tue, 26 Aug 2025 16:50:06 -0500 Subject: [PATCH 3/9] review updates --- .../observability-host-metrics.md | 4 +++ .../create-an-inventory-rule.md | 7 +++--- .../infra-and-hosts/analyze-compare-hosts.md | 25 +++++++++++++------ .../detect-metric-anomalies.md | 2 +- ...infrastructure-metrics-by-resource-type.md | 7 ++++-- 5 files changed, 32 insertions(+), 13 deletions(-) diff --git a/reference/data-analysis/observability/observability-host-metrics.md b/reference/data-analysis/observability/observability-host-metrics.md index 1077dfeac8..8e8837d4d8 100644 --- a/reference/data-analysis/observability/observability-host-metrics.md +++ b/reference/data-analysis/observability/observability-host-metrics.md @@ -17,6 +17,7 @@ Learn about key host metrics displayed in the Infrastructure UI: * [Log](#key-metrics-log) * [Network](#key-metrics-network) * [Disk](#key-metrics-network) +* [OpenTelemetry](#key-metrics-opentelemetry) * [Legacy](#legacy-metrics) @@ -84,6 +85,9 @@ Learn about key host metrics displayed in the Infrastructure UI: | **Disk Write IOPS** | Average count of write operations from the device per second.

**Field Calculation**: `counter_rate(max(system.diskio.write.count), kql='system.diskio.write.count: *')`
| | **Disk Write Throughput** | Average number of bytes written from the device per second.

**Field Calculation**: `counter_rate(max(system.diskio.write.bytes), kql='system.diskio.write.bytes: *')`
| +## OpenTelemetry metrics [opentelemetry-metrics] +| Metric | Description | +| --- | --- | ## Legacy metrics [legacy-metrics] diff --git a/solutions/observability/incident-management/create-an-inventory-rule.md b/solutions/observability/incident-management/create-an-inventory-rule.md index 4a08604935..394753d177 100644 --- a/solutions/observability/incident-management/create-an-inventory-rule.md +++ b/solutions/observability/incident-management/create-an-inventory-rule.md @@ -35,16 +35,17 @@ When you select **Create inventory alert**, the parameters you configured on the Conditions for each rule can be applied to specific metrics relating to the inventory type you select. You can choose the aggregation type, the metric, and by including a warning threshold value, you can be alerted on multiple threshold values based on severity scores. When creating the rule, you can still get notified if no data is returned for the specific metric or if the rule fails to query {{es}}. -When creating a rule for `Hosts`, you also need to select a data collection schema in the **Schema** field. Select **Elastic System Integration** for host data collected using the Elastic System Integration or **OpenTelemetry** for host data collected using OpenTelemetry. +:::{note} +Most inventory types respect the default data collection method (for example, [Elastic system integration](integration-docs://reference/system/index.md)). For the `Hosts` inventory type, however, you can use the **Schema** dropdown menu to explicitly target host data collected using **OpenTelemetry** or the **Elastic System Integration**. +::: -In this example, Kubernetes Pods is the selected inventory type. The conditions state that you will receive a critical alert for any pods within the `ingress-nginx` namespace with a memory usage of 95% or above and a warning alert if memory usage is 90% or above. The chart shows the results of applying the rule to the last 20 minutes of data. Note that the chart time range is 20 times the value of the look-back window specified in the `FOR THE LAST` field. +In the following example, Kubernetes Pods is the selected inventory type. The conditions state that you will receive a critical alert for any pods within the `ingress-nginx` namespace with a memory usage of 95% or above and a warning alert if memory usage is 90% or above. The chart shows the results of applying the rule to the last 20 minutes of data. Note that the chart time range is 20 times the value of the look-back window specified in the `FOR THE LAST` field. :::{image} /solutions/images/serverless-inventory-alert.png :alt: Inventory rule :screenshot: ::: - ## Add actions [action-types-infrastructure] You can extend your rules with actions that interact with third-party systems, write to logs or indices, or send user notifications. You can add an action to a rule at any time. You can create rules without adding actions, and you can also define multiple actions for a single rule. 
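
For example, a notification message for an inventory rule can reference action variables that describe what triggered the alert. The following template is an illustrative sketch; the exact variables available to your rule are listed in the message editor when you add the action:

```
{{rule.name}} - {{context.group}} is in a state of {{context.alertState}}

Reason:
{{context.reason}}
```
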
diff --git a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md index 1fe7664e30..5a596d5678 100644 --- a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md +++ b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md @@ -41,7 +41,7 @@ If you haven’t added data yet, click **Add data → Host** and select how you For more on collecting host data, refer to: * [OpenTelemetry](opentelemetry://reference/edot-collector/config/configure-metrics-collection.md#process-metrics) -* [Elastic System Integration](integration-docs://reference/system.md) +* [Elastic System integration](integration-docs://reference/system.md) :::: @@ -155,22 +155,25 @@ To learn more about creating and managing rules, refer to [Alerting](/solutions/ :::: -## Select data collection schema - -The **Schema** selector shows the available data collection schemas for the current query. If host data from both the Elastic System Integration and OpenTelemetry are available, the selector defaults to **OpenTelemetry**. Select **Elastic System Integration** to see host data collected using the Elastic System Integration. +## Select the data collection schema[host-schema-selector] +The **Schema** menu shows the available data collection schemas for the current query. If host data from both the Elastic System integration and OpenTelemetry are available, the selector defaults to **OpenTelemetry**. Select **Elastic System Integration** to see host data collected by the Elastic System integration. ## View host details [view-host-details] Without leaving the **Hosts** page, you can view enhanced metrics relating to each host running in your infrastructure. In the list of hosts, find the host you want to monitor, then click the **Toggle dialog with details** icon ![expand icon](/solutions/images/serverless-expand-icon.png "") to display the host details overlay. +The host details overlay adapts according to the [selected schema](#host-schema-selector). When viewing host data collected using OpenTelemetry, you see the following differences: + +* Anomaly detection isn't available for OpenTelemetry hosts, so there is no **Anomalies** tab. +* The Lens charts use the OpenTelemetry formulas //LINK TO THE Host Metrics Ref + ::::{tip} To expand the overlay and view more detail, click **Open as page** in the upper-right corner. :::: - The host details overlay contains the following tabs: :::::{dropdown} Overview @@ -217,6 +220,10 @@ The **Metrics** tab shows host metrics organized by type and is more complete th :::::{dropdown} Processes +:::{note} +To view processes for OpenTelemetry hosts, you need to configure the EDOT collector to send process metrics. Refer to [Process metrics](opentelemetry://reference/edot-collector/config/configure-metrics-collection.md#process-metrics) for more information. +::: + The **Processes** tab lists the total number of processes (`system.process.summary.total`) running on the host, along with the total number of processes in these various states: * Running (`system.process.summary.running`) @@ -285,6 +292,10 @@ To view the logs in the {{logs-app}} for a detailed analysis, click **Open in Lo :::::{dropdown} Anomalies +:::{note} +Anomaly detection isn't available for OpenTelemetry hosts. When the **Schema** is set to OpenTelemetry, this isn't available. +::: + The **Anomalies** tab displays a list of each single metric {{anomaly-detect}} job for the specific host. 
By default, anomaly jobs are sorted by time, showing the most recent jobs first. Along with the name of each anomaly job, detected anomalies with a severity score equal to 50 or higher are listed. These scores represent a severity of "warning" or higher in the selected time period. The **summary** value represents the increase between the actual value and the expected ("typical") value of the host metric in the anomaly record result. @@ -402,7 +413,7 @@ This missing data can be hard to spot at first glance. The green boxes outline r In the Hosts view, you might see a question mark icon (![Question mark icon](/solutions/images/serverless-questionInCircle.svg "")) before a host name with a tooltip note stating that the host has been detected by APM. -When a host is detected by APM, but is not collecting full metrics (for example, through the [system integration](integration-docs://reference/system/index.md)), it will be listed as a host with the partial metrics collected by APM. +When a host is detected by APM, but is not collecting full metrics (for example, through the [Elastic System integration](integration-docs://reference/system/index.md)), it will be listed as a host with the partial metrics collected by APM. ### I don’t recognize a host name and I see a question mark icon next to it [observability-analyze-hosts-i-dont-recognize-a-host-name-and-i-see-a-question-mark-icon-next-to-it] @@ -412,4 +423,4 @@ This could mean that the APM agent has not been configured to use the correct ho To get the correct host name, you need to set some additional configuration options, specifically `system.kubernetes.node.name` as described in [Kubernetes data](/solutions/observability/apm/managed-intake-service-event-api.md#kubernetes-data). ### I don't see all of my host data [observability-analyze-hosts-i-dont-see-all-of-my-host-data] -If you have host data from both the Elastic System Integration and OpenTelemetry (OTel), the selector defaults to OTel. If you want to see Elastic System Integration data for your current query, select **Elastic System Integration** from the **Schema** selector. \ No newline at end of file +If you have host data from both the Elastic System integration and OpenTelemetry (OTel), the selector defaults to OTel. If you want to see Elastic System integration data for your current query, select **Elastic System Integration** from the **Schema** selector. \ No newline at end of file diff --git a/solutions/observability/infra-and-hosts/detect-metric-anomalies.md b/solutions/observability/infra-and-hosts/detect-metric-anomalies.md index bf4ac2bc49..1bbc71f944 100644 --- a/solutions/observability/infra-and-hosts/detect-metric-anomalies.md +++ b/solutions/observability/infra-and-hosts/detect-metric-anomalies.md @@ -20,7 +20,7 @@ You can model system memory usage, along with inbound and outbound network traff To create ML jobs to detect metric anomalies, you need to meet the following requirements: * **For Observability serverless projects**, the **Editor** role or higher is required to create {{ml}} jobs. To learn more, refer to [Assign user roles and privileges](/deploy-manage/users-roles/cloud-organization/user-roles.md#general-assign-user-roles). -* Metric anomaly detection does not work for OpenTelemetry hosts. +* Metric anomaly detection isn't available for OpenTelemetry hosts. 
## Enable {{ml}} jobs for hosts or Kubernetes pods [ml-jobs-hosts] diff --git a/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md b/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md index ce3a3e4195..1a90d0f651 100644 --- a/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md +++ b/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md @@ -57,19 +57,22 @@ You can also use the search bar to create structured queries using [{{kib}} Quer To examine the metrics for a specific time, use the time filter to select the date and time. - ## View host metrics [analyze-hosts-inventory] By default the **Infrastructure Inventory** page displays a waffle map that shows the hosts you are monitoring and the current CPU usage for each host. Alternatively, you can click the **Table view** icon ![table view icon](/solutions/images/observability-table-view-icon.png "") to switch to a table view. Without leaving the **Infrastructure Inventory** page, you can view enhanced metrics relating to each host running in your infrastructure. On the waffle map, select a host to display the host details overlay. +::::{note} +When showing `Hosts`, the **Schema** dropdown menu shows the available data collection schemas for the current query. If data from both the Elastic System integration and OpenTelemetry are available, the schema defaults to **OpenTelemetry**. Select **Elastic System Integration** to see data collected by the Elastic System integration. +:::: + + ::::{tip} To expand the overlay and view more detail, click **Open as page** in the upper-right corner. :::: - The host details overlay contains the following tabs: :::::{dropdown} Overview From e9ba599f75a1ff883c9a47d081a4491a2fd5c0f9 Mon Sep 17 00:00:00 2001 From: Mike Birnstiehl Date: Tue, 26 Aug 2025 16:54:26 -0500 Subject: [PATCH 4/9] fix link --- .../data-analysis/observability/observability-host-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/data-analysis/observability/observability-host-metrics.md b/reference/data-analysis/observability/observability-host-metrics.md index 8e8837d4d8..5fc057388f 100644 --- a/reference/data-analysis/observability/observability-host-metrics.md +++ b/reference/data-analysis/observability/observability-host-metrics.md @@ -85,7 +85,7 @@ Learn about key host metrics displayed in the Infrastructure UI: | **Disk Write IOPS** | Average count of write operations from the device per second.

**Field Calculation**: `counter_rate(max(system.diskio.write.count), kql='system.diskio.write.count: *')`
| | **Disk Write Throughput** | Average number of bytes written from the device per second.

**Field Calculation**: `counter_rate(max(system.diskio.write.bytes), kql='system.diskio.write.bytes: *')`
| -## OpenTelemetry metrics [opentelemetry-metrics] +## OpenTelemetry metrics [key-metrics-opentelemetry] | Metric | Description | | --- | --- | From 385e8d522a865b26fc843bcaa7a47c9740226603 Mon Sep 17 00:00:00 2001 From: Mike Birnstiehl Date: Tue, 26 Aug 2025 16:59:21 -0500 Subject: [PATCH 5/9] add link to otel ref --- .../observability/infra-and-hosts/analyze-compare-hosts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md index 5a596d5678..01ccbd671e 100644 --- a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md +++ b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md @@ -167,7 +167,7 @@ Without leaving the **Hosts** page, you can view enhanced metrics relating to ea The host details overlay adapts according to the [selected schema](#host-schema-selector). When viewing host data collected using OpenTelemetry, you see the following differences: * Anomaly detection isn't available for OpenTelemetry hosts, so there is no **Anomalies** tab. -* The Lens charts use the OpenTelemetry formulas //LINK TO THE Host Metrics Ref +* The Lens charts use the [OpenTelemetry formulas](/reference/data-analysis/observability/observability-host-metrics.md#key-metrics-opentelemetry). ::::{tip} To expand the overlay and view more detail, click **Open as page** in the upper-right corner. From 552b9f81adab24d6abcb6f2e25c5946e85797ac0 Mon Sep 17 00:00:00 2001 From: Mike Birnstiehl Date: Wed, 27 Aug 2025 14:05:55 -0500 Subject: [PATCH 6/9] review updates --- .../observability-host-metrics.md | 100 +++++++++++++++--- .../infra-and-hosts/analyze-compare-hosts.md | 4 +- ...infrastructure-metrics-by-resource-type.md | 10 +- 3 files changed, 96 insertions(+), 18 deletions(-) diff --git a/reference/data-analysis/observability/observability-host-metrics.md b/reference/data-analysis/observability/observability-host-metrics.md index 5fc057388f..a7a15f22c6 100644 --- a/reference/data-analysis/observability/observability-host-metrics.md +++ b/reference/data-analysis/observability/observability-host-metrics.md @@ -9,7 +9,13 @@ products: # Host metrics [observability-host-metrics] -Learn about key host metrics displayed in the Infrastructure UI: +Learn about key host metrics displayed in the Infrastructure UI. + +* [Elastic System integration host metrics](#ecs-host-metrics) +* [OpenTelemetry host metrics](#open-telemetry-host-metrics) + + +## Elastic system integration host metrics [ecs-host-metrics] * [Hosts](#key-metrics-hosts) * [CPU usage](#key-metrics-cpu) @@ -17,18 +23,16 @@ Learn about key host metrics displayed in the Infrastructure UI: * [Log](#key-metrics-log) * [Network](#key-metrics-network) * [Disk](#key-metrics-network) -* [OpenTelemetry](#key-metrics-opentelemetry) * [Legacy](#legacy-metrics) - -## Hosts metrics [key-metrics-hosts] +### Hosts metrics [key-metrics-hosts] | Metric | Description | | --- | --- | -| **Hosts** | Number of hosts returned by your search criteria.

**Field Calculation**: `count(system.cpu.cores)`
| +| **Hosts** | Number of hosts returned by your search criteria.

**Field Calculation**: `unique_count(host.name)`
| -## CPU usage metrics [key-metrics-cpu] +### CPU usage metrics [key-metrics-cpu] | Metric | Description | | --- | --- | @@ -46,7 +50,7 @@ Learn about key host metrics displayed in the Infrastructure UI: | **Normalized Load** | 1 minute load average normalized by the number of CPU cores.

Load average gives an indication of the number of threads that are runnable (either busy running on CPU, waiting to run, or waiting for a blocking IO operation to complete).

100% means the 1 minute load average is equal to the number of CPU cores of the host.

Taking the example of a 32 CPU cores host, if the 1 minute load average is 32, the value reported here is 100%. If the 1 minute load average is 48, the value reported here is 150%.

**Field Calculation**: `average(system.load.1) / max(system.load.cores)`
| -## Memory metrics [key-metrics-memory] +### Memory metrics [key-metrics-memory] | Metric | Description | | --- | --- | @@ -58,14 +62,14 @@ Learn about key host metrics displayed in the Infrastructure UI: | **Memory Used** | Main memory usage excluding page cache.

**Field Calculation**: `average(system.memory.actual.used.bytes)`
| -## Log metrics [key-metrics-log] +### Log metrics [key-metrics-log] | Metric | Description | | --- | --- | | **Log Rate** | Derivative of the cumulative sum of the document count scaled to a 1 second rate. This metric relies on the same indices as the logs.

**Field Calculation**: `cumulative_sum(doc_count)`
| -## Network metrics [key-metrics-network] +### Network metrics [key-metrics-network] | Metric | Description | | --- | --- | @@ -73,7 +77,7 @@ Learn about key host metrics displayed in the Infrastructure UI: | **Network Outbound (TX)** | Number of bytes that have been sent per second on the public interfaces of the hosts.

**Field Calculation**: `sum(host.network.egress.bytes) * 8 / 1000`

For legacy metric calculations, refer to [Legacy metrics](#legacy-metrics).
| -## Disk metrics [observability-host-metrics-disk-metrics] +### Disk metrics [observability-host-metrics-disk-metrics] | Metric | Description | | --- | --- | @@ -85,11 +89,7 @@ Learn about key host metrics displayed in the Infrastructure UI: | **Disk Write IOPS** | Average count of write operations from the device per second.

**Field Calculation**: `counter_rate(max(system.diskio.write.count), kql='system.diskio.write.count: *')`
| | **Disk Write Throughput** | Average number of bytes written from the device per second.

**Field Calculation**: `counter_rate(max(system.diskio.write.bytes), kql='system.diskio.write.bytes: *')`
| -## OpenTelemetry metrics [key-metrics-opentelemetry] -| Metric | Description | -| --- | --- | - -## Legacy metrics [legacy-metrics] +### Legacy metrics [legacy-metrics] Over time, we may change the formula used to calculate a specific metric. To avoid affecting your existing rules, instead of changing the actual metric definition, we create a new metric and refer to the old one as "legacy." @@ -100,3 +100,73 @@ The UI and any new rules you create will use the new metric definition. However, | **CPU Usage (legacy)** | Percentage of CPU time spent in states other than Idle and IOWait, normalized by the number of CPU cores. This includes both time spent on user space and kernel space. 100% means all CPUs of the host are busy.

**Field Calculation**: `(average(system.cpu.user.pct) + average(system.cpu.system.pct)) / max(system.cpu.cores)`
| | **Network Inbound (RX) (legacy)** | Number of bytes that have been received per second on the public interfaces of the hosts.

**Field Calculation**: `average(host.network.ingress.bytes) * 8 / (max(metricset.period, kql='host.network.ingress.bytes: *') / 1000)`
| | **Network Outbound (TX) (legacy)** | Number of bytes that have been sent per second on the public interfaces of the hosts.

**Field Calculation**: `average(host.network.egress.bytes) * 8 / (max(metricset.period, kql='host.network.egress.bytes: *') / 1000)`
| + +## OpenTelemetry host metrics [open-telemetry-host-metrics] + +* [Hosts](#otel-metrics-hosts) +* [CPU usage](#otel-metrics-cpu) +* [Memory](#otel-metrics-memory) +* [Log](#otel-metrics-log) +* [Network](#otel-metrics-network) +* [Disk](#otel-metrics-network) + +### OpenTelemetry hosts metrics [otel-metrics-hosts] + +| Metric | Description | +| --- | --- | +| **Hosts** | Number of hosts returned by your search criteria.

**Field Calculation**: `unique_count(host.name)`
| + +### OpenTelemetry CPU usage metrics [otel-metrics-cpu] + +| Metric | Description | +| --- | --- | +| **CPU Usage (%)** | Average of percentage of CPU time spent in states other than Idle and IOWait, normalized by the number of CPU cores. Includes both time spent on user space and kernel space. 100% means all CPUs of the host are busy.

**Field Calculation**: `1-(average(metrics.system.cpu.utilization,kql='state: idle') + average(metrics.system.cpu.utilization,kql='state: wait'))`
| +| **CPU Usage - iowait (%)** | The percentage of CPU time spent in wait (on disk).

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: wait') / max(metrics.system.cpu.logical.count)`
| +| **CPU Usage - irq (%)** | The percentage of CPU time spent servicing and handling hardware interrupts.

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: interrupt') / max(metrics.system.cpu.logical.count)`
| +| **CPU Usage - nice (%)** | The percentage of CPU time spent on low-priority processes.

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: nice') / max(metrics.system.cpu.logical.count)`
| +| **CPU Usage - softirq (%)** | The percentage of CPU time spent servicing and handling software interrupts.

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: softirq') / max(metrics.system.cpu.logical.count)`
| +| **CPU Usage - steal (%)** | The percentage of CPU time spent in involuntary wait by the virtual CPU while the hypervisor was servicing another processor. Available only on Unix.

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: steal') / max(metrics.system.cpu.logical.count)`
| +| **CPU Usage - system (%)** | The percentage of CPU time spent in kernel space.

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: system') / max(metrics.system.cpu.logical.count)`
| +| **CPU Usage - user (%)** | The percentage of CPU time spent in user space. On multi-core systems, you can have percentages that are greater than 100%. For example, if 3 cores are at 60% use, then the system.cpu.user.pct will be 180%.

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: user') / max(metrics.system.cpu.logical.count)`
| +| **Load (1m)** | 1 minute load average.

Load average gives an indication of the number of threads that are runnable (either busy running on CPU, waiting to run, or waiting for a blocking IO operation to complete).

**Field Calculation**: `average(metrics.system.cpu.load_average.1m)`
| +| **Load (5m)** | 5 minute load average.

Load average gives an indication of the number of threads that are runnable (either busy running on CPU, waiting to run, or waiting for a blocking IO operation to complete).

**Field Calculation**: `average(metrics.system.cpu.load_average.5m)`
| +| **Load (15m)** | 15 minute load average.

Load average gives an indication of the number of threads that are runnable (either busy running on CPU, waiting to run, or waiting for a blocking IO operation to complete).

**Field Calculation**: `average(metrics.system.cpu.load_average.15m)`
| +| **Normalized Load** | 1 minute load average normalized by the number of CPU cores.

Load average gives an indication of the number of threads that are runnable (either busy running on CPU, waiting to run, or waiting for a blocking IO operation to complete).

100% means the 1 minute load average is equal to the number of CPU cores of the host.

Taking the example of a host with 32 CPU cores, if the 1 minute load average is 32, the value reported here is 100%. If the 1 minute load average is 48, the value reported here is 150%.

**Field Calculation**: `average(metrics.system.cpu.load_average.1m) / max(metrics.system.cpu.logical.count)`
| + +### OpenTelemetry memory metrics [otel-metrics-memory] + +| Metric | Description | +| --- | --- | +| **Memory Cache** | Memory (page) cache.

**Field Calculation**: `average(metrics.system.memory.usage, kql='state: cache') / average(metrics.system.memory.usage, kql='state: slab_reclaimable') + average(metrics.system.memory.usage, kql='state: slab_unreclaimable')`
| +| **Memory Free** | Total available memory.

**Field Calculation**: `(max(metrics.system.memory.usage, kql='state: free') + max(metrics.system.memory.usage, kql='state: cached')) - (average(metrics.system.memory.usage, kql='state: slab_unreclaimable') + average(metrics.system.memory.usage, kql='state: slab_reclaimable'))`
| +| **Memory Free (excluding cache)** | Total available memory excluding the page cache.

**Field Calculation**: `average(metrics.system.memory.usage, kql='state: free')`
| +| **Memory Total** | Total memory capacity.

**Field Calculation**: `avg(system.memory.total)`
| +| **Memory Usage (%)** | Percentage of main memory usage excluding page cache.

This includes resident memory for all processes plus memory used by the kernel structures and code apart from the page cache.

A high level indicates a situation of memory saturation for the host. For example, 100% means the main memory is entirely filled with memory that can’t be reclaimed, except by swapping out.

**Field Calculation**: `average(system.memory.utilization, kql='state: used') + average(system.memory.utilization, kql='state: buffered') + average(system.memory.utilization, kql='state: slab_reclaimable') + average(system.memory.utilization, kql='state: slab_unreclaimable')`
| +| **Memory Used** | Main memory usage excluding page cache.

**Field Calculation**: `average(metrics.system.memory.usage, kql='state: used') + average(metrics.system.memory.usage, kql='state: buffered') + average(metrics.system.memory.usage, kql='state: slab_reclaimable') + average(metrics.system.memory.usage, kql='state: slab_unreclaimable')`
| + +### OpenTelemetry log metrics [otel-metrics-log] + +| Metric | Description | +| --- | --- | +| **Log Rate** | Derivative of the cumulative sum of the document count scaled to a 1 second rate. This metric relies on the same indices as the logs.

**Field Calculation**: `cumulative_sum(doc_count)`
| + +### OpenTelemetry network metrics [otel-metrics-network] + +| Metric | Description | +| --- | --- | +| **Network Inbound (RX)** | Number of bytes that have been received per second on the public interfaces of the hosts.

**Field Calculation**: `8 * counter_rate(max(metrics.system.network.io, kql='direction: receive'))`
| +| **Network Outbound (TX)** | Number of bytes that have been sent per second on the public interfaces of the hosts.

**Field Calculation**: `8 * counter_rate(max(metrics.system.network.io, kql='direction: transmit'))`
| + +### OpenTelemetry disk metrics [otel-metrics-disk] + +| Metric | Description | +| --- | --- | +| **Disk Latency** | Time spent to service disk requests.

**Field Calculation**: `average(system.diskio.read.time + system.diskio.write.time) / (system.diskio.read.count + system.diskio.write.count)`
| +| **Disk Read IOPS** | Average count of read operations from the device per second.

**Field Calculation**: `counter_rate(max(system.disk.operations, kql='attributes.direction: read'))`
| +| **Disk Read Throughput** | Average number of bytes read from the device per second.

**Field Calculation**: `counter_rate(max(system.disk.io, kql='attributes.direction: read'))`
| +| **Disk Usage - Available (%)** | Percentage of disk space available.

**Field Calculation**: `sum(metrics.system.filesystem.usage, kql='state: free') / sum(metrics.system.filesystem.usage)`
| +| **Disk Usage - Used (%)** | Percentage of disk space used.

**Field Calculation**: `1 - sum(metrics.system.filesystem.usage, kql='state: free') / sum(metrics.system.filesystem.usage)`
| +| **Disk Write IOPS** | Average count of write operations from the device per second.

**Field Calculation**: `counter_rate(max(system.disk.operations, kql='attributes.direction: write'))`
| +| **Disk Write Throughput** | Average number of bytes written from the device per second.

**Field Calculation**: `counter_rate(max(system.disk.io, kql='attributes.direction: write'))`
| + + diff --git a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md index 01ccbd671e..938640b82b 100644 --- a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md +++ b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md @@ -157,7 +157,7 @@ To learn more about creating and managing rules, refer to [Alerting](/solutions/ ## Select the data collection schema[host-schema-selector] -The **Schema** menu shows the available data collection schemas for the current query. If host data from both the Elastic System integration and OpenTelemetry are available, the selector defaults to **OpenTelemetry**. Select **Elastic System Integration** to see host data collected by the Elastic System integration. +The **Schema** menu shows the available data collection schemas for the current query. If host data from both the Elastic System integration and OpenTelemetry is available, the selector defaults to **OpenTelemetry**. Select **Elastic System Integration** to see host data collected by the Elastic System integration. ## View host details [view-host-details] @@ -293,7 +293,7 @@ To view the logs in the {{logs-app}} for a detailed analysis, click **Open in Lo :::::{dropdown} Anomalies :::{note} -Anomaly detection isn't available for OpenTelemetry hosts. When the **Schema** is set to OpenTelemetry, this isn't available. +Anomaly detection isn't available for OpenTelemetry hosts. When the **Schema** is set to OpenTelemetry, this tab isn't available. ::: The **Anomalies** tab displays a list of each single metric {{anomaly-detect}} job for the specific host. By default, anomaly jobs are sorted by time, showing the most recent jobs first. diff --git a/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md b/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md index 1a90d0f651..8ae7bd82df 100644 --- a/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md +++ b/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md @@ -64,7 +64,7 @@ By default the **Infrastructure Inventory** page displays a waffle map that show Without leaving the **Infrastructure Inventory** page, you can view enhanced metrics relating to each host running in your infrastructure. On the waffle map, select a host to display the host details overlay. ::::{note} -When showing `Hosts`, the **Schema** dropdown menu shows the available data collection schemas for the current query. If data from both the Elastic System integration and OpenTelemetry are available, the schema defaults to **OpenTelemetry**. Select **Elastic System Integration** to see data collected by the Elastic System integration. +When showing `Hosts`, the **Schema** dropdown menu shows the available data collection schemas for the current query. If data from both the Elastic System integration and OpenTelemetry is available, the schema defaults to **OpenTelemetry**. Select **Elastic System Integration** to see data collected by the Elastic System integration. :::: @@ -119,6 +119,10 @@ The **Metrics** tab shows host metrics organized by type and is more complete th :::::{dropdown} Processes +:::{note} +To view processes for OpenTelemetry hosts, you need to configure the EDOT collector to send process metrics. 
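
As a rough sketch, enabling process collection with the upstream `hostmetrics` receiver's `process` scraper looks like the following. Treat the option names and structure as assumptions; the exact file location, default scrapers, and pipeline wiring depend on your EDOT Collector setup.

```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      # Keep the existing host scrapers (cpu, memory, disk, and so on)
      # and add the process scraper alongside them.
      process:
        # Optionally mute errors for processes the collector can't fully inspect.
        mute_process_exe_error: true
        mute_process_user_error: true
```
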
Refer to [Process metrics](opentelemetry://reference/edot-collector/config/configure-metrics-collection.md#process-metrics) for more information. +::: + The **Processes** tab lists the total number of processes (`system.process.summary.total`) running on the host, along with the total number of processes in these various states: * Running (`system.process.summary.running`) @@ -187,6 +191,10 @@ To view the logs in the {{logs-app}} for a detailed analysis, click **Open in Lo :::::{dropdown} Anomalies +:::{note} +Anomaly detection isn't available for OpenTelemetry hosts. When the **Schema** is set to OpenTelemetry, this tab isn't available. +::: + The **Anomalies** tab displays a list of each single metric {{anomaly-detect}} job for the specific host. By default, anomaly jobs are sorted by time, showing the most recent jobs first. Along with the name of each anomaly job, detected anomalies with a severity score equal to 50 or higher are listed. These scores represent a severity of "warning" or higher in the selected time period. The **summary** value represents the increase between the actual value and the expected ("typical") value of the host metric in the anomaly record result. From d0efe6391fc4f130d846a74402e7386d7644bece Mon Sep 17 00:00:00 2001 From: Mike Birnstiehl Date: Wed, 27 Aug 2025 14:20:21 -0500 Subject: [PATCH 7/9] fix links --- .../observability/observability-host-metrics.md | 8 ++++++-- .../infra-and-hosts/analyze-compare-hosts.md | 2 +- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/reference/data-analysis/observability/observability-host-metrics.md b/reference/data-analysis/observability/observability-host-metrics.md index a7a15f22c6..97f69837a0 100644 --- a/reference/data-analysis/observability/observability-host-metrics.md +++ b/reference/data-analysis/observability/observability-host-metrics.md @@ -9,13 +9,15 @@ products: # Host metrics [observability-host-metrics] -Learn about key host metrics displayed in the Infrastructure UI. +Learn about key host metrics displayed in the Infrastructure UI: * [Elastic System integration host metrics](#ecs-host-metrics) * [OpenTelemetry host metrics](#open-telemetry-host-metrics) -## Elastic system integration host metrics [ecs-host-metrics] +## Elastic System integration host metrics [ecs-host-metrics] + +Refer to the following sections for host metrics and field calculation formulas for the Elastic System integration data: * [Hosts](#key-metrics-hosts) * [CPU usage](#key-metrics-cpu) @@ -103,6 +105,8 @@ The UI and any new rules you create will use the new metric definition. However, ## OpenTelemetry host metrics [open-telemetry-host-metrics] +Refer to the following sections for host metrics and field calculation formulas for OpenTelemetry data: + * [Hosts](#otel-metrics-hosts) * [CPU usage](#otel-metrics-cpu) * [Memory](#otel-metrics-memory) diff --git a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md index 938640b82b..2041af07cc 100644 --- a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md +++ b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md @@ -167,7 +167,7 @@ Without leaving the **Hosts** page, you can view enhanced metrics relating to ea The host details overlay adapts according to the [selected schema](#host-schema-selector). 
When viewing host data collected using OpenTelemetry, you see the following differences: * Anomaly detection isn't available for OpenTelemetry hosts, so there is no **Anomalies** tab. -* The Lens charts use the [OpenTelemetry formulas](/reference/data-analysis/observability/observability-host-metrics.md#key-metrics-opentelemetry). +* The Lens charts use the [OpenTelemetry field calculation formulas](/reference/data-analysis/observability/observability-host-metrics.md#open-telemetry-host-metrics). ::::{tip} To expand the overlay and view more detail, click **Open as page** in the upper-right corner. From 911828d108f5a6adceba3768f3a00ee6f67d062b Mon Sep 17 00:00:00 2001 From: Mike Birnstiehl Date: Tue, 2 Sep 2025 09:42:00 -0500 Subject: [PATCH 8/9] add applies to tags --- .../create-an-inventory-rule.md | 1 + .../infra-and-hosts/analyze-compare-hosts.md | 16 +++++++++++++--- ...ew-infrastructure-metrics-by-resource-type.md | 3 +++ 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/solutions/observability/incident-management/create-an-inventory-rule.md b/solutions/observability/incident-management/create-an-inventory-rule.md index 394753d177..7a5ad53ffa 100644 --- a/solutions/observability/incident-management/create-an-inventory-rule.md +++ b/solutions/observability/incident-management/create-an-inventory-rule.md @@ -36,6 +36,7 @@ When you select **Create inventory alert**, the parameters you configured on the Conditions for each rule can be applied to specific metrics relating to the inventory type you select. You can choose the aggregation type, the metric, and by including a warning threshold value, you can be alerted on multiple threshold values based on severity scores. When creating the rule, you can still get notified if no data is returned for the specific metric or if the rule fails to query {{es}}. :::{note} +{applies_to}`{stack: "ga 9.2", serverless: "ga"}` Most inventory types respect the default data collection method (for example, [Elastic system integration](integration-docs://reference/system/index.md)). For the `Hosts` inventory type, however, you can use the **Schema** dropdown menu to explicitly target host data collected using **OpenTelemetry** or the **Elastic System Integration**. ::: diff --git a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md index 2041af07cc..5882bdb460 100644 --- a/solutions/observability/infra-and-hosts/analyze-compare-hosts.md +++ b/solutions/observability/infra-and-hosts/analyze-compare-hosts.md @@ -156,15 +156,19 @@ To learn more about creating and managing rules, refer to [Alerting](/solutions/ :::: ## Select the data collection schema[host-schema-selector] +```{applies_to} +stack: ga 9.2 +serverless: ga +``` -The **Schema** menu shows the available data collection schemas for the current query. If host data from both the Elastic System integration and OpenTelemetry is available, the selector defaults to **OpenTelemetry**. Select **Elastic System Integration** to see host data collected by the Elastic System integration. +The **Schema** menu shows the available data collection schemas for the current query. If host data from both the Elastic System integration and OpenTelemetry is available, the schema defaults to **OpenTelemetry**. Select **Elastic System Integration** to see host data collected by the Elastic System integration. 
## View host details [view-host-details] Without leaving the **Hosts** page, you can view enhanced metrics relating to each host running in your infrastructure. In the list of hosts, find the host you want to monitor, then click the **Toggle dialog with details** icon ![expand icon](/solutions/images/serverless-expand-icon.png "") to display the host details overlay. -The host details overlay adapts according to the [selected schema](#host-schema-selector). When viewing host data collected using OpenTelemetry, you see the following differences: +{applies_to}`{stack: "ga 9.2", serverless: "ga"}` The host details overlay adapts according to the [selected schema](#host-schema-selector). When viewing host data collected using OpenTelemetry, you see the following differences: * Anomaly detection isn't available for OpenTelemetry hosts, so there is no **Anomalies** tab. * The Lens charts use the [OpenTelemetry field calculation formulas](/reference/data-analysis/observability/observability-host-metrics.md#open-telemetry-host-metrics). @@ -221,6 +225,7 @@ The **Metrics** tab shows host metrics organized by type and is more complete th :::::{dropdown} Processes :::{note} +{applies_to}`{stack: "ga 9.2", serverless: "ga"}` To view processes for OpenTelemetry hosts, you need to configure the EDOT collector to send process metrics. Refer to [Process metrics](opentelemetry://reference/edot-collector/config/configure-metrics-collection.md#process-metrics) for more information. ::: @@ -293,7 +298,7 @@ To view the logs in the {{logs-app}} for a detailed analysis, click **Open in Lo :::::{dropdown} Anomalies :::{note} -Anomaly detection isn't available for OpenTelemetry hosts. When the **Schema** is set to OpenTelemetry, this tab isn't available. +{applies_to}`{stack: "ga 9.2", serverless: "ga"}` Anomaly detection isn't available for OpenTelemetry hosts. When the **Schema** is set to OpenTelemetry, this tab isn't available. ::: The **Anomalies** tab displays a list of each single metric {{anomaly-detect}} job for the specific host. By default, anomaly jobs are sorted by time, showing the most recent jobs first. @@ -423,4 +428,9 @@ This could mean that the APM agent has not been configured to use the correct ho To get the correct host name, you need to set some additional configuration options, specifically `system.kubernetes.node.name` as described in [Kubernetes data](/solutions/observability/apm/managed-intake-service-event-api.md#kubernetes-data). ### I don't see all of my host data [observability-analyze-hosts-i-dont-see-all-of-my-host-data] +```{applies_to} +stack: ga 9.2 +serverless: ga +``` + If you have host data from both the Elastic System integration and OpenTelemetry (OTel), the selector defaults to OTel. If you want to see Elastic System integration data for your current query, select **Elastic System Integration** from the **Schema** selector. 
\ No newline at end of file diff --git a/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md b/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md index 8ae7bd82df..5d4cca7b75 100644 --- a/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md +++ b/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md @@ -64,6 +64,7 @@ By default the **Infrastructure Inventory** page displays a waffle map that show Without leaving the **Infrastructure Inventory** page, you can view enhanced metrics relating to each host running in your infrastructure. On the waffle map, select a host to display the host details overlay. ::::{note} +{applies_to}`{stack: "ga 9.2", serverless: "ga"}` When showing `Hosts`, the **Schema** dropdown menu shows the available data collection schemas for the current query. If data from both the Elastic System integration and OpenTelemetry is available, the schema defaults to **OpenTelemetry**. Select **Elastic System Integration** to see data collected by the Elastic System integration. :::: @@ -120,6 +121,7 @@ The **Metrics** tab shows host metrics organized by type and is more complete th :::::{dropdown} Processes :::{note} +{applies_to}`{stack: "ga 9.2", serverless: "ga"}` To view processes for OpenTelemetry hosts, you need to configure the EDOT collector to send process metrics. Refer to [Process metrics](opentelemetry://reference/edot-collector/config/configure-metrics-collection.md#process-metrics) for more information. ::: @@ -192,6 +194,7 @@ To view the logs in the {{logs-app}} for a detailed analysis, click **Open in Lo :::::{dropdown} Anomalies :::{note} +{applies_to}`{stack: "ga 9.2", serverless: "ga"}` Anomaly detection isn't available for OpenTelemetry hosts. When the **Schema** is set to OpenTelemetry, this tab isn't available. ::: From 08ac259b6839d2bcfb8f758239c22c3975704c63 Mon Sep 17 00:00:00 2001 From: Mike Birnstiehl <114418652+mdbirnstiehl@users.noreply.github.com> Date: Tue, 2 Sep 2025 20:05:59 -0500 Subject: [PATCH 9/9] Apply suggestions from code review Co-authored-by: Nastasha Solomon <79124755+nastasha-solomon@users.noreply.github.com> --- .../data-analysis/observability/observability-host-metrics.md | 2 +- .../view-infrastructure-metrics-by-resource-type.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/reference/data-analysis/observability/observability-host-metrics.md b/reference/data-analysis/observability/observability-host-metrics.md index 97f69837a0..374ee22621 100644 --- a/reference/data-analysis/observability/observability-host-metrics.md +++ b/reference/data-analysis/observability/observability-host-metrics.md @@ -124,7 +124,7 @@ Refer to the following sections for host metrics and field calculation formulas | Metric | Description | | --- | --- | -| **CPU Usage (%)** | Average of percentage of CPU time spent in states other than Idle and IOWait, normalized by the number of CPU cores. Includes both time spent on user space and kernel space. 100% means all CPUs of the host are busy.

**Field Calculation**: `1-(average(metrics.system.cpu.utilization,kql='state: idle') + average(metrics.system.cpu.utilization,kql='state: wait'))`
| +| **CPU Usage (%)** | Average percentage of CPU time spent in states other than Idle and IOWait, normalized by the number of CPU cores. Includes both time spent on user space and kernel space. 100% means all CPUs of the host are busy.

**Field Calculation**: `1-(average(metrics.system.cpu.utilization,kql='state: idle') + average(metrics.system.cpu.utilization,kql='state: wait'))`
| | **CPU Usage - iowait (%)** | The percentage of CPU time spent in wait (on disk).

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: wait') / max(metrics.system.cpu.logical.count)`
| | **CPU Usage - irq (%)** | The percentage of CPU time spent servicing and handling hardware interrupts.

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: interrupt') / max(metrics.system.cpu.logical.count)`
| | **CPU Usage - nice (%)** | The percentage of CPU time spent on low-priority processes.

**Field Calculation**: `average(metrics.system.cpu.utilization,kql='state: nice') / max(metrics.system.cpu.logical.count)`
| diff --git a/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md b/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md index 5d4cca7b75..bb2ec50767 100644 --- a/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md +++ b/solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md @@ -65,7 +65,7 @@ Without leaving the **Infrastructure Inventory** page, you can view enhanced met ::::{note} {applies_to}`{stack: "ga 9.2", serverless: "ga"}` -When showing `Hosts`, the **Schema** dropdown menu shows the available data collection schemas for the current query. If data from both the Elastic System integration and OpenTelemetry is available, the schema defaults to **OpenTelemetry**. Select **Elastic System Integration** to see data collected by the Elastic System integration. +When showing **Hosts**, the **Schema** dropdown menu shows the available data collection schemas for the current query. If data from both the Elastic System integration and OpenTelemetry is available, the schema defaults to **OpenTelemetry**. Select **Elastic System Integration** to see data collected by the Elastic System integration. ::::