Commit 1e9b372

updating monitor details
1 parent 7aa99ca commit 1e9b372

1 file changed, +22 -25 lines changed


docs/integrations/hosts-operating-systems/opentelemetry/opentelemetry-collector-monitoring.md

Lines changed: 22 additions & 25 deletions
@@ -1,8 +1,8 @@
 ---
-id: opentelemetry-collector-monitoring
-title: OpenTelemetry Collector Monitoring
-sidebar_label: OpenTelemetry Collector Monitoring
-description: Learn about the Sumo Logic OpenTelemetry Collector Monitoring app.
+id: opentelemetry-collector-insights
+title: OpenTelemetry Collector Insights
+sidebar_label: OpenTelemetry Collector Insights
+description: Learn about the Sumo Logic OpenTelemetry Collector Insights app.
 ---
 
 import useBaseUrl from '@docusaurus/useBaseUrl';
@@ -11,7 +11,7 @@ import TabItem from '@theme/TabItem';
 
 <img src={useBaseUrl('img/send-data/otel-color.svg')} alt="Thumbnail icon" width="45"/>
 
-The Sumo Logic OpenTelemetry Collector Monitoring app provides comprehensive monitoring and observability for your OpenTelemetry Collector instances. Monitor collector performance, telemetry data flow, resource utilization, and troubleshoot data collection issues with preconfigured dashboards and alerts. Track metrics and logs to ensure your telemetry pipeline is running smoothly and efficiently.
+The Sumo Logic OpenTelemetry Collector Insights app provides comprehensive monitoring and observability for your OpenTelemetry Collector instances. Monitor collector performance, telemetry data flow, resource utilization, and troubleshoot data collection issues with preconfigured dashboards and alerts. Track metrics and logs to ensure your telemetry pipeline is running smoothly and efficiently.
 
 This app supports OpenTelemetry Collector version **0.130.1-sumo-0** and later versions.
 
@@ -20,12 +20,12 @@ We use the OpenTelemetry collector's built-in internal telemetry capabilities to
 The diagram below illustrates the components of the OpenTelemetry Collector self-monitoring setup. The collector is configured to export its own telemetry data (metrics and logs) to Sumo Logic through OTLP/HTTP endpoints.
 
 :::info
-This app includes [built-in monitors](#opentelemetry-collector-monitoring-alerts). For details on creating custom monitors, refer to [Create monitors for OpenTelemetry Collector Monitoring app](#create-monitors-for-opentelemetry-collector-monitoring-app).
+This app includes [built-in monitors](#opentelemetry-collector-insights-alerts). For details on creating custom monitors, refer to [Create monitors for OpenTelemetry Collector Insights app](#create-monitors-for-opentelemetry-collector-insights-app).
 :::
 
-## Fields creation in Sumo Logic for OpenTelemetry Collector Monitoring
+## Fields creation in Sumo Logic for OpenTelemetry Collector Insights
 
-Following are the [fields](/docs/manage/fields/) which will be created as part of OpenTelemetry Collector Monitoring app installation, if not already present.
+Following are the [fields](/docs/manage/fields/) which will be created as part of OpenTelemetry Collector Insights app installation, if not already present.
 
 - **sumo.datasource**. Has fixed value of **otel_collector**.
 - **_contentType**. Has fixed value of **OpenTelemetry**.
@@ -314,15 +314,15 @@ sumo.datasource=otel_collector metric=otelcol_exporter_queue_size deployment.env
 | avg by exporter, deployment.environment
 ```
 
-## Viewing OpenTelemetry Collector Monitoring dashboards
+## Viewing OpenTelemetry Collector Insights dashboards
 
 All dashboards have a set of filters that you can apply to the entire dashboard. Use these filters to drill down and examine the data to a granular level.
 - You can change the time range for a dashboard or panel by selecting a predefined interval from a drop-down list, choosing a recently used time range, or specifying custom dates and times. [Learn more](/docs/dashboards/set-custom-time-ranges/).
 - You can use template variables to drill down and examine the data on a granular level. For more information, see [Filtering Dashboards with Template Variables](/docs/dashboards/filter-template-variables/).
 
 ### Overview
 
-The **OpenTelemetry Collector - Overview** dashboard provides a high-level view of your OpenTelemetry Collector fleet's health and performance. This is your starting point for monitoring collector instances.
+The **OpenTelemetry Collector Insights - Overview** dashboard provides a high-level view of your OpenTelemetry Collector fleet's health and performance. This is your starting point for monitoring collector instances.
 
 Use this dashboard to:
 - Monitor the overall health of your collector fleet
@@ -334,7 +334,7 @@ Use this dashboard to:
 
 ### Logs
 
-The **OpenTelemetry Collector - Logs** dashboard provides detailed insights into collector log output for root-cause analysis of errors, data dropping events, and restarts.
+The **OpenTelemetry Collector Insights - Logs** dashboard provides detailed insights into collector log output for root-cause analysis of errors, data dropping events, and restarts.
 
 Use this dashboard to:
 - Analyze error patterns and troubleshoot issues
@@ -346,7 +346,7 @@ Use this dashboard to:
 
 ### Pipeline: Receiver Health
 
-The **OpenTelemetry Collector - Pipeline: Receiver Health** dashboard focuses exclusively on the data ingestion stage of the pipeline to monitor data sources and receiver performance.
+The **OpenTelemetry Collector Insights - Pipeline: Receiver Health** dashboard focuses exclusively on the data ingestion stage of the pipeline to monitor data sources and receiver performance.
 
 Use this dashboard to:
 - Monitor receiver performance and data ingestion rates
@@ -358,7 +358,7 @@ Use this dashboard to:
 
 ### Pipeline: Processor Health
 
-The **OpenTelemetry Collector - Pipeline: Processor Health** dashboard is crucial for understanding if any processors (like batch, memory_limiter, or resourcedetection) are dropping data or causing performance issues.
+The **OpenTelemetry Collector Insights - Pipeline: Processor Health** dashboard is crucial for understanding if any processors (like batch, memory_limiter, or resourcedetection) are dropping data or causing performance issues.
 
 Use this dashboard to:
 - Monitor processor performance and throughput
@@ -370,7 +370,7 @@ Use this dashboard to:
 
 ### Pipeline: Exporter Health
 
-The **OpenTelemetry Collector - Pipeline: Exporter Health** dashboard is the most critical dashboard for diagnosing backpressure and data loss at the egress stage of the pipeline.
+The **OpenTelemetry Collector Insights - Pipeline: Exporter Health** dashboard is the most critical dashboard for diagnosing backpressure and data loss at the egress stage of the pipeline.
 
 Use this dashboard to:
 - Monitor exporter performance and success rates
@@ -382,7 +382,7 @@ Use this dashboard to:
 
 ### Resource Utilization
 
-The **OpenTelemetry Collector - Resource Utilization** dashboard provides a deep dive into the collector's own resource consumption to diagnose performance issues and plan for capacity.
+The **OpenTelemetry Collector Insights - Resource Utilization** dashboard provides a deep dive into the collector's own resource consumption to diagnose performance issues and plan for capacity.
 
 Use this dashboard to:
 - Monitor CPU, memory, and disk usage by collectors
@@ -431,20 +431,17 @@ Configure different log levels for troubleshooting:
 - **WARN**: Warning messages about potential issues
 - **ERROR**: Error conditions that need attention
 
-## Create monitors for OpenTelemetry Collector Monitoring app
+## Create monitors for OpenTelemetry Collector Insights app
 
 import CreateMonitors from '../../../reuse/apps/create-monitors.md';
 
 <CreateMonitors/>
 
-### OpenTelemetry Collector Monitoring alerts
+### OpenTelemetry Collector Insights Alerts
 
-| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
+| Name | Description | Alert Condition | Recover Condition |
 |:--|:--|:--|:--|
-| `OpenTelemetry Collector - High Memory Usage Alert` | This alert gets triggered when collector memory usage exceeds 80% of available memory. | Count >= 80 | Count < 80 |
-| `OpenTelemetry Collector - High CPU Usage Alert` | This alert gets triggered when collector CPU usage exceeds 80% for more than 5 minutes. | Count >= 80 | Count < 80 |
-| `OpenTelemetry Collector - Pipeline Data Loss Alert` | This alert gets triggered when data drops are detected in the collector pipeline. | Count >= 1 | Count < 1 |
-| `OpenTelemetry Collector - Exporter Failure Alert` | This alert gets triggered when export failures exceed the acceptable threshold. | Count >= 5 | Count < 5 |
-| `OpenTelemetry Collector - Collector Down Alert` | This alert gets triggered when a collector instance stops reporting metrics. | Count >= 1 | Count < 1 |
-| `OpenTelemetry Collector - High Queue Utilization Alert` | This alert gets triggered when exporter queue utilization exceeds 90%. | Count >= 90 | Count < 90 |
-| `OpenTelemetry Collector - Receiver Refusal Rate Alert` | This alert gets triggered when receivers are refusing data at a high rate. | Count >= 10 | Count < 10 |
+| `OpenTelemetry Collector Insights - Collector Instance is Down` | This alert fires when a Collector instance stops sending telemetry for more than 10 minutes, indicating it is down or has a connectivity issue. | Missing Data | Data Found |
+| `OpenTelemetry Collector Insights - Exporter Queue Nearing Capacity` | This alert fires when an exporter's sending queue is 90% full or more. This is a strong leading indicator of backpressure and imminent data loss. | Count >= 90 | Count < 90 |
+| `OpenTelemetry Collector Insights - High Memory Usage (RSS)` | This alert fires when a Collector's memory usage (RSS) exceeds 2 GB. This could be an early indicator of a memory leak or an under-provisioned host. | Count > 2000000000 | Count <= 2000000000 |
+| `OpenTelemetry Collector Insights - High Metadata Cardinality` | This alert fires when the batch processor is handling more than 1000 unique combinations of metadata. This is a known cause of performance degradation, high CPU, and high memory usage. | Count > 1000 | Count <= 1000 |
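
In the new alerts table, each recover condition is the exact complement of its alert condition, so every sample falls into exactly one state. The Python sketch below mirrors those thresholds under stated assumptions: metric values arrive as plain numbers; `evaluate`, `queue_utilization_pct`, and `instance_down` are hypothetical helpers, not Sumo Logic or OpenTelemetry APIs; and queue utilization is assumed to be `otelcol_exporter_queue_size` divided by the exporter's queue capacity. It illustrates the conditions only and is not how Sumo Logic monitors are implemented.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical mirror of the "Alert Condition" / "Recover Condition" columns.
# Each entry is (fires, recovers); the two predicates are complements.
THRESHOLDS: dict[str, tuple[Callable[[float], bool], Callable[[float], bool]]] = {
    "Exporter Queue Nearing Capacity": (lambda v: v >= 90, lambda v: v < 90),
    "High Memory Usage (RSS)": (lambda v: v > 2_000_000_000, lambda v: v <= 2_000_000_000),
    "High Metadata Cardinality": (lambda v: v > 1000, lambda v: v <= 1000),
}

# "Collector Instance is Down" is a Missing Data check: more than 10 minutes.
MISSING_DATA_WINDOW_S = 600.0


def queue_utilization_pct(queue_size: float, queue_capacity: float) -> float:
    """Percent of an exporter's sending queue in use (assumed size / capacity)."""
    if queue_capacity <= 0:
        raise ValueError("queue_capacity must be positive")
    return 100.0 * queue_size / queue_capacity


def evaluate(alert_name: str, value: float) -> str:
    """Return "alert" if the alert condition holds, "ok" if the recover condition holds."""
    fires, recovers = THRESHOLDS[alert_name]
    if fires(value):
        return "alert"
    if recovers(value):
        return "ok"
    # Only reachable for values such as NaN that satisfy neither comparison.
    raise ValueError(f"{alert_name}: {value!r} matches neither condition")


def instance_down(last_seen_s: Optional[float], now_s: float,
                  window_s: float = MISSING_DATA_WINDOW_S) -> bool:
    """True when no telemetry has arrived for more than window_s seconds."""
    return last_seen_s is None or now_s - last_seen_s > window_s
```

For example, `evaluate("Exporter Queue Nearing Capacity", queue_utilization_pct(900, 1000))` returns `"alert"` because 90.0 meets the `>= 90` condition, while an RSS sample of exactly 2000000000 bytes stays `"ok"` because that alert uses a strict `>`.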
