-
The Sumo Logic OpenTelemetry Collector Monitoring app provides comprehensive monitoring and observability for your OpenTelemetry Collector instances. Monitor collector performance, telemetry data flow, resource utilization, and troubleshoot data collection issues with preconfigured dashboards and alerts. Track metrics and logs to ensure your telemetry pipeline is running smoothly and efficiently.
+
The Sumo Logic OpenTelemetry Collector Insights app provides comprehensive monitoring and observability for your OpenTelemetry Collector instances. Monitor collector performance, telemetry data flow, resource utilization, and troubleshoot data collection issues with preconfigured dashboards and alerts. Track metrics and logs to ensure your telemetry pipeline is running smoothly and efficiently.
This app supports OpenTelemetry Collector version **0.130.1-sumo-0** and later versions.
@@ -20,12 +20,12 @@ We use the OpenTelemetry collector's built-in internal telemetry capabilities to
The diagram below illustrates the components of the OpenTelemetry Collector self-monitoring setup. The collector is configured to export its own telemetry data (metrics and logs) to Sumo Logic through OTLP/HTTP endpoints.
:::info
-
This app includes [built-in monitors](#opentelemetry-collector-monitoring-alerts). For details on creating custom monitors, refer to [Create monitors for OpenTelemetry Collector Monitoring app](#create-monitors-for-opentelemetry-collector-monitoring-app).
+
This app includes [built-in monitors](#opentelemetry-collector-insights-alerts). For details on creating custom monitors, refer to [Create monitors for OpenTelemetry Collector Insights app](#create-monitors-for-opentelemetry-collector-insights-app).
:::
-
## Fields creation in Sumo Logic for OpenTelemetry Collector Monitoring
+
## Fields creation in Sumo Logic for OpenTelemetry Collector Insights
-
Following are the [fields](/docs/manage/fields/) which will be created as part of OpenTelemetry Collector Monitoring app installation, if not already present.
+
Following are the [fields](/docs/manage/fields/) which will be created as part of OpenTelemetry Collector Insights app installation, if not already present.
- **sumo.datasource**. Has fixed value of **otel_collector**.
- **_contentType**. Has fixed value of **OpenTelemetry**.
All dashboards have a set of filters that you can apply to the entire dashboard. Use these filters to drill down and examine the data to a granular level.
- You can change the time range for a dashboard or panel by selecting a predefined interval from a drop-down list, choosing a recently used time range, or specifying custom dates and times. [Learn more](/docs/dashboards/set-custom-time-ranges/).
- You can use template variables to drill down and examine the data on a granular level. For more information, see [Filtering Dashboards with Template Variables](/docs/dashboards/filter-template-variables/).
### Overview
-
The **OpenTelemetry Collector - Overview** dashboard provides a high-level view of your OpenTelemetry Collector fleet's health and performance. This is your starting point for monitoring collector instances.
+
The **OpenTelemetry Collector Insights - Overview** dashboard provides a high-level view of your OpenTelemetry Collector fleet's health and performance. This is your starting point for monitoring collector instances.
Use this dashboard to:
- Monitor the overall health of your collector fleet
@@ -334,7 +334,7 @@ Use this dashboard to:
### Logs
-
The **OpenTelemetry Collector - Logs** dashboard provides detailed insights into collector log output for root-cause analysis of errors, data dropping events, and restarts.
+
The **OpenTelemetry Collector Insights - Logs** dashboard provides detailed insights into collector log output for root-cause analysis of errors, data dropping events, and restarts.
Use this dashboard to:
- Analyze error patterns and troubleshoot issues
@@ -346,7 +346,7 @@ Use this dashboard to:
### Pipeline: Receiver Health
-
The **OpenTelemetry Collector - Pipeline: Receiver Health** dashboard focuses exclusively on the data ingestion stage of the pipeline to monitor data sources and receiver performance.
+
The **OpenTelemetry Collector Insights - Pipeline: Receiver Health** dashboard focuses exclusively on the data ingestion stage of the pipeline to monitor data sources and receiver performance.
Use this dashboard to:
- Monitor receiver performance and data ingestion rates
@@ -358,7 +358,7 @@ Use this dashboard to:
### Pipeline: Processor Health
-
The **OpenTelemetry Collector - Pipeline: Processor Health** dashboard is crucial for understanding if any processors (like batch, memory_limiter, or resourcedetection) are dropping data or causing performance issues.
+
The **OpenTelemetry Collector Insights - Pipeline: Processor Health** dashboard is crucial for understanding if any processors (like batch, memory_limiter, or resourcedetection) are dropping data or causing performance issues.
Use this dashboard to:
- Monitor processor performance and throughput
@@ -370,7 +370,7 @@ Use this dashboard to:
### Pipeline: Exporter Health
-
The **OpenTelemetry Collector - Pipeline: Exporter Health** dashboard is the most critical dashboard for diagnosing backpressure and data loss at the egress stage of the pipeline.
+
The **OpenTelemetry Collector Insights - Pipeline: Exporter Health** dashboard is the most critical dashboard for diagnosing backpressure and data loss at the egress stage of the pipeline.
Use this dashboard to:
- Monitor exporter performance and success rates
@@ -382,7 +382,7 @@ Use this dashboard to:
### Resource Utilization
-
The **OpenTelemetry Collector - Resource Utilization** dashboard provides a deep dive into the collector's own resource consumption to diagnose performance issues and plan for capacity.
+
The **OpenTelemetry Collector Insights - Resource Utilization** dashboard provides a deep dive into the collector's own resource consumption to diagnose performance issues and plan for capacity.
Use this dashboard to:
- Monitor CPU, memory, and disk usage by collectors
@@ -431,20 +431,17 @@ Configure different log levels for troubleshooting:
- **WARN**: Warning messages about potential issues
- **ERROR**: Error conditions that need attention
-
## Create monitors for OpenTelemetry Collector Monitoring app
+
## Create monitors for OpenTelemetry Collector Insights app
import CreateMonitors from '../../../reuse/apps/create-monitors.md';
<CreateMonitors/>
-
### OpenTelemetry Collector Monitoring alerts
+
### OpenTelemetry Collector Insights Alerts
-
| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
| `OpenTelemetry Collector - High Memory Usage Alert` | This alert gets triggered when collector memory usage exceeds 80% of available memory. | Count >= 80 | Count < 80 |
-
| `OpenTelemetry Collector - High CPU Usage Alert` | This alert gets triggered when collector CPU usage exceeds 80% for more than 5 minutes. | Count >= 80 | Count < 80 |
-
| `OpenTelemetry Collector - Pipeline Data Loss Alert` | This alert gets triggered when data drops are detected in the collector pipeline. | Count >= 1 | Count < 1 |
-
| `OpenTelemetry Collector - Exporter Failure Alert` | This alert gets triggered when export failures exceed the acceptable threshold. | Count >= 5 | Count < 5 |
-
| `OpenTelemetry Collector - Collector Down Alert` | This alert gets triggered when a collector instance stops reporting metrics. | Count >= 1 | Count < 1 |
-
| `OpenTelemetry Collector - High Queue Utilization Alert` | This alert gets triggered when exporter queue utilization exceeds 90%. | Count >= 90 | Count < 90 |
-
| `OpenTelemetry Collector - Receiver Refusal Rate Alert` | This alert gets triggered when receivers are refusing data at a high rate. | Count >= 10 | Count < 10 |
+
| `OpenTelemetry Collector Insights - Collector Instance is Down` | This alert fires when a Collector instance stops sending telemetry for more than 10 minutes, indicating it is down or has a connectivity issue. | Missing Data | Data Found |
+
| `OpenTelemetry Collector Insights - Exporter Queue Nearing Capacity` | This alert fires when an exporter's sending queue is over 90% full. This is a strong leading indicator of back pressure and imminent data loss. | Count >= 90 | Count < 90 |
+
| `OpenTelemetry Collector Insights - High Memory Usage (RSS)` | This alert fires when a Collector's memory usage (RSS) exceeds 2GB. This could be an early indicator of a memory leak or an under-provisioned host. | Count > 2000000000 | Count <= 2000000000 |
+
| `OpenTelemetry Collector Insights - High Metadata Cardinality` | This alert fires when the batch processor is handling more than 1000 unique combinations of metadata. This is a known cause of performance degradation, high CPU, and high memory usage. | Count > 1000 | Count <= 1000 |
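
For reviewers who want to sanity-check the new thresholds, the four added alert conditions can be sketched in Python. This is only an illustration under stated assumptions: the `CollectorSample` type, its field names, and the `firing_alerts` helper are hypothetical and are not part of the app or of any Sumo Logic API. Only the threshold values and comparison operators come from the table rows above.

```python
from dataclasses import dataclass
from typing import List

# Threshold values copied from the alert table; constant names are hypothetical.
MISSING_DATA_SECONDS = 10 * 60        # no telemetry for more than 10 minutes
QUEUE_UTILIZATION_PCT = 90            # exporter queue fires at >= 90
MEMORY_RSS_BYTES = 2_000_000_000      # RSS fires at > 2000000000 bytes
METADATA_CARDINALITY = 1000           # batch processor fires at > 1000


@dataclass
class CollectorSample:
    """Hypothetical snapshot of one collector instance's self-telemetry."""
    seconds_since_last_telemetry: float
    queue_utilization_pct: float
    memory_rss_bytes: float
    metadata_cardinality: int


def firing_alerts(sample: CollectorSample) -> List[str]:
    """Return the short names of the alerts whose trigger condition is met."""
    alerts = []
    if sample.seconds_since_last_telemetry > MISSING_DATA_SECONDS:
        alerts.append("Collector Instance is Down")
    if sample.queue_utilization_pct >= QUEUE_UTILIZATION_PCT:
        alerts.append("Exporter Queue Nearing Capacity")
    if sample.memory_rss_bytes > MEMORY_RSS_BYTES:
        alerts.append("High Memory Usage (RSS)")
    if sample.metadata_cardinality > METADATA_CARDINALITY:
        alerts.append("High Metadata Cardinality")
    return alerts
```

Note the boundary behavior implied by the table: the queue alert fires at exactly 90, while the memory and cardinality alerts fire only above their thresholds and recover at or below them.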