docs/integrations/hosts-operating-systems/opentelemetry/opentelemetry-collector-insights.md
Lines changed: 59 additions & 53 deletions
@@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem';
 The Sumo Logic OpenTelemetry Collector Insights app provides comprehensive monitoring and observability for your OpenTelemetry Collector instances. Monitor collector performance, telemetry data flow, resource utilization, and troubleshoot data collection issues with preconfigured dashboards and alerts. Track metrics and logs to ensure your telemetry pipeline is running smoothly and efficiently.

-This app supports OpenTelemetry Collector version **0.130.1-sumo-0** and later versions.
+This app supports OpenTelemetry Collector version `0.130.1-sumo-0` and later versions.

 We use the OpenTelemetry collector's built-in internal telemetry capabilities to collect metrics and logs about the collector itself. By default, the Collector exposes its own telemetry through internal metrics (via Prometheus interface on port 8888) and logs (emitted to stderr).
@@ -27,24 +27,24 @@ This app includes [built-in monitors](#opentelemetry-collector-insights-alerts).
 Following are the [fields](/docs/manage/fields/) which will be created as part of OpenTelemetry Collector Insights app installation, if not already present.

-- **sumo.datasource**. Has fixed value of **otel_collector**.
-- **_contentType**. Has fixed value of **OpenTelemetry**.
-- **deployment.environment**. User configured. Enter a name to identify your deployment environment.
+- `sumo.datasource`. Has fixed value of `otel_collector`.
+- `_contentType`. Has fixed value of `OpenTelemetry`.
+- `deployment.environment`. User configured. Enter a name to identify your deployment environment.

 ## Prerequisites

-### For OTLP Endpoint Configuration
+### For OTLP endpoint configuration

 Before configuring the OTEL Collector integration, ensure you have the following prerequisites in place:

-1. **OTLP Endpoint**: You need a valid base OTLP endpoint URL. The system will automatically append `/v1/logs` for logs collection and `/v1/metrics` for metrics collection. The endpoint should be accessible from your OTEL Collector instance.
+1. **OTLP Endpoint**. You need a valid base OTLP endpoint URL. The system will automatically append `/v1/logs` for logs collection and `/v1/metrics` for metrics collection. The endpoint should be accessible from your OTEL Collector instance.

-2. **Network Access**: Ensure that your OTEL Collector has network access to the configured OTLP endpoint. This includes:
+2. **Network Access**. Ensure that your OTEL Collector has network access to the configured OTLP endpoint. This includes:
    - Outbound HTTPS connectivity on port 443
    - Proper firewall configurations to allow traffic to the endpoint
    - DNS resolution for the endpoint hostname

-3. **Authentication**: If your OTLP endpoint requires authentication, ensure you have the proper credentials or tokens configured.
+3. **Authentication**. If your OTLP endpoint requires authentication, ensure you have the proper credentials or tokens configured.

 ### For metrics collection
@@ -90,16 +90,16 @@ In this step, you will configure the OpenTelemetry Collector's built-in telemetr
 The collector's service configuration needs to be updated to enable telemetry export. Below is the required configuration that should be added to your collector's service section:

 **Required Inputs:**
-- **OTLP Endpoint**: Your Sumo Logic OTLP endpoint base URL
-- **Deployment Environment**: Enter a name to identify your deployment environment
+- **OTLP Endpoint**. Your Sumo Logic OTLP endpoint base URL
+- **Deployment Environment**. Enter a name to identify your deployment environment

 **Configuration Parameters:**
-- **Endpoint Format**: The base endpoint automatically creates:
+- **Endpoint Format**. The base endpoint automatically creates:
   - Logs endpoint: `${OTLP_ENDPOINT}/v1/logs`
   - Metrics endpoint: `${OTLP_ENDPOINT}/v1/metrics`
-- **Protocol**: HTTP/protobuf for OTLP communication
-- **Metrics level**: Set to **detailed** for comprehensive monitoring
-- **Logs level**: Set to **debug** for detailed troubleshooting information
+- **Protocol**. HTTP/protobuf for OTLP communication
+- **Metrics level**. Set to **detailed** for comprehensive monitoring
+- **Logs level**. Set to **debug** for detailed troubleshooting information
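
As a rough illustration of the endpoint format described in this hunk, the following Python sketch derives the per-signal URLs from a base OTLP endpoint. The function name `signal_endpoints` and the example host are hypothetical, and dropping a trailing slash before appending the path is an assumption, not documented collector behavior.

```python
def signal_endpoints(base_endpoint: str) -> dict[str, str]:
    """Derive logs and metrics URLs from a base OTLP endpoint (illustrative only)."""
    # Assumption: a trailing slash on the base URL is dropped before appending paths.
    base = base_endpoint.rstrip("/")
    return {
        "logs": f"{base}/v1/logs",
        "metrics": f"{base}/v1/metrics",
    }


if __name__ == "__main__":
    # Hypothetical endpoint, used only to show the resulting shape.
    for signal, url in signal_endpoints("https://otlp.example.com/").items():
        print(f"{signal}: {url}")
```

Comparing these derived URLs with the ones the collector reports in its logs can help spot a base endpoint that already includes a `/v1/...` suffix.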

 ```yaml
 service:
@@ -215,11 +215,11 @@ import LogsOutro from '../../../reuse/apps/opentelemetry/send-logs-outro.md';
 ### Validation

 After installation, verify that:
-1. The OTEL Collector service is running
-2. The configured base endpoint is reachable
-3. Data is being successfully sent to both the logs (`/v1/logs`) and metrics (`/v1/metrics`) endpoints
-4. Resource attributes are properly applied to the telemetry data
-5. Internal metrics are accessible at `http://localhost:8888/metrics`
+1. The OTEL Collector service is running.
+2. The configured base endpoint is reachable.
+3. Data is being successfully sent to both the logs (`/v1/logs`) and metrics (`/v1/metrics`) endpoints.
+4. Resource attributes are properly applied to the telemetry data.
+5. Internal metrics are accessible at `http://localhost:8888/metrics`.

 ## Sample log messages
@@ -325,10 +325,10 @@ All dashboards have a set of filters that you can apply to the entire dashboard.
 The **OpenTelemetry Collector Insights - Overview** dashboard provides a high-level view of your OpenTelemetry Collector fleet's health and performance. This is your starting point for monitoring collector instances.

 Use this dashboard to:
-- Monitor the overall health of your collector fleet
-- Identify performance bottlenecks and resource constraints
-- Track data flow and processing rates across collectors
-- Quickly spot collectors experiencing issues
+- Monitor the overall health of your collector fleet.
+- Identify performance bottlenecks and resource constraints.
+- Track data flow and processing rates across collectors.
 The **OpenTelemetry Collector Insights - Logs** dashboard provides detailed insights into collector log output for root-cause analysis of errors, data dropping events, and restarts.

 Use this dashboard to:
-- Analyze error patterns and troubleshoot issues
-- Monitor collector startup and shutdown events
-- Identify data loss or processing problems
-- Track log severity trends across your collector fleet
+- Analyze error patterns and troubleshoot issues.
+- Monitor collector startup and shutdown events.
+- Identify data loss or processing problems.
+- Track log severity trends across your collector fleet.
 The **OpenTelemetry Collector Insights - Pipeline: Receiver Health** dashboard focuses exclusively on the data ingestion stage of the pipeline to monitor data sources and receiver performance.

 Use this dashboard to:
-- Monitor receiver performance and data ingestion rates
-- Identify issues with data sources and input connections
-- Track receiver-specific errors and failures
-- Analyze accepted vs refused data points
+- Monitor receiver performance and data ingestion rates.
+- Identify issues with data sources and input connections.
 The **OpenTelemetry Collector Insights - Pipeline: Processor Health** dashboard is crucial for understanding if any processors (like batch, memory_limiter, or resourcedetection) are dropping data or causing performance issues.

 Use this dashboard to:
-- Monitor processor performance and throughput
-- Identify data drops or processing bottlenecks
-- Track processor-specific configurations and health
-- Analyze batch processing efficiency and triggers
+- Monitor processor performance and throughput.
+- Identify data drops or processing bottlenecks.
+- Track processor-specific configurations and health.
+- Analyze batch processing efficiency and triggers.
 The **OpenTelemetry Collector Insights - Pipeline: Exporter Health** dashboard is the most critical dashboard for diagnosing backpressure and data loss at the egress stage of the pipeline.

 Use this dashboard to:
-- Monitor exporter performance and success rates
-- Identify backpressure issues and export failures
-- Track data delivery to downstream systems
-- Analyze queue utilization and capacity
+- Monitor exporter performance and success rates.
+- Identify backpressure issues and export failures.
 The **OpenTelemetry Collector Insights - Resource Utilization** dashboard provides a deep dive into the collector's own resource consumption to diagnose performance issues and plan for capacity.

 Use this dashboard to:
-- Monitor CPU, memory, and disk usage by collectors
-- Plan capacity and resource allocation
-- Identify resource constraints and optimization opportunities
-- Track heap allocation and garbage collection patterns
+- Monitor CPU, memory, and disk usage by collectors.
+- Plan capacity and resource allocation.
+- Identify resource constraints and optimization opportunities.
+- Track heap allocation and garbage collection patterns.
-**Collector connection failure**: If your collector fails to connect to Sumo Logic, you may need to configure proxy settings. Check the collector's logs for connection errors:
+##### Collector connection failure
+
+If your collector fails to connect to Sumo Logic, you may need to configure proxy settings. Check the collector's logs for connection errors:
 # Look for errors like "Unable to get a heartbeat"
 ```

-**High queue utilization**: Monitor the `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity` metrics. If the queue is consistently full, you may need to:
+##### High queue utilization
+
+Monitor the `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity` metrics. If the queue is consistently full, you may need to:
 - Reduce data ingestion rate
 - Increase queue capacity
 - Scale horizontally with more collectors
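
To make "consistently full" concrete, the following Python sketch computes the ratio of `otelcol_exporter_queue_size` to `otelcol_exporter_queue_capacity` from Prometheus text-format output such as that served on port 8888. The function name `queue_utilization`, the sample values, and the assumption that both metrics carry an `exporter` label are illustrative and should be checked against the output of your collector version.

```python
import re

# Matches a Prometheus text-format sample: metric name, optional {labels}, value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def queue_utilization(metrics_text: str) -> dict[str, float]:
    """Return queue_size / queue_capacity for each exporter found in the text."""
    sizes: dict[str, float] = {}
    capacities: dict[str, float] = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        # Assumption: the exporter name is carried in an "exporter" label.
        exporter = labels.get("exporter", "unknown")
        if match.group("name") == "otelcol_exporter_queue_size":
            sizes[exporter] = float(match.group("value"))
        elif match.group("name") == "otelcol_exporter_queue_capacity":
            capacities[exporter] = float(match.group("value"))
    # Only report exporters that expose both metrics and a non-zero capacity.
    return {
        name: sizes[name] / capacity
        for name, capacity in capacities.items()
        if name in sizes and capacity > 0
    }


if __name__ == "__main__":
    # Hypothetical scrape output, for illustration only.
    sample = (
        '# TYPE otelcol_exporter_queue_size gauge\n'
        'otelcol_exporter_queue_size{exporter="otlphttp"} 750\n'
        'otelcol_exporter_queue_capacity{exporter="otlphttp"} 1000\n'
    )
    for exporter, ratio in queue_utilization(sample).items():
        print(f"{exporter}: {ratio:.0%} of queue capacity in use")
```

A ratio that stays close to 1.0 across several scrapes is the situation the bullets above address.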

-**Data dropping**: Watch for logs containing "Dropping data because sending_queue is full" and monitor failed enqueue metrics:
+##### Data dropping
+
+Watch for logs containing "Dropping data because sending_queue is full" and monitor failed enqueue metrics:
 - `otelcol_exporter_enqueue_failed_spans`
 - `otelcol_exporter_enqueue_failed_metric_points`
 - `otelcol_exporter_enqueue_failed_log_records`
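
Assuming these failed enqueue metrics behave as cumulative counters, it is the growth between two scrapes, not the absolute value, that signals ongoing drops. The sketch below shows one way to compute that growth; the function name `failed_enqueue_increase`, the input dictionaries, and the reset handling are hypothetical choices for illustration.

```python
FAILED_ENQUEUE_METRICS = (
    "otelcol_exporter_enqueue_failed_spans",
    "otelcol_exporter_enqueue_failed_metric_points",
    "otelcol_exporter_enqueue_failed_log_records",
)


def failed_enqueue_increase(
    previous: dict[str, float], current: dict[str, float]
) -> dict[str, float]:
    """Return how much each failed-enqueue counter grew between two scrapes."""
    increases: dict[str, float] = {}
    for name in FAILED_ENQUEUE_METRICS:
        before = previous.get(name, 0.0)
        after = current.get(name, 0.0)
        # Assumption: a value lower than the previous one means the collector
        # restarted and the counter reset, so the new value is the increase.
        increases[name] = after - before if after >= before else after
    return increases


if __name__ == "__main__":
    # Hypothetical scrape values, for illustration only.
    earlier = {"otelcol_exporter_enqueue_failed_log_records": 10.0}
    later = {"otelcol_exporter_enqueue_failed_log_records": 42.0}
    for metric, delta in failed_enqueue_increase(earlier, later).items():
        if delta > 0:
            print(f"{metric} grew by {delta:g} since the last scrape")
```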

-### Accessing Collector Metrics Directly
+### Accessing collector metrics directly

 By default, the collector's internal metrics are available in Prometheus format at `http://localhost:8888/metrics`. You can access them using:

 ```bash
 curl http://localhost:8888/metrics
 ```

-### Log Levels and Configuration
+### Log levels and configuration

 Configure different log levels for troubleshooting:
-- **DEBUG**: Most verbose, includes detailed trace information
-- **INFO**: Standard operational information (default)
-- **WARN**: Warning messages about potential issues
-- **ERROR**: Error conditions that need attention
+- **DEBUG**. Most verbose, includes detailed trace information
+- **INFO**. Standard operational information (default)
+- **WARN**. Warning messages about potential issues
+- **ERROR**. Error conditions that need attention

 ## Create monitors for OpenTelemetry Collector Insights app