.clabot
+2 -1 (2 additions, 1 deletion)
Original file line number
Diff line number
Diff line change
@@ -188,7 +188,8 @@
188
188
"ntanwar-sumo",
189
189
"aj-sumo",
190
190
"samiura",
191
-
"naveenrama"
191
+
"naveenrama",
192
+
"fguimond"
192
193
],
193
194
"message": "Thank you for your contribution! As this is an open source project, we require contributors to sign our Contributor License Agreement and do not have yours on file. To proceed with your PR, please [sign your name here](https://forms.gle/YgLddrckeJaCdZYA6) and we will add you to our approved list of contributors.",
docs/integrations/hosts-operating-systems/opentelemetry/opentelemetry-collector-insights.md
+52 -46 (52 additions, 46 deletions)
@@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem';
13
13
14
14
The Sumo Logic OpenTelemetry Collector Insights app provides comprehensive monitoring and observability for your OpenTelemetry Collector instances. Monitor collector performance, telemetry data flow, resource utilization, and troubleshoot data collection issues with preconfigured dashboards and alerts. Track metrics and logs to ensure your telemetry pipeline is running smoothly and efficiently.
15
15
16
-
This app supports OpenTelemetry Collector version **0.130.1-sumo-0** and later versions.
16
+
This app supports OpenTelemetry Collector version `0.130.1-sumo-0` and later versions.
17
17
18
18
We use the OpenTelemetry Collector's built-in internal telemetry capabilities to collect metrics and logs about the Collector itself. By default, the Collector exposes its own telemetry through internal metrics (via a Prometheus interface on port 8888) and logs (emitted to stderr). The Collector can also be configured to export its own telemetry data (metrics and logs) to Sumo Logic through OTLP/HTTP endpoints.
19
19
@@ -25,13 +25,13 @@ This app includes [built-in monitors](#opentelemetry-collector-insights-alerts).
25
25
26
26
The following [fields](/docs/manage/fields/) are created as part of the OpenTelemetry Collector Insights app installation, if not already present.
27
27
28
-
- **sumo.datasource**. Has fixed value of **otel_collector**.
29
-
- **_contentType**. Has fixed value of **OpenTelemetry**.
30
-
- **deployment.environment**. User configured. Enter a name to identify your deployment environment.
28
+
- `sumo.datasource`. Has a fixed value of `otel_collector`.
29
+
- `_contentType`. Has a fixed value of `OpenTelemetry`.
30
+
- `deployment.environment`. User configured. Enter a name to identify your deployment environment.
31
31
32
32
## Prerequisites
33
33
34
-
### For OTLP Endpoint Configuration
34
+
### For OTLP endpoint configuration
35
35
36
36
Before configuring the OTEL Collector integration, ensure you have the following prerequisites in place:
37
37
@@ -85,10 +85,10 @@ In this step, you will configure the OpenTelemetry Collector's built-in telemetr
85
85
The collector's service configuration needs to be updated to enable telemetry export. Below is the required configuration that should be added to your collector's service section:
86
86
87
87
**Required Inputs:**
88
-
- **OTLP Endpoint**: Your Sumo Logic OTLP endpoint URL
88
+
- **OTLP Endpoint**: Your Sumo Logic OTLP endpoint URL.
89
89
90
90
**Configuration Parameters:**
91
-
- **Endpoint Format**: The base endpoint automatically creates:
91
+
- **Endpoint Format**. The base endpoint automatically creates:
92
92
- Logs endpoint: `${OTLP_ENDPOINT}/v1/logs`
93
93
- Metrics endpoint: `${OTLP_ENDPOINT}/v1/metrics`
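As a minimal sketch of the endpoint format above, the following Python helper derives the two signal-specific URLs from a base OTLP endpoint. The function name `otlp_signal_endpoints` and the placeholder base URL are hypothetical and not part of the app or the collector; only the `/v1/logs` and `/v1/metrics` suffixes come from this page.

```python
def otlp_signal_endpoints(base_endpoint: str) -> dict[str, str]:
    """Return the logs and metrics URLs derived from a base OTLP endpoint."""
    # Strip any trailing slash so the joined URL never contains "//".
    base = base_endpoint.rstrip("/")
    return {
        "logs": f"{base}/v1/logs",
        "metrics": f"{base}/v1/metrics",
    }


# Placeholder base URL (an assumption); substitute your own OTLP endpoint.
endpoints = otlp_signal_endpoints("https://example.invalid/otlp/")
print(endpoints["logs"])
print(endpoints["metrics"])
```

Normalizing the trailing slash first means the same helper works whether or not the configured base endpoint ends in `/`.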
94
94
@@ -209,11 +209,11 @@ import LogsOutro from '../../../reuse/apps/opentelemetry/send-logs-outro.md';
209
209
### Validation
210
210
211
211
After installation, verify that:
212
-
1. The OTEL Collector service is running
213
-
2. The configured base endpoint is reachable
214
-
3. Data is being successfully sent to both the logs (`/v1/logs`) and metrics (`/v1/metrics`) endpoints
215
-
4. Resource attributes are properly applied to the telemetry data
216
-
5. Internal metrics are accessible at `http://localhost:8888/metrics`
212
+
1. The OTEL Collector service is running.
213
+
2. The configured base endpoint is reachable.
214
+
3. Data is being successfully sent to both the logs (`/v1/logs`) and metrics (`/v1/metrics`) endpoints.
215
+
4. Resource attributes are properly applied to the telemetry data.
216
+
5. Internal metrics are accessible at `http://localhost:8888/metrics`.
217
217
218
218
## Sample log messages
219
219
@@ -319,10 +319,10 @@ All dashboards have a set of filters that you can apply to the entire dashboard.
319
319
The **OpenTelemetry Collector Insights - Overview** dashboard provides a high-level view of your OpenTelemetry Collector fleet's health and performance. This is your starting point for monitoring collector instances.
320
320
321
321
Use this dashboard to:
322
-
- Monitor the overall health of your collector fleet
323
-
- Identify performance bottlenecks and resource constraints
324
-
- Track data flow and processing rates across collectors
325
-
- Quickly spot collectors experiencing issues
322
+
- Monitor the overall health of your collector fleet.
323
+
- Identify performance bottlenecks and resource constraints.
324
+
- Track data flow and processing rates across collectors.
The **OpenTelemetry Collector Insights - Logs** dashboard provides detailed insights into collector log output for root-cause analysis of errors, data dropping events, and restarts.
332
332
333
333
Use this dashboard to:
334
-
- Analyze error patterns and troubleshoot issues
335
-
- Monitor collector startup and shutdown events
336
-
- Identify data loss or processing problems
337
-
- Track log severity trends across your collector fleet
334
+
- Analyze error patterns and troubleshoot issues.
335
+
- Monitor collector startup and shutdown events.
336
+
- Identify data loss or processing problems.
337
+
- Track log severity trends across your collector fleet.
The **OpenTelemetry Collector Insights - Pipeline: Receiver Health** dashboard focuses exclusively on the data ingestion stage of the pipeline to monitor data sources and receiver performance.
344
344
345
345
Use this dashboard to:
346
-
- Monitor receiver performance and data ingestion rates
347
-
- Identify issues with data sources and input connections
348
-
- Track receiver-specific errors and failures
349
-
- Analyze accepted vs refused data points
346
+
- Monitor receiver performance and data ingestion rates.
347
+
- Identify issues with data sources and input connections.
The **OpenTelemetry Collector Insights - Pipeline: Processor Health** dashboard is crucial for understanding if any processors (like batch, memory_limiter, or resourcedetection) are dropping data or causing performance issues.
356
356
357
357
Use this dashboard to:
358
-
- Monitor processor performance and throughput
359
-
- Identify data drops or processing bottlenecks
360
-
- Track processor-specific configurations and health
361
-
- Analyze batch processing efficiency and triggers
358
+
- Monitor processor performance and throughput.
359
+
- Identify data drops or processing bottlenecks.
360
+
- Track processor-specific configurations and health.
361
+
- Analyze batch processing efficiency and triggers.
The **OpenTelemetry Collector Insights - Pipeline: Exporter Health** dashboard is the most critical dashboard for diagnosing backpressure and data loss at the egress stage of the pipeline.
368
368
369
369
Use this dashboard to:
370
-
- Monitor exporter performance and success rates
371
-
- Identify backpressure issues and export failures
372
-
- Track data delivery to downstream systems
373
-
- Analyze queue utilization and capacity
370
+
- Monitor exporter performance and success rates.
371
+
- Identify backpressure issues and export failures.
The **OpenTelemetry Collector Insights - Resource Utilization** dashboard provides a deep dive into the collector's own resource consumption to diagnose performance issues and plan for capacity.
380
380
381
381
Use this dashboard to:
382
-
- Monitor CPU, memory, and disk usage by collectors
383
-
- Plan capacity and resource allocation
384
-
- Identify resource constraints and optimization opportunities
385
-
- Track heap allocation and garbage collection patterns
382
+
- Monitor CPU, memory, and disk usage by collectors.
383
+
- Plan capacity and resource allocation.
384
+
- Identify resource constraints and optimization opportunities.
385
+
- Track heap allocation and garbage collection patterns.
**Collector connection failure**: If your collector fails to connect to Sumo Logic, you may need to configure proxy settings. Check the collector's logs for connection errors:
393
+
##### Collector connection failure
394
+
395
+
If your collector fails to connect to Sumo Logic, you may need to configure proxy settings. Check the collector's logs for connection errors:
# Look for errors like "Unable to get a heartbeat"
400
402
```
401
403
402
-
**High queue utilization**: Monitor the `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity` metrics. If the queue is consistently full, you may need to:
404
+
##### High queue utilization
405
+
406
+
Monitor the `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity` metrics. If the queue is consistently full, you may need to:
403
407
- Reduce data ingestion rate
404
408
- Increase queue capacity
405
409
- Scale horizontally with more collectors
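One way to reason about "consistently full" is to express the two metrics as a ratio. The sketch below is illustrative only: the function names and the 0.9 threshold are assumptions, not values defined by the app or the collector.

```python
def queue_utilization(queue_size: float, queue_capacity: float) -> float:
    """Return the fraction of the exporter queue in use (0.0 to 1.0)."""
    if queue_capacity <= 0:
        raise ValueError("queue_capacity must be positive")
    return queue_size / queue_capacity


def is_queue_saturated(queue_size: float, queue_capacity: float,
                       threshold: float = 0.9) -> bool:
    """Flag a queue whose utilization meets or exceeds the threshold.

    The default threshold of 0.9 is an arbitrary example value.
    """
    return queue_utilization(queue_size, queue_capacity) >= threshold


# Example readings of otelcol_exporter_queue_size and
# otelcol_exporter_queue_capacity (the numbers are made up).
print(is_queue_saturated(queue_size=950, queue_capacity=1000))
```

A single reading above the threshold is not necessarily a problem; the guidance above applies when the ratio stays high over time.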
406
410
407
-
**Data dropping**: Watch for logs containing "Dropping data because sending_queue is full" and monitor failed enqueue metrics:
411
+
##### Data dropping
412
+
413
+
Watch for logs containing "Dropping data because sending_queue is full" and monitor failed enqueue metrics:
408
414
- `otelcol_exporter_enqueue_failed_spans`
409
415
- `otelcol_exporter_enqueue_failed_metric_points`
410
416
- `otelcol_exporter_enqueue_failed_log_records`
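The log check described above can be sketched as a simple substring scan. This is a hedged illustration: the helper name and the sample log lines are hypothetical, and only the message text "Dropping data because sending_queue is full" comes from this page.

```python
from typing import Iterable

# Message text quoted in the troubleshooting guidance above.
DROP_MESSAGE = "Dropping data because sending_queue is full"


def count_drop_events(log_lines: Iterable[str]) -> int:
    """Count log lines that report data dropped due to a full sending queue."""
    return sum(1 for line in log_lines if DROP_MESSAGE in line)


# Made-up log lines for illustration; real collector output will differ.
sample_logs = [
    "info    Everything is ready. Begin running and processing data.",
    "error   Dropping data because sending_queue is full. Try increasing queue_size.",
    "error   Dropping data because sending_queue is full. Try increasing queue_size.",
]
print(count_drop_events(sample_logs))
```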
411
417
412
-
### Accessing Collector Metrics Directly
418
+
### Accessing collector metrics directly
413
419
414
420
By default, the collector's internal metrics are available in Prometheus format at `http://localhost:8888/metrics`. You can access them using:
415
421
416
422
```bash
417
423
curl http://localhost:8888/metrics
418
424
```
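For scripted checks, the same endpoint can be read from Python. The sketch below uses only the standard library and assumes the default `http://localhost:8888/metrics` address mentioned above; the helper names are hypothetical, and the parser handles just the common `name{labels} value` and `name value` sample lines of the Prometheus text format.

```python
import urllib.request


def fetch_metrics_text(url: str = "http://localhost:8888/metrics",
                       timeout: float = 5.0) -> str:
    """Download the collector's internal metrics as Prometheus text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def sum_metric(text: str, metric_name: str) -> float:
    """Sum all samples of one metric across its label sets."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric_name:
            continue
        # The value follows the closing brace, or the name if there are no labels.
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        total += float(rest.split()[0])
    return total
```

With a collector running locally, `sum_metric(fetch_metrics_text(), "otelcol_exporter_queue_size")` would return the total queue size across exporters.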
419
425
420
-
### Log Levels and Configuration
426
+
### Log levels and configuration
421
427
422
428
Configure different log levels for troubleshooting:
423
-
- **DEBUG**: Most verbose, includes detailed trace information
424
-
- **INFO**: Standard operational information (default)
425
-
- **WARN**: Warning messages about potential issues
426
-
- **ERROR**: Error conditions that need attention
429
+
- **DEBUG**. Most verbose; includes detailed trace information.
430
+
- **INFO**. Standard operational information (default).
431
+
- **WARN**. Warning messages about potential issues.
432
+
- **ERROR**. Error conditions that need attention.
427
433
428
434
## Create monitors for OpenTelemetry Collector Insights app
Amazon Athena is a cloud-based service that enables you to run SQL queries on data stored in Amazon S3 without the need to set up any infrastructure. It is a serverless, pay-per-query service that makes it easy to analyze large amounts of data.
13
13
@@ -34,9 +34,11 @@ To [get access key and secret access key](https://docs.aws.amazon.com/athena/lat
34
34
35
35
import IntegrationsAuth from '../../../../reuse/integrations-authentication.md';
36
36
import IntegrationsAuthAWS from '../../../../reuse/integrations-authentication-aws.md';
37
+
import IAMConfiguration from '../../../../reuse/automation-service/aws/iam-configuration.md';
37
38
import AWSRegions from '../../../../reuse/automation-service/aws/region.md';
38
39
import AWSAccesskey from '../../../../reuse/automation-service/aws/access-key.md';
39
40
import AWSSecret from '../../../../reuse/automation-service/aws/secret.md';
41
+
import AWSIAMRole from '../../../../reuse/automation-service/aws/iam-role.md';
40
42
import IntegrationCertificate from '../../../../reuse/automation-service/integration-certificate.md';
41
43
import IntegrationEngine from '../../../../reuse/automation-service/integration-engine.md';
42
44
import IntegrationLabel from '../../../../reuse/automation-service/integration-label.md';
@@ -49,16 +51,41 @@ import IntegrationTimeout from '../../../../reuse/automation-service/integration
For information about Amazon Athena, see [Athena documentation](https://docs.aws.amazon.com/athena/).
63
+
59
64
<IntegrationsAuthAWS/>
60
65
61
-
For information about Amazon Athena, see [Athena documentation](https://docs.aws.amazon.com/athena/).
66
+
### AWS IAM role-based access
67
+
68
+
<IAMConfiguration/>
69
+
70
+
## Required Permissions
71
+
```
72
+
athena:StartQueryExecution
73
+
athena:GetQueryExecution
74
+
athena:GetQueryResults
75
+
athena:StopQueryExecution
76
+
athena:ListDatabases
77
+
athena:ListWorkGroups
78
+
athena:ListTableMetadata
79
+
glue:GetDatabase
80
+
glue:GetDatabases
81
+
glue:GetTable
82
+
glue:GetTables
83
+
glue:GetTableVersion
84
+
glue:GetTableVersions
85
+
s3:GetObject
86
+
s3:PutObject
87
+
s3:ListBucket
88
+
```
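As a rough sketch, the permissions listed above could be assembled into an IAM policy document like this. The helper name is hypothetical, and `"Resource": "*"` is an assumption used only to keep the example short; in practice you would normally scope resources to the specific workgroups, catalogs, and S3 buckets involved.

```python
import json

# Actions copied from the Required Permissions list above.
ATHENA_INTEGRATION_ACTIONS = [
    "athena:StartQueryExecution",
    "athena:GetQueryExecution",
    "athena:GetQueryResults",
    "athena:StopQueryExecution",
    "athena:ListDatabases",
    "athena:ListWorkGroups",
    "athena:ListTableMetadata",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetTableVersion",
    "glue:GetTableVersions",
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket",
]


def build_policy(actions: list[str], resource: str = "*") -> dict:
    """Build a single-statement IAM policy allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: a wildcard resource, for illustration only.
                "Resource": resource,
            }
        ],
    }


print(json.dumps(build_policy(ATHENA_INTEGRATION_ACTIONS), indent=2))
```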
62
89
63
90
## External Libraries
64
91
@@ -68,3 +95,4 @@ For information about Amazon Athena, see [Athena documentation](https://docs.aws
68
95
69
96
* February 22, 2023 (v1.0) - First upload
70
97
* June 15, 2023 (v1.1) - Updated the integration with Environmental Variables
98
+
* July 29, 2025 (v1.2) - Added support for IAM role authentication. Users can now authenticate using an AWS IAM role in addition to access key-based authentication.
For information about AWS CloudFront, see [CloudFront documentation](https://docs.aws.amazon.com/cloudfront/).
55
+
51
56
<IntegrationsAuthAWS/>
52
57
53
-
For information about AWS CloudFront, see [CloudFront documentation](https://docs.aws.amazon.com/cloudfront/).
58
+
### AWS IAM role-based access
59
+
60
+
<IAMConfiguration/>
61
+
62
+
## Required Permissions
63
+
```
64
+
cloudfront:ListCloudFrontOriginAccessIdentities
65
+
cloudfront:GetCloudFrontOriginAccessIdentity
66
+
```
54
67
55
68
## Change Log
56
69
57
70
* November 10, 2022 - First upload
58
71
* April 14, 2023 (v1.1)
59
72
+ Updated integration: (Updated the integration Fields with Environmental Variables)
60
73
* June 15, 2023 (v1.3) - Updated the integration with Environmental Variables
74
+
* July 29, 2025 (v1.4) - Added support for IAM role authentication. Users can now authenticate using an AWS IAM role in addition to access key-based authentication.