
Commit d3bdbc3: Updates from review

1 parent e96d736 commit d3bdbc3

docs/integrations/hosts-operating-systems/opentelemetry/opentelemetry-collector-insights.md

Lines changed: 59 additions & 53 deletions
@@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem';

 The Sumo Logic OpenTelemetry Collector Insights app provides comprehensive monitoring and observability for your OpenTelemetry Collector instances. Monitor collector performance, telemetry data flow, resource utilization, and troubleshoot data collection issues with preconfigured dashboards and alerts. Track metrics and logs to ensure your telemetry pipeline is running smoothly and efficiently.

-This app supports OpenTelemetry Collector version **0.130.1-sumo-0** and later versions.
+This app supports OpenTelemetry Collector version `0.130.1-sumo-0` and later versions.

 We use the OpenTelemetry collector's built-in internal telemetry capabilities to collect metrics and logs about the collector itself. By default, the Collector exposes its own telemetry through internal metrics (via Prometheus interface on port 8888) and logs (emitted to stderr).
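
For orientation, the self-telemetry described above is controlled from the collector's `service::telemetry` configuration section. A minimal sketch of where those settings live (key names from the collector's internal-telemetry configuration; the values shown are illustrative and defaults vary by collector version):

```yaml
service:
  telemetry:
    metrics:
      level: normal   # valid levels: none, basic, normal, detailed; exposed on port 8888 in Prometheus format by default
    logs:
      level: info     # the collector's own logs, written to stderr by default
```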

@@ -27,24 +27,24 @@ This app includes [built-in monitors](#opentelemetry-collector-insights-alerts).

 Following are the [fields](/docs/manage/fields/) which will be created as part of OpenTelemetry Collector Insights app installation, if not already present.

-- **sumo.datasource**. Has fixed value of **otel_collector**.
-- **_contentType**. Has fixed value of **OpenTelemetry**.
-- **deployment.environment**. User configured. Enter a name to identify your deployment environment.
+- `sumo.datasource`. Has fixed value of `otel_collector`.
+- `_contentType`. Has fixed value of `OpenTelemetry`.
+- `deployment.environment`. User configured. Enter a name to identify your deployment environment.

 ## Prerequisites

-### For OTLP Endpoint Configuration
+### For OTLP endpoint configuration

 Before configuring the OTEL Collector integration, ensure you have the following prerequisites in place:

-1. **OTLP Endpoint**: You need a valid base OTLP endpoint URL. The system will automatically append `/v1/logs` for logs collection and `/v1/metrics` for metrics collection. The endpoint should be accessible from your OTEL Collector instance.
+1. **OTLP Endpoint**. You need a valid base OTLP endpoint URL. The system will automatically append `/v1/logs` for logs collection and `/v1/metrics` for metrics collection. The endpoint should be accessible from your OTEL Collector instance.

-2. **Network Access**: Ensure that your OTEL Collector has network access to the configured OTLP endpoint. This includes:
+2. **Network Access**. Ensure that your OTEL Collector has network access to the configured OTLP endpoint. This includes:
    - Outbound HTTPS connectivity on port 443
    - Proper firewall configurations to allow traffic to the endpoint
    - DNS resolution for the endpoint hostname

-3. **Authentication**: If your OTLP endpoint requires authentication, ensure you have the proper credentials or tokens configured.
+3. **Authentication**. If your OTLP endpoint requires authentication, ensure you have the proper credentials or tokens configured.

 ### For metrics collection

@@ -90,16 +90,16 @@ In this step, you will configure the OpenTelemetry Collector's built-in telemetr
 The collector's service configuration needs to be updated to enable telemetry export. Below is the required configuration that should be added to your collector's service section:

 **Required Inputs:**
-- **OTLP Endpoint**: Your Sumo Logic OTLP endpoint base URL
-- **Deployment Environment**: Enter a name to identify your deployment environment
+- **OTLP Endpoint**. Your Sumo Logic OTLP endpoint base URL
+- **Deployment Environment**. Enter a name to identify your deployment environment

 **Configuration Parameters:**
-- **Endpoint Format**: The base endpoint automatically creates:
+- **Endpoint Format**. The base endpoint automatically creates:
   - Logs endpoint: `${OTLP_ENDPOINT}/v1/logs`
   - Metrics endpoint: `${OTLP_ENDPOINT}/v1/metrics`
-- **Protocol**: HTTP/protobuf for OTLP communication
-- **Metrics level**: Set to **detailed** for comprehensive monitoring
-- **Logs level**: Set to **debug** for detailed troubleshooting information
+- **Protocol**. HTTP/protobuf for OTLP communication
+- **Metrics level**. Set to **detailed** for comprehensive monitoring
+- **Logs level**. Set to **debug** for detailed troubleshooting information
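
Since the hunk below shows only the opening lines of the collector's YAML block, here is a hedged sketch of what a complete `service::telemetry` section with these parameters typically looks like. The endpoint and environment values are placeholders, and exact key names can vary by collector version; treat this as orientation rather than the file contents from this commit.

```yaml
service:
  telemetry:
    resource:
      deployment.environment: production   # placeholder; enter a name that identifies your environment
    metrics:
      level: detailed
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: ${OTLP_ENDPOINT}/v1/metrics
    logs:
      level: debug
      processors:
        - batch:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: ${OTLP_ENDPOINT}/v1/logs
```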

 ```yaml
 service:
@@ -215,11 +215,11 @@ import LogsOutro from '../../../reuse/apps/opentelemetry/send-logs-outro.md';
 ### Validation

 After installation, verify that:
-1. The OTEL Collector service is running
-2. The configured base endpoint is reachable
-3. Data is being successfully sent to both the logs (`/v1/logs`) and metrics (`/v1/metrics`) endpoints
-4. Resource attributes are properly applied to the telemetry data
-5. Internal metrics are accessible at `http://localhost:8888/metrics`
+1. The OTEL Collector service is running.
+2. The configured base endpoint is reachable.
+3. Data is being successfully sent to both the logs (`/v1/logs`) and metrics (`/v1/metrics`) endpoints.
+4. Resource attributes are properly applied to the telemetry data.
+5. Internal metrics are accessible at `http://localhost:8888/metrics`.

 ## Sample log messages

@@ -325,10 +325,10 @@ All dashboards have a set of filters that you can apply to the entire dashboard.
 The **OpenTelemetry Collector Insights - Overview** dashboard provides a high-level view of your OpenTelemetry Collector fleet's health and performance. This is your starting point for monitoring collector instances.

 Use this dashboard to:
-- Monitor the overall health of your collector fleet
-- Identify performance bottlenecks and resource constraints
-- Track data flow and processing rates across collectors
-- Quickly spot collectors experiencing issues
+- Monitor the overall health of your collector fleet.
+- Identify performance bottlenecks and resource constraints.
+- Track data flow and processing rates across collectors.
+- Quickly spot collectors experiencing issues.

 <img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/OpenTelemetry-Collector-Insights/OpenTelemetry-Collector-Overview.png' alt="Overview" />

@@ -337,10 +337,10 @@ Use this dashboard to:
 The **OpenTelemetry Collector Insights - Logs** dashboard provides detailed insights into collector log output for root-cause analysis of errors, data dropping events, and restarts.

 Use this dashboard to:
-- Analyze error patterns and troubleshoot issues
-- Monitor collector startup and shutdown events
-- Identify data loss or processing problems
-- Track log severity trends across your collector fleet
+- Analyze error patterns and troubleshoot issues.
+- Monitor collector startup and shutdown events.
+- Identify data loss or processing problems.
+- Track log severity trends across your collector fleet.

 <img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/OpenTelemetry-Collector-Insights/OpenTelemetry-Collector-Logs.png' alt="Logs" />

@@ -349,10 +349,10 @@ Use this dashboard to:
 The **OpenTelemetry Collector Insights - Pipeline: Receiver Health** dashboard focuses exclusively on the data ingestion stage of the pipeline to monitor data sources and receiver performance.

 Use this dashboard to:
-- Monitor receiver performance and data ingestion rates
-- Identify issues with data sources and input connections
-- Track receiver-specific errors and failures
-- Analyze accepted vs refused data points
+- Monitor receiver performance and data ingestion rates.
+- Identify issues with data sources and input connections.
+- Track receiver-specific errors and failures.
+- Analyze accepted vs refused data points.

 <img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/OpenTelemetry-Collector-Insights/OpenTelemetry-Collector-Pipeline-Receiver-Health.png' alt="Pipeline Receiver Health" />

@@ -361,10 +361,10 @@ Use this dashboard to:
 The **OpenTelemetry Collector Insights - Pipeline: Processor Health** dashboard is crucial for understanding if any processors (like batch, memory_limiter, or resourcedetection) are dropping data or causing performance issues.

 Use this dashboard to:
-- Monitor processor performance and throughput
-- Identify data drops or processing bottlenecks
-- Track processor-specific configurations and health
-- Analyze batch processing efficiency and triggers
+- Monitor processor performance and throughput.
+- Identify data drops or processing bottlenecks.
+- Track processor-specific configurations and health.
+- Analyze batch processing efficiency and triggers.

 <img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/OpenTelemetry-Collector-Insights/OpenTelemetry-Collector-Pipeline-Processor-Health.png' alt="Pipeline Processor Health" />

@@ -373,10 +373,10 @@ Use this dashboard to:
 The **OpenTelemetry Collector Insights - Pipeline: Exporter Health** dashboard is the most critical dashboard for diagnosing backpressure and data loss at the egress stage of the pipeline.

 Use this dashboard to:
-- Monitor exporter performance and success rates
-- Identify backpressure issues and export failures
-- Track data delivery to downstream systems
-- Analyze queue utilization and capacity
+- Monitor exporter performance and success rates.
+- Identify backpressure issues and export failures.
+- Track data delivery to downstream systems.
+- Analyze queue utilization and capacity.

 <img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/OpenTelemetry-Collector-Insights/OpenTelemetry-Collector-Pipeline-Exporter-Health.png' alt="Pipeline Exporter Health" />

@@ -385,18 +385,20 @@ Use this dashboard to:
 The **OpenTelemetry Collector Insights - Resource Utilization** dashboard provides a deep dive into the collector's own resource consumption to diagnose performance issues and plan for capacity.

 Use this dashboard to:
-- Monitor CPU, memory, and disk usage by collectors
-- Plan capacity and resource allocation
-- Identify resource constraints and optimization opportunities
-- Track heap allocation and garbage collection patterns
+- Monitor CPU, memory, and disk usage by collectors.
+- Plan capacity and resource allocation.
+- Identify resource constraints and optimization opportunities.
+- Track heap allocation and garbage collection patterns.

 <img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/OpenTelemetry-Collector-Insights/OpenTelemetry-Collector-Resource-Utilization.png' alt="Resource Utilization" />

 ## Troubleshooting

-### Common Issues
+### Common issues

-**Collector connection failure**: If your collector fails to connect to Sumo Logic, you may need to configure proxy settings. Check the collector's logs for connection errors:
+##### Collector connection failure
+
+If your collector fails to connect to Sumo Logic, you may need to configure proxy settings. Check the collector's logs for connection errors:

 ```bash
 # On systemd systems
@@ -405,31 +407,35 @@ journalctl --unit otelcol-sumo
 # Look for errors like "Unable to get a heartbeat"
 ```

-**High queue utilization**: Monitor the `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity` metrics. If the queue is consistently full, you may need to:
+##### High queue utilization
+
+Monitor the `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity` metrics. If the queue is consistently full, you may need to:
 - Reduce data ingestion rate
 - Increase queue capacity
 - Scale horizontally with more collectors
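
For the "Increase queue capacity" option above, the queue is tuned per exporter through the standard `sending_queue` settings. A hedged sketch with an illustrative exporter name and values:

```yaml
exporters:
  otlphttp:                  # illustrative exporter name; apply to the exporter used in your pipeline
    sending_queue:
      enabled: true
      num_consumers: 10      # parallel consumers draining the queue
      queue_size: 5000       # raise this if otelcol_exporter_queue_size keeps reaching otelcol_exporter_queue_capacity
```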

-**Data dropping**: Watch for logs containing "Dropping data because sending_queue is full" and monitor failed enqueue metrics:
+##### Data dropping
+
+Watch for logs containing "Dropping data because sending_queue is full" and monitor failed enqueue metrics:
 - `otelcol_exporter_enqueue_failed_spans`
 - `otelcol_exporter_enqueue_failed_metric_points`
 - `otelcol_exporter_enqueue_failed_log_records`

-### Accessing Collector Metrics Directly
+### Accessing collector metrics directly

 By default, the collector's internal metrics are available in Prometheus format at `http://localhost:8888/metrics`. You can access them using:

 ```bash
 curl http://localhost:8888/metrics
 ```

-### Log Levels and Configuration
+### Log levels and configuration

 Configure different log levels for troubleshooting:
-- **DEBUG**: Most verbose, includes detailed trace information
-- **INFO**: Standard operational information (default)
-- **WARN**: Warning messages about potential issues
-- **ERROR**: Error conditions that need attention
+- **DEBUG**. Most verbose, includes detailed trace information
+- **INFO**. Standard operational information (default)
+- **WARN**. Warning messages about potential issues
+- **ERROR**. Error conditions that need attention
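
These levels map onto the collector's own log configuration under `service::telemetry::logs`; a minimal sketch of setting it:

```yaml
service:
  telemetry:
    logs:
      level: debug   # one of: debug, info, warn, error
```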

 ## Create monitors for OpenTelemetry Collector Insights app
