This repository was archived by the owner on Sep 2, 2025. It is now read-only.

Commit 815abf6

Merge pull request #1600 from splunk/repo-sync
Pulling refs/heads/main into main
2 parents 8443062 + 5f8f82d commit 815abf6

File tree

4 files changed (+87 -83 lines)

gdi/get-data-in/connect/aws/aws-prereqs.rst

Lines changed: 2 additions & 1 deletion
@@ -426,7 +426,8 @@ Read more at the official AWS documentation:

* :new-page:`AWS Organization Service Control Policies <https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html>`
* :new-page:`Permissions boundaries for IAM entities <https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_boundaries.html>`
- * :new-page:`Troubleshooting IAM permission access denied or unauthorized errors <https://repost.aws/knowledge-center/troubleshoot-iam-permission-errors>`
+
+ .. tip:: Search for specific troubleshooting at AWS' knowledge center.

.. _aws-regions:

gdi/opentelemetry/exposed-endpoints.rst

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ See the table for a complete list of exposed ports and endpoints:
* - ``http(s)://0.0.0.0:7276``
  - SAPM trace receiver
* - ``http://localhost:8888/metrics``
-   - :new-page:`Internal Prometheus metrics <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/monitoring.md>`
+   - :new-page:`Internal Prometheus metrics <https://opentelemetry.io/docs/collector/internal-telemetry>`
* - ``http(s)://localhost:8006``
  - Fluent forward receiver
* - ``http(s)://0.0.0.0:9080``

gdi/opentelemetry/splunk-collector-troubleshooting.rst

Lines changed: 83 additions & 80 deletions
@@ -9,14 +9,16 @@ Troubleshoot the Splunk OpenTelemetry Collector

See the following issues and workarounds for the Splunk Distribution of the OpenTelemetry Collector.

- .. note:: See also the :new-page:`OpenTelemetry Project troublehooting docs in GitHub <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md>`.
+ .. note:: See also the :new-page:`OpenTelemetry Project troubleshooting docs <https://opentelemetry.io/docs/collector/troubleshooting>`.

- Collector isn't behaving as expected
- =========================================
+ .. caution:: Splunk only provides best-effort support for the upstream OpenTelemetry Collector.
+
+ The Collector isn't behaving as expected
+ =================================================

The Collector might experience the issues described in this section.

- Collector or td-agent service isn't working
+ The Collector or td-agent service isn't working
--------------------------------------------------

If either the Collector or td-agent services are not installed and configured, check these things to fix the issue:
@@ -26,64 +28,75 @@ If either the Collector or td-agent services are not installed and configured, c
* Check that your platform is not running in a containerized environment
* Check the installation logs for more details

- Collector exits or restarts
+ The Collector exits or restarts
-----------------------------------------

- The collector might exit or restart for the following reasons:
+ The Collector might exit or restart for the following reasons:

* Memory pressure due to a missing or misconfigured ``memory_limiter`` processor
* Improperly sized for load
* Improperly configured. For example, a queue size configured higher than available memory
* Infrastructure resource limits. For example, Kubernetes

- Restart the Splunk Distribution of OpenTelemetry Collector and check the configuration.
+ Restart the Splunk Distribution of the OpenTelemetry Collector and check the configuration.
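
As context for the first bullet above, here is a minimal sketch of a ``memory_limiter`` processor placed first in a pipeline. The limits and the component names (``hostmetrics``, ``batch``, ``signalfx``) are illustrative assumptions, not sizing guidance:

.. code-block:: yaml

   processors:
     memory_limiter:
       check_interval: 2s       # how often memory usage is checked
       limit_mib: 460           # placeholder hard limit, not a recommendation
       spike_limit_mib: 92      # placeholder headroom for short spikes
     batch:

   service:
     pipelines:
       metrics:
         receivers: [hostmetrics]
         # memory_limiter works best as the first processor in the pipeline
         processors: [memory_limiter, batch]
         exporters: [signalfx]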

- Collector doesn't start in Windows Docker containers
+ The Collector doesn't start in Windows Docker containers
-----------------------------------------------------------

The process might fail to start in a custom built, Windows-based Docker container, resulting in a "The service process could not connect to the service controller" error message.

In this case, the ``NO_WINDOWS_SERVICE=1`` environment variable must be set to force the Splunk Distribution of OpenTelemetry Collector to start as if it were running in an interactive terminal, without attempting to run as a Windows service.
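
As an illustration of that workaround, a hypothetical Compose definition for a custom Windows-based image could set the variable like this (the image name is a placeholder):

.. code-block:: yaml

   services:
     otelcol:
       image: my-custom-splunk-otel-collector:windows   # placeholder image
       environment:
         - NO_WINDOWS_SERVICE=1   # run interactively instead of as a Windows service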

- Collector is experiencing data issues
+ Extract a running configuration
+ =========================================
+
+ Extracting a running configuration saves or stores the contents of a configuration file to logs that you can use to troubleshoot issues. You can extract a running configuration by accessing these ports:
+
+ * ``http://localhost:55554/debug/configz/initial``
+ * ``http://localhost:55554/debug/configz/effective``
+
+ For Linux, the support bundle script captures this information. See :ref:`otel-install-linux` for the installer script. This capability is primarily useful if you are using remote configuration options such as Zookeeper where the startup configuration can change during operation.
+
+ The Collector is experiencing data issues
============================================

- You can monitor internal Collector metrics tracking parameters such as data loss or CPU resources in Splunk Observability Cloud's default dashboards at :guilabel:`Dashboards > OpenTelemetry Collector > OpenTelemetry Collector`. To learn more about these metrics, see :new-page:`Monitoring <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/monitoring.md>` in the OpenTelemetry GitHub repo.
+ You can monitor internal Collector metrics tracking parameters such as data loss or CPU resources in Splunk Observability Cloud's default dashboards at :guilabel:`Dashboards > OpenTelemetry Collector > OpenTelemetry Collector`.

- The Collector might experience the issues described in this section.
+ To learn more, see:
+
+ * :ref:`metrics-internal-collector`
+ * :new-page:`Internal telemetry <https://opentelemetry.io/docs/collector/internal-telemetry>` in the OpenTelemetry project documentation
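
As a hedged illustration, the Collector's own telemetry can typically be tuned under ``service::telemetry`` in its configuration. The verbosity level and address shown here are examples, and the available options depend on your Collector version:

.. code-block:: yaml

   service:
     telemetry:
       metrics:
         level: detailed    # none, basic, normal, or detailed
         address: ":8888"   # where internal Prometheus metrics are exposed

These are the same internal metrics that back the default OpenTelemetry Collector dashboards mentioned above.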

- Collector is dropping data
+ The Collector is dropping data
--------------------------------

Data might drop for a variety of reasons, but most commonly for the following reasons:

- * The collector is improperly sized, resulting in the Splunk Distribution of OpenTelemetry Collector being unable to process and export the data as fast as it is received. See :ref:`otel-sizing` for sizing guidelines.
+ * The Collector is improperly sized, resulting in the Splunk Distribution of the OpenTelemetry Collector being unable to process and export the data as fast as it is received. See :ref:`otel-sizing` for sizing guidelines.
* The exporter destination is unavailable or accepting the data too slowly. To mitigate drops, configure the ``batch`` processor. In addition, you might also need to configure the queued retry options on activated exporters.
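
A minimal sketch of those settings, assuming an ``otlp`` exporter and a hypothetical backend endpoint; the numbers are placeholders to tune for your own throughput:

.. code-block:: yaml

   processors:
     batch:
       send_batch_size: 8192    # larger batches mean fewer export calls
       timeout: 200ms

   exporters:
     otlp:
       endpoint: "backend.example.com:4317"   # hypothetical destination
       sending_queue:
         enabled: true
         queue_size: 5000         # buffers data while the destination is slow
       retry_on_failure:
         enabled: true
         max_elapsed_time: 300s   # stop retrying a batch after 5 minutes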

- Collector isn't receiving data
+ The Collector isn't receiving data
-------------------------------------

- The collector might not receive data for the following reasons:
+ The Collector might not receive data for the following reasons:

* Network configuration issues
* Receiver configuration issues
* The receiver is defined in the receivers section, but not activated in any pipelines
* The client configuration is incorrect
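
For the third bullet in particular, defining a receiver is not enough on its own: it only takes effect once a pipeline references it. This sketch assumes an ``otlp`` receiver and placeholder processor and exporter names that are defined elsewhere in your configuration:

.. code-block:: yaml

   receivers:
     otlp:
       protocols:
         grpc:
           endpoint: "0.0.0.0:4317"

   service:
     pipelines:
       traces:
         receivers: [otlp]                      # the receiver must be listed here to be active
         processors: [memory_limiter, batch]    # placeholders defined elsewhere
         exporters: [sapm]                      # placeholder defined elsewhere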

- Check the logs and :new-page:`Troubleshooting zPages <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#zpages>` in the OpenTelemetry project GitHub repositories for more information. Note that Splunk only provides best-effort support for the upstream OpenTelemetry Collector.
-
- Collector can't process data
+ The Collector can't process data
-----------------------------------

- The collector might not process data for the following reasons:
+ The Collector might not process data for the following reasons:

* The attributes processors work only for "tags" on spans. The span name is handled by the span processor.
* Processors for trace data (except tail sampling) only work on individual spans. Make sure your collector is configured properly.

- Collector can't export data
+ The Collector can't export data
------------------------------------

- The collector might be unable to export data for the following reasons:
+ The Collector might be unable to export data for the following reasons:

* Network configuration issues, such as firewall, DNS, or proxy support
* Incorrect exporter configuration
@@ -92,8 +105,6 @@ The collector might be unable to export data for the following reasons:

If you need to use a proxy, see :ref:`configure-proxy-collector`.

- Check the logs and :new-page:`Troubleshooting zPages <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#zpages>` in the OpenTelemetry project GitHub repositories for more information. Note that Splunk only provides best-effort support for the upstream OpenTelemetry Collector.
-
.. _collector-gateway-metrics-issue:

Metrics and metadata not available in data forwarding (gateway) mode
@@ -149,15 +160,6 @@ For example:
value: staging
key: deployment.environment

- Extract a running configuration
- =========================================
- Extracting a running configuration saves or stores the contents of a configuration file to logs that you can use to troubleshoot issues. You can extract a running configuration by accessing these ports:
-
- * ``http://localhost:55554/debug/configz/initial``
- * ``http://localhost:55554/debug/configz/effective``
-
- For Linux, the support bundle script captures this information. See :ref:`otel-install-linux` for the installer script. This capability is primarily useful if you are using remote configuration options such as Zookeeper where the startup configuration can change during operation.
-
Check metric data from the command line
==============================================
@@ -168,56 +170,37 @@ To check whether host metrics are being collected and processed correctly, you c

You can then pipe the output to ``grep`` (Linux) or ``Select-String`` (Windows) to filter the data. For example, ``curl http://localhost:8888/metrics | grep service_instance_id`` retrieves the service instance ID.

- You're getting a "bind: address already in use" error message
- ==================================================================================
-
- If you see an error message such as "bind: address already in use", another resource is already using the port that the current configuration requires. This resource could be another application, or a tracing tool such as Jaeger or Zipkin. You can modify the configuration to use another port.
-
- You can modify any of these endpoints or ports:
-
- * Receiver endpoint
- * Extensions endpoint
- * Metrics address (if port 8888)
-
- Conflicts with port 8888
- -----------------------------------
-
- If you encounter a conflict with port 8888, you will need to change to port 8889, making adjustments in these two areas:
-
- 1. Add telemetry configuration under the service section:
+ Trace collection issues
+ ================================

- .. code-block:: yaml
+ Test the Collector by sending synthetic data
+ ------------------------------------------------------------

+ You can test the Collector to make sure it can receive spans without instrumenting an application. By default, the Collector activates the Zipkin receiver, which is capable of receiving trace data over JSON.

-    service:
-      telemetry:
-        metrics:
-          address: ":8889"
+ To test the UI, you can submit a POST request or paste JSON in this directory, as shown in the following example.

+ .. code-block:: bash

- 2. Update the port for ``receivers.prometheus/internal`` from 8888 to 8889:
+    curl -OL https://raw.githubusercontent.com/openzipkin/zipkin/master/zipkin-lens/testdata/yelp.json
+    curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @yelp.json

- .. code-block:: yaml
+ .. note::

+    Update the ``localhost`` field as appropriate to reach the Collector.

-    receivers:
-      prometheus/internal:
-        config:
-          scrape_configs:
-          - job_name: 'otel-collector'
-            scrape_interval: 10s
-            static_configs:
-            - targets: ['0.0.0.0:8889']
+ No response means the request was sent successfully. You can also pass ``-v`` to the curl command to confirm.

- If you see this error message on Kubernetes and you're using Helm charts, modify the configuration by updating the chart values for both configuration and exposed ports.
+ Error codes and messages
+ ==================================================================================

You're getting a "pattern not matched" error message
- ==================================================================================
+ ------------------------------------------------------------

If you see an error message such as "pattern not matched", this message is from Fluentd, and means that the ``<parser>`` was unable to match based on the log message. As a result, the log message is not collected. Check the Fluentd configuration and update as required.

You're receiving an HTTP error code
- ==================================================================================
+ ------------------------------------------------------------

If an HTTP request is not successfully completed, you might see the following HTTP error codes.

@@ -236,25 +219,45 @@ If an HTTP request is not successfully completed, you might see the following HT
* - ``503 (SERVICE UNAVAILABLE)``
  - Check the status page.

- Trace collection issues
- ================================
+ You're getting a "bind: address already in use" error message
+ ------------------------------------------------------------------------------------------------------------------------

- Here are some common issues related to trace collection on the Collector.
+ If you see an error message such as "bind: address already in use", another resource is already using the port that the current configuration requires. This resource could be another application, or a tracing tool such as Jaeger or Zipkin. You can modify the configuration to use another port.

- Test the Collector by sending synthetic data
- ------------------------------------------------------------
+ You can modify any of these endpoints or ports:

- You can test the Collector to make sure it can receive spans without instrumenting an application. By default, the Collector activates the Zipkin receiver, which is capable of receiving trace data over JSON.
+ * Receiver endpoint
+ * Extensions endpoint
+ * Metrics address (if port 8888)

- To test the UI, you can submit a POST request or paste JSON in this directory, as shown in the following example.
+ Conflicts with port 8888
+ -----------------------------------

- .. code-block:: bash
+ If you encounter a conflict with port 8888, you will need to change to port 8889, making adjustments in these two areas:

-    curl -OL https://raw.githubusercontent.com/openzipkin/zipkin/master/zipkin-lens/testdata/yelp.json
-    curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @yelp.json
+ 1. Add telemetry configuration under the service section:

- .. note::
+ .. code-block:: yaml

-    Update the ``localhost`` field as appropriate to reach the Collector.

- No response means the request was sent successfully. You can also pass ``-v`` to the curl command to confirm.
+    service:
+      telemetry:
+        metrics:
+          address: ":8889"
+
+
+ 2. Update the port for ``receivers.prometheus/internal`` from 8888 to 8889:
+
+ .. code-block:: yaml
+
+
+    receivers:
+      prometheus/internal:
+        config:
+          scrape_configs:
+          - job_name: 'otel-collector'
+            scrape_interval: 10s
+            static_configs:
+            - targets: ['0.0.0.0:8889']
+
+ If you see this error message on Kubernetes and you're using Helm charts, modify the configuration by updating the chart values for both configuration and exposed ports.

gdi/opentelemetry/troubleshoot-logs.rst

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Troubleshoot Collector logs
:description: Describes known issues when collecting logs with the Splunk Distribution of OpenTelemetry Collector.


- .. note:: To activate the Collector's debug logging, see the :new-page:`OpenTelemetry project documentation in GitHub <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#logs>`.
+ .. note:: See also the :new-page:`OpenTelemetry Project troubleshooting docs <https://opentelemetry.io/docs/collector/troubleshooting>` for more information about debugging.
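
If you need to raise the Collector's own log verbosity while debugging, a hedged sketch of the relevant setting follows; the exact options depend on your Collector version:

.. code-block:: yaml

   service:
     telemetry:
       logs:
         level: debug   # the Collector's internal log level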

Here are some common issues related to log collection on the Collector.
