5 changes: 5 additions & 0 deletions _includes/requirements/realm.rst
@@ -0,0 +1,5 @@
The Browser RUM agent runs in the following realms:

- United States: ``us0``, ``us1``, ``us2``
- Europe: ``eu0``, ``eu1``, ``eu2``
- Asia-Pacific: ``au0``, ``jp0``
3 changes: 2 additions & 1 deletion gdi/get-data-in/connect/aws/aws-prereqs.rst
@@ -426,7 +426,8 @@ Read more at the official AWS documentation:

* :new-page:`AWS Organization Service Control Policies <https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html>`
* :new-page:`Permissions boundaries for IAM entities <https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_boundaries.html>`
* :new-page:`Troubleshooting IAM permission access denied or unauthorized errors <https://repost.aws/knowledge-center/troubleshoot-iam-permission-errors>`

.. tip:: For specific troubleshooting topics, search the AWS Knowledge Center.

.. _aws-regions:

5 changes: 4 additions & 1 deletion gdi/get-data-in/rum/browser/install-rum-browser.rst
@@ -19,6 +19,9 @@ Check compatibility and requirements
.. include:: /_includes/requirements/browser.rst


.. include:: /_includes/requirements/realm.rst


.. _rum-browser-install:

Instrument your web application for Splunk RUM
@@ -194,7 +197,7 @@ Follow these steps to instrument and configure Splunk RUM using npm:

.. _loading-initializing_browser-rum:

Loading and initializing the Browser RUM agent
Load and initialize the Browser RUM agent
========================================================

To avoid gaps in your data, load and initialize the Browser RUM agent synchronously and as early as possible. Delayed loading might result in missing data, as the instrumentation cannot collect data before it's initialized.
2 changes: 1 addition & 1 deletion gdi/opentelemetry/exposed-endpoints.rst
@@ -35,7 +35,7 @@ See the table for a complete list of exposed ports and endpoints:
* - ``http(s)://0.0.0.0:7276``
- SAPM trace receiver
* - ``http://localhost:8888/metrics``
- :new-page:`Internal Prometheus metrics <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/monitoring.md>`
- :new-page:`Internal Prometheus metrics <https://opentelemetry.io/docs/collector/internal-telemetry>`
* - ``http(s)://localhost:8006``
- Fluent forward receiver
* - ``http(s)://0.0.0.0:9080``
163 changes: 83 additions & 80 deletions gdi/opentelemetry/splunk-collector-troubleshooting.rst
@@ -9,14 +9,16 @@ Troubleshoot the Splunk OpenTelemetry Collector

See the following issues and workarounds for the Splunk Distribution of the OpenTelemetry Collector.

.. note:: See also the :new-page:`OpenTelemetry Project troublehooting docs in GitHub <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md>`.
.. note:: See also the :new-page:`OpenTelemetry Project troubleshooting docs <https://opentelemetry.io/docs/collector/troubleshooting>`.

Collector isn't behaving as expected
=========================================
.. caution:: Splunk only provides best-effort support for the upstream OpenTelemetry Collector.

The Collector isn't behaving as expected
=================================================

The Collector might experience the issues described in this section.

Collector or td-agent service isn't working
The Collector or td-agent service isn't working
--------------------------------------------------

If either the Collector or the td-agent service isn't installed and configured, check the following to fix the issue:
@@ -26,64 +28,75 @@
* Check that your platform is not running in a containerized environment
* Check the installation logs for more details. One way to do this on Linux is shown in the example after this list.
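On Linux, if you used the installer script, a quick way to confirm the state of the services is with systemd tooling. This is a minimal sketch that assumes the default service names (``splunk-otel-collector``, and ``td-agent`` if Fluentd-based log collection is installed); adjust the names to your setup.

.. code-block:: bash

   # Check whether the Collector service is installed and running
   sudo systemctl status splunk-otel-collector

   # Review recent Collector logs for startup or configuration errors
   sudo journalctl -u splunk-otel-collector --since "15 minutes ago"

   # If Fluentd-based log collection is installed, check its service too
   sudo systemctl status td-agent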

Collector exits or restarts
The Collector exits or restarts
-----------------------------------------

The collector might exit or restart for the following reasons:
The Collector might exit or restart for the following reasons:

* Memory pressure due to a missing or misconfigured ``memory_limiter`` processor
* Improperly sized for load
* Improperly configured. For example, a queue size configured higher than available memory
* Infrastructure resource limits. For example, Kubernetes

Restart the Splunk Distribution of OpenTelemetry Collector and check the configuration.
Restart the Splunk Distribution of the OpenTelemetry Collector and check the configuration.

Collector doesn't start in Windows Docker containers
The Collector doesn't start in Windows Docker containers
-----------------------------------------------------------

The process might fail to start in a custom-built, Windows-based Docker container, resulting in a "The service process could not connect to the service controller" error message.

In this case, the ``NO_WINDOWS_SERVICE=1`` environment variable must be set to force the Splunk Distribution of OpenTelemetry Collector to start as if it were running in an interactive terminal, without attempting to run as a Windows service.
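For example, if you run the container interactively with Docker, you can pass the variable on the command line. This is a sketch only; the image name and tag are placeholders for your own custom-built Windows-based image.

.. code-block:: bash

   # Placeholder image name: replace with your custom Windows-based image
   docker run -e NO_WINDOWS_SERVICE=1 my-registry/splunk-otel-collector-windows:latest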

Collector is experiencing data issues
Extract a running configuration
=========================================

Extracting a running configuration saves the contents of the configuration in use to logs that you can use to troubleshoot issues. You can extract a running configuration by accessing these endpoints:

* ``http://localhost:55554/debug/configz/initial``
* ``http://localhost:55554/debug/configz/effective``

For Linux, the support bundle script captures this information. See :ref:`otel-install-linux` for the installer script. This capability is primarily useful if you are using remote configuration options such as Zookeeper where the startup configuration can change during operation.
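For example, assuming the Collector is running locally and the debug endpoint is reachable on port 55554, you can save both configurations to files with ``curl``:

.. code-block:: bash

   # Configuration the Collector started with
   curl -s http://localhost:55554/debug/configz/initial > initial-config.yaml

   # Configuration currently in effect, including any remote updates
   curl -s http://localhost:55554/debug/configz/effective > effective-config.yaml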

The Collector is experiencing data issues
============================================

You can monitor internal Collector metrics tracking parameters such as data loss or CPU resources in Splunk Observability Cloud's default dashboards at :guilabel:`Dashboards > OpenTelemetry Collector > OpenTelemetry Collector`. To learn more about these metrics, see :new-page:`Monitoring <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/monitoring.md>` in the OpenTelemetry GitHub repo.
You can monitor internal Collector metrics tracking parameters such as data loss or CPU resources in Splunk Observability Cloud's default dashboards at :guilabel:`Dashboards > OpenTelemetry Collector > OpenTelemetry Collector`.

The Collector might experience the issues described in this section.
To learn more, see:

* :ref:`metrics-internal-collector`
* :new-page:`Internal telemetry <https://opentelemetry.io/docs/collector/internal-telemetry>` in the OpenTelemetry project documentation

Collector is dropping data
The Collector is dropping data
--------------------------------

Data might drop for a variety of reasons, most commonly the following:

* The collector is improperly sized, resulting in the Splunk Distribution of OpenTelemetry Collector being unable to process and export the data as fast as it is received. See :ref:`otel-sizing` for sizing guidelines.
* The Collector is improperly sized, resulting in the Splunk Distribution of the OpenTelemetry Collector being unable to process and export the data as fast as it is received. See :ref:`otel-sizing` for sizing guidelines.
* The exporter destination is unavailable or accepting the data too slowly. To mitigate drops, configure the ``batch`` processor. In addition, you might also need to configure the queued retry options on activated exporters, as shown in the sketch after this list.
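The following is a minimal configuration sketch rather than a tuned recommendation: it only shows where the ``batch`` processor and the exporter queue and retry settings live. The ``otlphttp`` exporter, its endpoint, and the numeric values are placeholders; keep the exporters you already use and size the values for your load.

.. code-block:: yaml

   receivers:
     otlp:
       protocols:
         grpc:

   processors:
     batch:
       # Flush a batch when it reaches this size or when the timeout elapses
       send_batch_size: 8192
       timeout: 200ms

   exporters:
     # Placeholder exporter and endpoint: use your existing exporter
     otlphttp:
       endpoint: https://ingest.example.com
       sending_queue:
         enabled: true
         queue_size: 5000
       retry_on_failure:
         enabled: true
         initial_interval: 5s
         max_interval: 30s

   service:
     pipelines:
       traces:
         receivers: [otlp]
         processors: [batch]
         exporters: [otlphttp]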

Collector isn't receiving data
The Collector isn't receiving data
-------------------------------------

The collector might not receive data for the following reasons:
The Collector might not receive data for the following reasons:

* Network configuration issues
* Receiver configuration issues
* The receiver is defined in the receivers section, but not activated in any pipelines
* The client configuration is incorrect

Check the logs and :new-page:`Troubleshooting zPages <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#zpages>` in the OpenTelemetry project GitHub repositories for more information. Note that Splunk only provides best-effort support for the upstream OpenTelemetry Collector.

Collector can't process data
The Collector can't process data
-----------------------------------

The collector might not process data for the following reasons:
The Collector might not process data for the following reasons:

* The attributes processors work only for "tags" on spans. The span name is handled by the span processor.
* Processors for trace data (except tail sampling) only work on individual spans. Make sure your collector is configured properly.

Collector can't export data
The Collector can't export data
------------------------------------

The collector might be unable to export data for the following reasons:
The Collector might be unable to export data for the following reasons:

* Network configuration issues, such as firewall, DNS, or proxy support
* Incorrect exporter configuration
@@ -92,8 +105,6 @@

If you need to use a proxy, see :ref:`configure-proxy-collector`.
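The Collector honors the standard proxy environment variables at startup. The following is a minimal sketch for a Linux shell with placeholder proxy addresses and the usual installation paths; for a systemd installation, set the same variables in the service's environment instead of the shell.

.. code-block:: bash

   # Placeholder proxy address: replace with your proxy
   export HTTPS_PROXY=http://proxy.example.com:3128
   export HTTP_PROXY=http://proxy.example.com:3128
   # Hosts the Collector reaches directly, bypassing the proxy
   export NO_PROXY=localhost,127.0.0.1

   # Start the Collector in the same shell so it inherits the variables
   # (binary and config paths are typical Linux defaults; adjust as needed)
   /usr/bin/otelcol --config /etc/otel/collector/agent_config.yaml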

Check the logs and :new-page:`Troubleshooting zPages <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#zpages>` in the OpenTelemetry project GitHub repositories for more information. Note that Splunk only provides best-effort support for the upstream OpenTelemetry Collector.

.. _collector-gateway-metrics-issue:

Metrics and metadata not available in data forwarding (gateway) mode
@@ -149,15 +160,6 @@ For example:
value: staging
key: deployment.environment

Extract a running configuration
=========================================
Extracting a running configuration saves or stores the contents of a configuration file to logs that you can use to troubleshoot issues. You can extract a running configuration by accessing these ports:

* ``http://localhost:55554/debug/configz/initial``
* ``http://localhost:55554/debug/configz/effective``

For Linux, the support bundle script captures this information. See :ref:`otel-install-linux` for the installer script. This capability is primarily useful if you are using remote configuration options such as Zookeeper where the startup configuration can change during operation.

Check metric data from the command line
==============================================

@@ -168,56 +170,37 @@ To check whether host metrics are being collected and processed correctly, you c

You can then pipe the output to ``grep`` (Linux) or ``Select-String`` (Windows) to filter the data. For example, ``curl http://localhost:8888/metrics | grep service_instance_id`` retrieves the service instance ID.
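For example, on Linux you can check whether receivers are accepting metric points and whether the Collector reports a service instance ID. The ``otelcol_receiver_accepted_metric_points`` metric name is typical for the internal telemetry but can vary between Collector versions; on Windows, replace ``grep`` with ``Select-String``.

.. code-block:: bash

   # Metric points accepted by each configured receiver
   curl -s http://localhost:8888/metrics | grep otelcol_receiver_accepted_metric_points

   # Confirm the Collector reports a service instance ID
   curl -s http://localhost:8888/metrics | grep service_instance_id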

You're getting a "bind: address already in use" error message
==================================================================================

If you see an error message such as "bind: address already in use", another resource is already using the port that the current configuration requires. This resource could be another application, or a tracing tool such as Jaeger or Zipkin. You can modify the configuration to use another port.

You can modify any of these endpoints or ports:

* Receiver endpoint
* Extensions endpoint
* Metrics address (if port 8888)

Conflicts with port 8888
-----------------------------------

If you encounter a conflict with port 8888, you will need to change to port 8889, making adjustments in these two areas:

1. Add telemetry configuration under the service section:
Trace collection issues
================================

.. code-block:: yaml
Test the Collector by sending synthetic data
------------------------------------------------------------

You can test the Collector to make sure it can receive spans without instrumenting an application. By default, the Collector activates the Zipkin receiver, which is capable of receiving trace data over JSON.

service:
telemetry:
metrics:
address: ":8889"
To test the UI, you can submit a POST request or paste JSON in this directory, as shown in the following example.

.. code-block:: bash

2. Update the port for ``receivers.prometheus/internal`` from 8888 to 8889:
curl -OL https://raw.githubusercontent.com/openzipkin/zipkin/master/zipkin-lens/testdata/yelp.json
curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @yelp.json

.. code-block:: yaml
.. note::

Update the ``localhost`` field as appropriate to reach the Collector.

receivers:
prometheus/internal:
config:
scrape_configs:
- job_name: 'otel-collector'
scrape_interval: 10s
static_configs:
- targets: ['0.0.0.0:8889']
No response means the request was sent successfully. You can also pass ``-v`` to the curl command to confirm.

If you see this error message on Kubernetes and you're using Helm charts, modify the configuration by updating the chart values for both configuration and exposed ports.
Error codes and messages
==================================================================================

You're getting a "pattern not matched" error message
==================================================================================
------------------------------------------------------------

If you see an error message such as "pattern not matched", this message is from Fluentd, and means that the ``<parser>`` was unable to match based on the log message. As a result, the log message is not collected. Check the Fluentd configuration and update as required.

You're receiving an HTTP error code
==================================================================================
------------------------------------------------------------

If an HTTP request is not successfully completed, you might see the following HTTP error codes.

@@ -236,25 +219,45 @@
* - ``503 (SERVICE UNAVAILABLE)``
- Check the status page.

Trace collection issues
================================
You're getting a "bind: address already in use" error message
------------------------------------------------------------------------------------------------------------------------

Here are some common issues related to trace collection on the Collector.
If you see an error message such as "bind: address already in use", another resource is already using the port that the current configuration requires. This resource could be another application, or a tracing tool such as Jaeger or Zipkin. You can modify the configuration to use another port.

Test the Collector by sending synthetic data
------------------------------------------------------------
You can modify any of these endpoints or ports:

You can test the Collector to make sure it can receive spans without instrumenting an application. By default, the Collector activates the Zipkin receiver, which is capable of receiving trace data over JSON.
* Receiver endpoint
* Extensions endpoint
* Metrics address (if port 8888)

To test the UI, you can submit a POST request or paste JSON in this directory, as shown in the following example.
Conflicts with port 8888
-----------------------------------

.. code-block:: bash
If you encounter a conflict with port 8888, you will need to change to port 8889, making adjustments in these two areas:

curl -OL https://raw.githubusercontent.com/openzipkin/zipkin/master/zipkin-lens/testdata/yelp.json
curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @yelp.json
1. Add telemetry configuration under the service section:

.. note::
.. code-block:: yaml

Update the ``localhost`` field as appropriate to reach the Collector.

No response means the request was sent successfully. You can also pass ``-v`` to the curl command to confirm.
service:
telemetry:
metrics:
address: ":8889"


2. Update the port for ``receivers.prometheus/internal`` from 8888 to 8889:

.. code-block:: yaml


receivers:
prometheus/internal:
config:
scrape_configs:
- job_name: 'otel-collector'
scrape_interval: 10s
static_configs:
- targets: ['0.0.0.0:8889']

If you see this error message on Kubernetes and you're using Helm charts, modify the configuration by updating the chart values for both configuration and exposed ports.
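A heavily hedged sketch of such an override follows. The ``agent.config`` key is an assumption based on common splunk-otel-collector-chart usage, and the key that controls exposed ports differs between chart versions, so verify both against the ``values.yaml`` shipped with your chart release before applying anything.

.. code-block:: yaml

   # Hypothetical Helm values override: verify key names against the
   # values.yaml of your chart version before use.
   agent:
     config:
       service:
         telemetry:
           metrics:
             address: ":8889"
     # Also update the chart's exposed-ports settings (key name varies by
     # chart version) so that 8889 is exposed instead of 8888.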
2 changes: 1 addition & 1 deletion gdi/opentelemetry/troubleshoot-logs.rst
@@ -8,7 +8,7 @@ Troubleshoot Collector logs
:description: Describes known issues when collecting logs with the Splunk Distribution of OpenTelemetry Collector.


.. note:: To activate the Collector's debug logging, see the :new-page:`OpenTelemetry project documentation in GitHub <https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#logs>`.
.. note:: See also the :new-page:`OpenTelemetry Project troubleshooting docs <https://opentelemetry.io/docs/collector/troubleshooting>` for more information about debugging.

Here are some common issues related to log collection on the Collector.

2 changes: 1 addition & 1 deletion sp-oncall/user-roles/user-roles-permissions.rst
@@ -255,7 +255,7 @@ The following table identifies the team management capabilities of each Splunk O
- Yes
-

* - Create, edit, or delete escation policies
* - Create, edit, or delete escalation policies
- Yes
-
- For teams where they are a Team admin