diff --git a/docset.yml b/docset.yml index c3765cf861..2475eaa1bc 100644 --- a/docset.yml +++ b/docset.yml @@ -37,6 +37,11 @@ cross_links: - ecs-logging-ruby - eland - elastic-serverless-forwarder + - elastic-otel-dotnet + - elastic-otel-java + - elastic-otel-node + - elastic-otel-php + - elastic-otel-python - elasticsearch - elasticsearch-hadoop - elasticsearch-java @@ -78,6 +83,8 @@ subs: ece: "Elastic Cloud Enterprise" eck: "Elastic Cloud on Kubernetes" edot: "Elastic Distribution of OpenTelemetry" + motlp: "Elastic Cloud Managed OTLP Endpoint" + edot-cf: "EDOT Cloud Forwarder" serverless-full: "Elastic Cloud Serverless" serverless-short: "Serverless" es-serverless: "Elasticsearch Serverless" diff --git a/troubleshoot/ingest/opentelemetry/edot-collector/collector-oomkilled.md b/troubleshoot/ingest/opentelemetry/edot-collector/collector-oomkilled.md new file mode 100644 index 0000000000..d5f93cb0e8 --- /dev/null +++ b/troubleshoot/ingest/opentelemetry/edot-collector/collector-oomkilled.md @@ -0,0 +1,112 @@ +--- +navigation_title: Collector out of memory +description: Diagnose and resolve out-of-memory issues in the EDOT Collector using Go’s Performance Profiler. +applies_to: + stack: + serverless: + observability: + product: + edot_collector: ga +products: + - id: cloud-serverless + - id: observability + - id: edot-collector +--- + +# Troubleshoot an out-of-memory EDOT Collector + +If your EDOT Collector pods terminate with an `OOMKilled` status, this usually indicates sustained memory pressure or potentially a memory leak due to an introduced regression or a bug. You can use the Performance Profiler (`pprof`) extension to collect and analyze memory profiles, helping you identify the root cause of the issue. + +## Symptoms + +These symptoms typically indicate that the EDOT Collector is experiencing a memory-related failure: + +- EDOT Collector pod restarts with an `OOMKilled` status in Kubernetes. +- Memory usage steadily increases before the crash. +- The Collector's logs don't show clear errors before termination. + +## Resolution + +Turn on runtime profiling using the `pprof` extension and then gather memory heap profiles from the affected pod: + +::::::{stepper} + +:::::{step} Enable `pprof` in the Collector + +Edit the EDOT Collector Daemonset configuration and include the `pprof` extension: + +```yaml +exporters: + ... +processors: + ... +receivers: + ... +extensions: + pprof: + +service: + extensions: + - pprof + - ... + pipelines: + metrics: + receivers: [ ... ] + processors: [ ... ] + exporters: [ ... ] +``` + +Restart the Collector after applying these changes. When the Daemonset is deployed again, spot the pod that is getting restarted. +::::: + +:::::{step} Access the affected pod and collect a heap dump + +When a pod starts exhibiting high memory usage or restarts due to OOM, run the following to enter a debug shell: + +```console +kubectl debug -it --image=ubuntu:latest +``` + +In the debug container: + +```console +apt update +apt install -y curl +curl http://localhost:1777/debug/pprof/heap > heap.out +``` +::::: + +:::::{step} Copy the heap file from the pod + +From your local machine, copy the heap file using: + +```bash +kubectl cp :heap.out ./heap.out -c +``` +::::{note} +Replace `` with the name assigned to the debug container. Without the `-c` flag, Kubernetes will show the list of available containers. 
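
For example, assuming the Collector pod is named `edot-collector-abc12` and the debug container was assigned the name `debugger-xyz` (both names are hypothetical), the full command would look like this:

```bash
kubectl cp edot-collector-abc12:heap.out ./heap.out -c debugger-xyz
```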
::::
:::::

:::::{step} Convert the heap profile for analysis

You can now generate a visual representation of the heap profile, for example a PNG image:

```bash
go tool pprof -png heap.out > heap.png
```
:::::
::::::

## Best practices

To improve the effectiveness of memory diagnostics and reduce investigation time, consider the following:

- Collect multiple heap profiles over time (for example, every few minutes) so you can observe memory trends leading up to the crash.

- Automate the collection at regular intervals rather than gathering profiles manually.

## Resources

- [Go's pprof documentation](https://pkg.go.dev/net/http/pprof)
- [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/#performance-profiler-pprof)
diff --git a/troubleshoot/ingest/opentelemetry/edot-collector/index.md b/troubleshoot/ingest/opentelemetry/edot-collector/index.md
new file mode 100644
index 0000000000..e67c50d8f0
--- /dev/null
+++ b/troubleshoot/ingest/opentelemetry/edot-collector/index.md
@@ -0,0 +1,30 @@
---
navigation_title: EDOT Collector
description: Troubleshooting common issues with the EDOT Collector.
applies_to:
  stack:
  serverless:
    observability:
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-collector
---

# Troubleshoot the EDOT Collector

Perform these checks when troubleshooting common Collector issues:

* Check logs: Review the Collector's logs for error messages.
* Validate configuration: Use the `--dry-run` option to test configurations.
* Enable debug logging: Run the Collector with `--log-level=debug` for detailed logs.
* Check service status: Ensure the Collector is running with `systemctl status <collector-service-name>` (Linux) or `tasklist` (Windows).
* Test connectivity: Use `telnet <endpoint> <port>` or `curl` to verify backend availability.
* Check open ports: Run `netstat -tulnp` or `lsof -i` to confirm the Collector is listening.
* Monitor resource usage: Use `top` or `htop` (Linux) or Task Manager (Windows) to check CPU and memory.
* Validate exporters: Ensure exporters are properly configured and reachable.
* Verify pipelines: Use `otelctl diagnose` (if available) to check pipeline health.
* Check permissions: Ensure the Collector has the right file and network permissions.
* Review recent changes: Roll back recent config updates if the issue started after changes.

For in-depth details on troubleshooting, refer to the [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/).
\ No newline at end of file
diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/dotnet/index.md b/troubleshoot/ingest/opentelemetry/edot-sdks/dotnet/index.md
new file mode 100644
index 0000000000..63ddf51355
--- /dev/null
+++ b/troubleshoot/ingest/opentelemetry/edot-sdks/dotnet/index.md
@@ -0,0 +1,197 @@
---
navigation_title: EDOT .NET
description: Use the information in this section to troubleshoot common problems affecting the {{edot}} .NET.
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_dotnet: ga
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-sdk
---

# Troubleshooting the EDOT .NET SDK

Use the information in this section to troubleshoot common problems. As a first step, make sure your stack is compatible with the [supported technologies](opentelemetry://reference/edot-sdks/dotnet/supported-technologies.md) for EDOT .NET and the OpenTelemetry SDK.
+ +If you have an Elastic support contract, create a ticket in the [Elastic Support portal](https://support.elastic.co/customers/s/login/). If you don't, post in the [APM discuss forum](https://discuss.elastic.co/c/apm) or [open a GitHub issue](https://github.com/elastic/elastic-otel-dotnet/issues). + +## Obtain EDOT .NET diagnostic logs + +For most problems, such as when you don't see data in your Elastic Observability backend, first check the EDOT .NET logs. These logs show initialization details and OpenTelemetry SDK events. If you don't see any warnings or errors in the EDOT .NET logs, switch the log level to `Trace` to investigate further. + +The {{edot}} .NET includes built-in diagnostic logging. You can direct logs to a file, STDOUT, or, in common scenarios, an `ILogger` instance. EDOT .NET also observes the built-in diagnostics events from the upstream OpenTelemetry SDK and includes those in its logging output. You can collect the log output and use it to diagnose issues locally during development or when working with Elastic support channels. + +## ASP.NET Core (generic host) logging integration + +When you build applications based on the generic host, such as those created by the [ASP.NET Core](https://learn.microsoft.com/aspnet/core/introduction-to-aspnet-core) and [worker service](https://learn.microsoft.com/dotnet/core/extensions/workers) templates, the {{edot}} .NET will try to automatically register with the built-in logging components when you use the `IHostApplicationBuilder.AddElasticOpenTelemetry` extension method to register EDOT .NET. + +```csharp +var builder = WebApplication.CreateBuilder(args); +builder.AddElasticOpenTelemetry(); +``` + +In this scenario, EDOT .NET tries to access an available `ILoggerFactory` and create an `ILogger`, logging to the event category `Elastic.OpenTelemetry`. EDOT .NET will register this as the additional logger for its diagnostics unless you have already configured a user-provided `ILogger`. This ensures that EDOT .NET and OpenTelemetry SDK logs are written for your application's configured logging providers. In ASP.NET Core, this includes the console logging provider and results in logs such as the following: + +``` +info: Elastic.OpenTelemetry[0] + Elastic Distribution of OpenTelemetry (EDOT) .NET: 1.0.0 +info: Elastic.OpenTelemetry[0] + EDOT log file: +info: Microsoft.Hosting.Lifetime[14] + Now listening on: https://localhost:7295 +info: Microsoft.Hosting.Lifetime[14] + Now listening on: http://localhost:5247 +info: Microsoft.Hosting.Lifetime[0] + Application started. Press Ctrl+C to shut down. +info: Microsoft.Hosting.Lifetime[0] + Hosting environment: Development +``` + +In the preceding log output, informational level logging is enabled as the default for this application. You can control the output by configuring the log levels. + +### Configuring the log level + +You can [configure](https://learn.microsoft.com/en-us/dotnet/core/extensions/logging?tabs=command-line#configure-logging) logs sent to the integrated `Microsoft.Extensions.Logging` library in several ways. A common choice is to use the `appsettings.json` file to configure log-level filters for specific categories. + +```json +{ + "Logging": { + "LogLevel": { + "Default": "Information", + "Microsoft.AspNetCore": "Warning", + "Elastic.OpenTelemetry": "Warning" + } + }, + "AllowedHosts": "*" +} +``` + +In the preceding code, you have filtered `Elastic.OpenTelemetry` to only emit log entries when they have the `Warning` log level or a higher severity. 
This overrides the `Default` configuration of `Information`. + +## Enable global file logging + +Integrated logging is helpful because it requires little to no setup. The logging infrastructure is not present by default in some application types, such as console applications. EDOT .NET also offers a global file logging feature, which is the easiest way for you to get diagnostics and debug information. You must enable file logging when you work with Elastic support, as trace logs will be requested. + +Specify at least one of the following environment variables to make sure that EDOT .NET logs into a file. + +`OTEL_LOG_LEVEL` _(optional)_: +Set the log level at which the profiler should log. Valid values are + +* trace +* debug +* information +* warning +* error +* none + +The default value is `information`. More verbose log levels like `trace` and `debug` can affect the runtime performance of profiler auto instrumentation, so use them _only_ for diagnostics purposes. + +:::{note} +If you don't explicitly set `ELASTIC_OTEL_LOG_TARGETS` to include `file`, global file logging will only be enabled when you configure it with `trace` or `debug`. +::: + +`OTEL_DOTNET_AUTO_LOG_DIRECTORY` _(optional)_: +Set the directory in which to write log files. If you don't set this, the default is: + +* `%USERPROFILE%\AppData\Roaming\elastic\elastic-otel-dotnet` on Windows +* `/var/log/elastic/elastic-otel-dotnet` on Linux +* `~/Library/Application Support/elastic/elastic-otel-dotnet` on OSX + +> ::::{important} +> Make sure the user account under which the profiler process runs has permission to write to the destination log directory. Specifically, when you run on IIS, ensure that the [AppPool identity](https://learn.microsoft.com/en-us/iis/manage/configuring-security/application-pool-identities) has write permissions in the target directory. +> :::: + +`ELASTIC_OTEL_LOG_TARGETS` _(optional)_: +A semi-colon separated list of targets for profiler logs. Valid values are + +* file +* stdout +* none + +The default value is `file` if you set `OTEL_DOTNET_AUTO_LOG_DIRECTORY` or set `OTEL_LOG_LEVEL` to `trace` or `debug`. + +## Advanced troubleshooting + +### Diagnosing initialization or bootstrap issues + +If EDOT for .NET fails before fully bootstrapping its internal components, it won't generate a log file. In such circumstances, you can provide an additional logger for diagnostic purposes. Alternatively, you can enable the `STDOUT` log target. + +#### Providing an additional application logger + +You can provide an additional `ILogger` that EDOT .NET will use to log pre-bootstrap events by creating an instance of `ElasticOpenTelemetryOptions`. + +```csharp +using Elastic.OpenTelemetry; +using Microsoft.Extensions.Logging; +using OpenTelemetry; + +using ILoggerFactory loggerFactory = LoggerFactory.Create(static builder => +{ + builder + .AddFilter("Elastic.OpenTelemetry", LogLevel.Trace) + .AddConsole(); +}); + +ILogger logger = loggerFactory.CreateLogger("EDOT"); + +var options = new ElasticOpenTelemetryOptions +{ + AdditionalLogger = logger +}; + +using var sdk = OpenTelemetrySdk.Create(builder => builder + .WithElasticDefaults(options)); +``` + +This example adds the console logging provider, but you can include any provider here. To use this sample code, add a dependency on the `Microsoft.Extensions.Logging.Console` [NuGet package](https://www.nuget.org/packages/microsoft.extensions.logging.console). + +You create and configure an `ILoggerFactory`. 
In this example, you configure the `Elastic.OpenTelemetry` category to capture trace logs, which is the most verbose option. This is the best choice when you diagnose initialization issues. + +You use the `ILoggerFactory` to create an `ILogger`, which you then assign to the `ElasticOpenTelemetryOptions.AdditionalLogger` property. Once you pass the `ElasticOpenTelemetryOptions` into the `WithElasticDefaults` method, the provided logger can capture bootstrap logs. + +To simplify the preceding code, you can also configure the `ElasticOpenTelemetryOptions` with an `ILoggerFactory` instance that EDOT .NET can use to create its own logger. + +```csharp +using var loggerFactory = LoggerFactory.Create(static builder => +{ + builder + .AddFilter("Elastic.OpenTelemetry", LogLevel.Debug) + .AddConsole(); +}); + +var options = new ElasticOpenTelemetryOptions +{ + AdditionalLoggerFactory = loggerFactory +}; + +using var sdk = OpenTelemetrySdk.Create(builder => builder + .WithElasticDefaults(options)); +``` + +## Known issues + +The following known issues affect EDOT .NET. + +### Missing log records + +The upstream SDK currently does not [comply with the spec](https://github.com/open-telemetry/opentelemetry-dotnet/issues/4324) regarding the deduplication of attributes when exporting log records. When you create a log within multiple scopes, each scope may store information using the same logical key. In this situation, the exported data will have duplicated attributes. + +You are most likely to see this when you log in the scope of a request and enable the `OpenTelemetryLoggerOptions.IncludeScopes` option. ASP.NET Core adds the `RequestId` to multiple scopes. We recommend that you don't enable `IncludeScopes` until the SDK fixes this. When you use the EDOT Collector or the [{{motlp}}](opentelemetry://reference/motlp.md) in serverless, non-compliant log records will fail to be ingested. + +EDOT .NET currently emits a warning if it detects that you use `IncludeScopes` in ASP.NET Core scenarios. + +This can also happen even when you set `IncludeScopes` to false. The following code will also result in duplicate attributes and the potential for lost log records. + +```csharp +Logger.LogInformation("Eat your {fruit} {fruit} {fruit}!", "apple", "banana", "mango"); +``` + +To avoid this scenario, make sure each placeholder uses a unique name. For example: + +```csharp +Logger.LogInformation("Eat your {fruit1} {fruit2} {fruit3}!", "apple", "banana", "mango"); +``` \ No newline at end of file diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/index.md b/troubleshoot/ingest/opentelemetry/edot-sdks/index.md new file mode 100644 index 0000000000..4889fb52e5 --- /dev/null +++ b/troubleshoot/ingest/opentelemetry/edot-sdks/index.md @@ -0,0 +1,22 @@ +--- +navigation_title: EDOT SDKs +description: Troubleshoot issues with the EDOT SDKs using these guides. +applies_to: + stack: + serverless: + observability: +products: + - id: cloud-serverless + - id: observability + - id: edot-sdk +--- + +# Troubleshooting the EDOT SDKs + +Find solutions to common issues with EDOT SDKs. 

- [.NET](/troubleshoot/ingest/opentelemetry/edot-sdks/dotnet/index.md)
- [Java](/troubleshoot/ingest/opentelemetry/edot-sdks/java/index.md)
- [Node.js](/troubleshoot/ingest/opentelemetry/edot-sdks/nodejs/index.md)
- [PHP](/troubleshoot/ingest/opentelemetry/edot-sdks/php/index.md)
- [Python](/troubleshoot/ingest/opentelemetry/edot-sdks/python/index.md)
diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/java/index.md b/troubleshoot/ingest/opentelemetry/edot-sdks/java/index.md
new file mode 100644
index 0000000000..767e7bd450
--- /dev/null
+++ b/troubleshoot/ingest/opentelemetry/edot-sdks/java/index.md
@@ -0,0 +1,165 @@
---
navigation_title: EDOT Java
description: Troubleshooting guide for the Elastic Distribution of OpenTelemetry (EDOT) Java Agent, covering connectivity, agent identification, and debugging.
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_java: ga
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-sdk
---

# Troubleshooting the EDOT Java Agent

Use the information in this section to troubleshoot common problems. As a first step, make sure your stack is compatible with the [supported technologies](opentelemetry://reference/edot-sdks/java/supported-technologies.md) for EDOT Java and the OpenTelemetry SDK.

If you need help and you're an existing Elastic customer with a support contract, create a ticket in the [Elastic Support portal](https://support.elastic.co/customers/s/login/). Other users can post in the [APM discuss forum](https://discuss.elastic.co/c/apm) or [open a GitHub issue](https://github.com/elastic/elastic-otel-java/issues).

## General troubleshooting

Make sure you have set a service name, for example with `-Dotel.service.name=Service1` or with the environment variable `OTEL_SERVICE_NAME` set to `Service1`. Otherwise, the data is sent to `unknown_service_java` by default: you might be receiving data, but it might all be grouped under that service.

## Connectivity to endpoint

From the host, VM, pod, container, or image running the app, check that the Collector endpoint is reachable.

The following examples use a default URL, `http://127.0.0.1:4318/`, which you should replace with the endpoint you are using:

- OpenTelemetry or EDOT Collector without authentication: `curl -i http://127.0.0.1:4318/v1/traces -X POST -d '{}' -H content-type:application/json`
- OpenTelemetry or EDOT Collector with API key authentication: `curl -i http://127.0.0.1:4318/v1/traces -X POST -d '{}' -H content-type:application/json -H "Authorization:ApiKey <your-api-key>"`

The Collector should produce output similar to the following:

```
{"partialSuccess":{}}
```

## Agent troubleshooting

Determine whether the issue is related to the agent by following these steps:

1. Start the application without the agent and check whether the issue is gone. Then restart with the agent and check whether the issue returns.
2. Check end-to-end connectivity without the agent by running one or more of the example apps in [elastic-otel-java](https://github.com/elastic/elastic-otel-java/blob/main/examples/troubleshooting/README.md). These apps use the OpenTelemetry SDK rather than the auto-instrumentation agent, so they can confirm whether the issue is specific to the Java agent or caused by something else.
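
As a quick sanity check for the previous steps, you can start the application with the agent attached and the minimum settings explicitly set. This is a sketch only: the endpoint, API key, agent jar path, and application jar are placeholders you need to adapt to your setup.

```bash
# Example values only: replace the endpoint, API key, and paths with your own
export OTEL_SERVICE_NAME=Service1
export OTEL_EXPORTER_OTLP_ENDPOINT="https://my-deployment.ingest.us-west-2.aws.elastic.cloud"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=ApiKey <your-api-key>"

# Attach the EDOT Java agent at JVM startup
java -javaagent:/path/to/elastic-otel-javaagent.jar -jar my-app.jar
```

If data still doesn't arrive with this minimal setup, the issue is more likely in connectivity or configuration than in a specific instrumentation.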

## Agent debug logging

Because debugging output is verbose and might add noticeable overhead to the application, follow one of these strategies when you need logging:

- In case of a technical issue or exception with the agent, use [agent debugging](#agent-debugging).
- If you need details on the captured data, use [per-signal debugging](#per-signal-debugging).

If data is missing, first check that the technology used in the application is supported in [upstream OpenTelemetry Java Instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/supported-libraries.md) and in [EDOT Java](opentelemetry://reference/edot-sdks/java/supported-technologies.md).

### Agent debugging

To turn on agent debug logging, do either of the following:

- Set the `ELASTIC_OTEL_JAVAAGENT_LOG_LEVEL` environment variable or the `elastic.otel.javaagent.log_level` JVM system property to `debug`.
- Set the `OTEL_JAVAAGENT_DEBUG` environment variable or the `otel.javaagent.debug` JVM system property to `true`.

Both options require a JVM restart.

The `otel.javaagent.debug` / `OTEL_JAVAAGENT_DEBUG` configuration options are inherited from the upstream agent. Setting them to `true` also produces span information in plain text format.

When `elastic.otel.javaagent.log_level` or `ELASTIC_OTEL_JAVAAGENT_LOG_LEVEL` is set to `debug`, the span information is included in JSON format.

If you only need details about the captured data, [per-signal debugging](#per-signal-debugging) is a lighter alternative.

### Per-signal debugging

Each supported signal can be logged independently. This limits the amount of captured data and reduces the overhead compared to [agent debugging](#agent-debugging).

Configure this through the `OTEL_{SIGNAL}_EXPORTER` environment variable or the `otel.{signal}.exporter` JVM system property from the [OpenTelemetry SDK](https://opentelemetry.io/docs/languages/java/configuration/#properties-exporters) by adding either of the following exporters to the default `otlp` value:

- `otlp,logging-otlp`: JSON logging (recommended)
- `otlp,logging`: plain text logging

Both options require a JVM restart.

## Access or modification of application code

The agent modifies the Java application binaries in bytecode form. It doesn't require access to the original source code, nor recompiling or re-packaging the application.

## How to deactivate the agent

There are two ways to deactivate the instrumentation agent:

- Remove the `-javaagent:` JVM argument.
- Set the `OTEL_JAVAAGENT_ENABLED` environment variable or the `otel.javaagent.enabled` Java system property to `false`.

In both cases, you need to restart the JVM.

## Partial activation or deactivation of the agent

You can partially deactivate the agent, or selectively activate only a limited set of instrumentations, by following the instructions in the [upstream documentation](https://opentelemetry.io/docs/zero-code/java/agent/disable/).

## Check if EDOT is attached to a running JVM

There are a few ways to detect whether the agent has been attached to a JVM:

- In the JVM logs: the agent writes a startup log message.
- In the JVM arguments: run `ps -ef | grep javaagent`.
- In environment variables, for example `JAVA_TOOL_OPTIONS`: inspect the output of `export | grep javaagent`.
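
If a full JDK is installed on the host, the `jps` tool offers one more way to perform the same check. This is an optional addition to the checks above, not a requirement:

```bash
# -l prints the full main class or jar path, -v prints the JVM arguments such as -javaagent
jps -lv | grep javaagent
```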

## Identify the version of EDOT agent

When the agent starts, a log message on standard error provides the agent version:

```
INFO io.opentelemetry.javaagent.tooling.VersionLogger - opentelemetry-javaagent - version: 1.2.3
```

In addition, the `-javaagent:` JVM argument provides the path to the agent file, whose name might also contain the agent version, for example `-javaagent:elastic-otel-javaagent-1.2.3.jar`.

Executing the agent jar as an application with `java -jar elastic-otel-javaagent.jar` prints the agent version on standard output, which is useful when the jar file has been renamed.

You can also inspect the `Implementation-Version` entry in the `META-INF/MANIFEST.MF` file of the agent jar, for example with `unzip -p elastic-otel-javaagent.jar META-INF/MANIFEST.MF | grep 'Implementation-Version'`.

## Versions of the OpenTelemetry upstream dependencies

Because EDOT Java is a distribution of [OpenTelemetry Java instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation), it includes the following dependencies:

- [OpenTelemetry Java Instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation)
- [OpenTelemetry Java SDK](https://github.com/open-telemetry/opentelemetry-java)
- [Semantic Conventions Java mappings](https://github.com/open-telemetry/semantic-conventions-java)
- [OpenTelemetry Java Contrib](https://github.com/open-telemetry/opentelemetry-java-contrib)

The versions included in EDOT are usually aligned with OpenTelemetry Java Instrumentation. For reference, check the [EDOT Java release notes](elastic-otel-java://release-notes/index.md) for details of the versions included in each release.

## When and how to update EDOT

The general recommendation is to update the EDOT agent to the latest version when possible to benefit from:

- Bug fixes and technical improvements.
- Support for new features and instrumentation.
- Evolution of semantic conventions.

Frequent and regular updates also make reviewing and handling changes easier.

Updating to the latest EDOT version involves reviewing the changes in the included dependencies:

- [OpenTelemetry Java Instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation)
- [OpenTelemetry Java SDK](https://github.com/open-telemetry/opentelemetry-java)
- [Semantic Conventions Java mappings](https://github.com/open-telemetry/semantic-conventions-java)
- [OpenTelemetry Java Contrib](https://github.com/open-telemetry/opentelemetry-java-contrib)

To review each of those individually, use the [EDOT Java release notes](elastic-otel-java://release-notes/index.md) for links to the respective versions of each component.

### OpenTelemetry API/SDK update

To implement manual instrumentation, some applications use the OpenTelemetry API and/or SDK, which allows them to capture custom spans and metrics, or even send data without any instrumentation agent.

You can update the OpenTelemetry API/SDK in the application and the EDOT Java agent independently:

- EDOT Java is backward-compatible with all previous versions of the OpenTelemetry API/SDK.
- Using a more recent API/SDK version than the one included in EDOT usually works. However, to ensure maximum compatibility, keep the application's OpenTelemetry API/SDK version lower than or equal to the version included in EDOT. To check which version your application resolves, see the example after this list.
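
One way to verify which OpenTelemetry API/SDK version your application actually resolves is to inspect the dependency tree of your build. This is a sketch assuming a Maven or Gradle build; adjust the configuration name to your project:

```bash
# Maven
mvn dependency:tree | grep io.opentelemetry

# Gradle
./gradlew dependencies --configuration runtimeClasspath | grep io.opentelemetry
```

You can then compare the result against the OpenTelemetry version listed in the EDOT Java release notes for the agent version you run.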

### How to update

To update the EDOT Java agent, replace the agent binary `.jar` that was [added during setup](opentelemetry://reference/edot-sdks/java/setup/index.md).
diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/nodejs/index.md b/troubleshoot/ingest/opentelemetry/edot-sdks/nodejs/index.md
new file mode 100644
index 0000000000..64ae7815ff
--- /dev/null
+++ b/troubleshoot/ingest/opentelemetry/edot-sdks/nodejs/index.md
@@ -0,0 +1,134 @@
---
navigation_title: EDOT Node.js
description: Troubleshooting guide for the Elastic Distribution of OpenTelemetry Node.js (EDOT Node.js).
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_node: ga
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-sdk
---

# Troubleshooting the EDOT Node.js SDK

Use the information on this page to troubleshoot issues using EDOT Node.js.

If you need help and you're an existing Elastic customer with a support contract, create a ticket in the [Elastic Support portal](https://support.elastic.co/customers/s/login/). Other users can post in the [APM discuss forum](https://discuss.elastic.co/c/apm) or [open a GitHub issue](https://github.com/elastic/elastic-otel-node/issues).

As a first step, review the [supported technologies](opentelemetry://reference/edot-sdks/nodejs/supported-technologies.md) to ensure your application is supported by the SDK. Are you using a Node.js version that the SDK supports? Are the versions of your dependencies in the [supported version range](opentelemetry://reference/edot-sdks/nodejs/supported-technologies.md#instrumentations) to be instrumented?

## Set a service name

Make sure you have set a service name using the `OTEL_SERVICE_NAME=my-service` or `OTEL_RESOURCE_ATTRIBUTES=service.name=my-service` environment variable. Otherwise, the data is sent to the `unknown_service:node` service by default: you might be getting data, but it might all be under that service.

## Check connectivity

From the host, VM, pod, or container running your application, check that the Collector endpoint is reachable. Run the following command:

```bash
curl -i $ELASTIC_OTLP_ENDPOINT \
  -X POST -d "{}" -H content-type:application/json \
  -H "Authorization: ApiKey $ELASTIC_API_KEY"
```

For example, if you [configured](opentelemetry://reference/edot-sdks/nodejs/configuration.md#basic-configuration) EDOT Node.js with:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="https://my-deployment-abc123.ingest.us-west-2.aws.elastic.cloud"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=ApiKey Zm9vO...mJhcg=="
...
```

Then you would run:

```bash
curl -i https://my-deployment-abc123.ingest.us-west-2.aws.elastic.cloud \
  -X POST -d "{}" -H content-type:application/json \
  -H "Authorization: ApiKey Zm9vO...mJhcg=="
```

If that works correctly, you should expect to see output similar to the following:

```
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 27 Mar 2025 23:07:09 GMT
Content-Length: 21

{"partialSuccess":{}}
```


## Deactivate the SDK

If your application has an issue but you are not sure whether the EDOT Node.js SDK could be the cause, try deactivating the SDK.
+ +You can exclude the SDK by not starting it with your application by running the following command: + +```bash +node my-app.js # instead of 'node --import @elastic/opentelemetry-node my-app.js' +``` + +Or by setting the `OTEL_SDK_DISABLED` environment variable: + +```bash +export OTEL_SDK_DISABLED=true +node --import @elastic/opentelemetry-node my-app.js +``` + +## SDK diagnostic logs [sdk-diagnostic-logs] + +Turn on verbose diagnostic or debug logging from EDOT Node.js: + +1. Set the `OTEL_LOG_LEVEL` environment variable to `verbose`. +2. Restart your application, and reproduce the issue. If the issue is about not seeing telemetry that you expect to see, be sure to use your application so that telemetry data is generated. +3. Gather the full verbose log from application start until after the issue was reproduced. + +The start of the diagnostic log will look something like this: + +``` +% OTEL_LOG_LEVEL=verbose node --import @elastic/opentelemetry-node my-app.js +{"name":"elastic-otel-node","level":10,"msg":"import.mjs: registering module hook","time":"2025-03-27T23:29:12.075Z"} +{"name":"elastic-otel-node","level":10,"msg":"ElasticNodeSDK opts: {}","time":"2025-03-27T23:29:12.392Z"} +{"name":"elastic-otel-node","level":20,"msg":"@opentelemetry/api: Registered a global for diag v1.9.0.","time":"2025-03-27T23:29:12.392Z"} +{"name":"elastic-otel-node","level":20,"msg":"Enabling instrumentation \"@elastic/opentelemetry-instrumentation-openai\"","time":"2025-03-27T23:29:12.393Z"} +{"name":"elastic-otel-node","level":20,"msg":"Enabling instrumentation \"@opentelemetry/instrumentation-amqplib\"","time":"2025-03-27T23:29:12.394Z"} +{"name":"elastic-otel-node","level":20,"msg":"Enabling instrumentation \"@opentelemetry/instrumentation-aws-sdk\"","time":"2025-03-27T23:29:12.395Z"} +... +{"name":"elastic-otel-node","level":10,"msg":"Metrics exporter protocol set to http/protobuf","time":"2025-03-27T23:29:12.408Z"} +{"name":"elastic-otel-node","level":30,"preamble":true,"distroVersion":"0.7.0","env":{"os":"darwin 24.3.0","arch":"arm64","runtime":"Node.js v18.20.4"},"msg":"start Elastic Distribution of OpenTelemetry Node.js","time":"2025-03-27T23:29:12.409Z"} +... +``` + +Look for warnings (`"level":40`) or errors (`"level":50`) in the log output that might indicate an issue. + +## Deactivate an instrumentation + +To deactivate an instrumentation, set the [`OTEL_NODE_DISABLED_INSTRUMENTATIONS`](opentelemetry://reference/edot-sdks/nodejs/configuration.md#otel_node_disabledenabled_instrumentations-details) environment variable. + +For example, to deactivate `@opentelemetry/instrumentation-net` and `@opentelemetry/instrumentation-dns` run the following commands: + +```bash +export OTEL_NODE_DISABLED_INSTRUMENTATIONS=dns,net +... +node --import @elastic/opentelemetry-node my-app.js +``` + +## Check if EDOT Node.js is running + +Look for `start Elastic Distribution of OpenTelemetry Node.js` in the application log. + +As it is starting, EDOT Node.js always logs at the "info" level a preamble to indicate that it has started. For example: + +```json +{"name":"elastic-otel-node","level":30,"preamble":true,"distroVersion":"0.7.0","env":{"os":"darwin 24.3.0","arch":"arm64","runtime":"Node.js v18.20.4"},"msg":"start Elastic Distribution of OpenTelemetry Node.js","time":"2025-03-27T22:14:08.288Z"} +... +``` + +The `distroVersion` field also indicates which version of EDOT Node.js is being used. 
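
If you don't have access to the application logs, you can also check which version of the package is installed in your project. This assumes npm; use the equivalent command for your package manager:

```bash
# Lists the installed version of the EDOT Node.js package
npm ls @elastic/opentelemetry-node
```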

diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/php/index.md b/troubleshoot/ingest/opentelemetry/edot-sdks/php/index.md
new file mode 100644
index 0000000000..b5c67b8469
--- /dev/null
+++ b/troubleshoot/ingest/opentelemetry/edot-sdks/php/index.md
@@ -0,0 +1,140 @@
---
navigation_title: EDOT PHP
description: Troubleshooting the Elastic Distribution of OpenTelemetry PHP agent.
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_php: ga
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-sdk
---

# Troubleshooting the EDOT PHP agent

Use the information on this page to troubleshoot issues using EDOT PHP.

If you need help and you're an existing Elastic customer with a support contract, create a ticket in the [Elastic Support portal](https://support.elastic.co/customers/s/login/). Other users can post in the [APM discuss forum](https://discuss.elastic.co/c/apm) or [open a GitHub issue](https://github.com/elastic/elastic-otel-php/issues).

As a first step, review the [supported technologies](opentelemetry://reference/edot-sdks/php/supported-technologies.md) to ensure your application is supported by the agent. Are you using a PHP version that EDOT PHP supports? Are the versions of your dependencies in the supported version range to be instrumented?

## Turn on logging

When diagnosing issues with the agent's operation, logs play a key role. You can find a detailed explanation of the logging configuration options in [Configuration](opentelemetry://reference/edot-sdks/php/configuration.md#logging-configuration).

In most cases, setting the logging level to `debug` is sufficient. You can also use `trace`, but keep in mind that the amount of generated data might be significant.

Additionally, turn on logging for OpenTelemetry components, as shown in the following example. Logs from OpenTelemetry components are directed to the same output configured for EDOT logs.

```
export OTEL_LOG_LEVEL=DEBUG
```

:::{note}
Upload your complete debug logs to a service like [GitHub Gist](https://gist.github.com) so that Elastic support can analyze the problem. Logs should include everything from when the application starts up until the first request executes. Logs might contain sensitive data: make sure to review and sanitize them before sharing.
:::


## Turn off the agent

If you suspect that the agent might be causing disruptions to a production application, you can deactivate the agent while you troubleshoot.

To deactivate the agent, set the [`elastic_otel.enabled`](opentelemetry://reference/edot-sdks/php/configuration.md#general-configuration) setting to `false`.

:::{note}
You need to restart your application for the changes to apply.
:::

## Agent is not instrumenting code

If the agent doesn't seem to be instrumenting code from your application, try the following actions.

### Native OTLP serializer issues

If you're experiencing issues where no spans, logs, or metrics are being sent, or if you encounter log messages like the following:

```bash
Failed to serialize spans/logs/metrics batch...
```

This might be due to a failure in the native OTLP Protobuf serializer. The native serializer is activated by default for maximum performance, but in rare cases it might encounter incompatibilities with certain environments or data.
To confirm whether this is the cause, try turning off the native serializer using the following environment variable: + +```bash +export ELASTIC_OTEL_NATIVE_OTLP_SERIALIZER_ENABLED=false +``` + +Restart your application and check if spans, logs, or metrics start appearing correctly. + +:::{note} +When turned off, the agent falls back to a PHP-based serializer, which has lower performance. +::: + + +### `open_basedir` PHP configuration option + +If you see a similar entry in the agent log, this indicates an incorrect `open_basedir` configuration. For more details, refer to [Limitations](opentelemetry://reference/edot-sdks/php/setup/limitations.md#open_basedir-php-configuration-option). + +``` +EDOT PHP bootstrap file (...php/bootstrap_php_part.php) is located outside of paths allowed by open_basedir ini setting. +``` + +## Collection of diagnostic information + +For a more detailed analysis of issues, you might need to collect diagnostic information. The agent allows for the automatic collection of such information: all data is saved to the file specified in the configuration. + +There are two possible ways to turn on diagnostic information: + +- By editing the `php.ini` file: Modify the `php.ini` file, or `99-elastic.ini`, to provide the path to the file where the data will be saved, For example: + + ``` + elastic_otel.debug_diagnostic_file=/tmp/php_diags_%p_%t.txt + ``` + +- By setting an environment variable. The `ELASTIC_OTEL_DEBUG_DIAGNOSTIC_FILE` environment variable must be exported or directly specified when running PHP process. For example: + + ```bash + ELASTIC_OTEL_DEBUG_DIAGNOSTIC_FILE=/tmp/php_diags_%p_%t.txt php test.php + ``` + + The provided file path must be writable by the PHP process. If there are multiple PHP processes in your system, you can specify directives in the diagnostic file name. This way, the files remain unique and won't be overwritten. + + - `%p` - In this place, the agent substitutes the process identifier. + - `%t` - In this place, the agent substitutes the UNIX timestamp. + +:::{warning} +After setting the path, remember to fully restart the process for which you are collecting diagnostic information. This might vary depending on the context, such as PHP, PHP-FPM, Apache, or PHP-CGI. Diagnostic information will be recorded after the first HTTP request is made or at the beginning of script execution for PHP-CLI. Also be aware that the information contained in the output file may include sensitive data, such as passwords, security tokens or environment variables from your system. After collecting diagnostic information, remember to disable this feature and restore the previous configuration in php.ini or the environment variable. +::: + +The following information is collected: + +- Process identifier and parent process identifier +- User identifier of the worker process +- List of loaded PHP extensions +- Result from the `phpinfo()` function +- Process memory information and memory maps (`/proc/{id}/maps` and `/proc/{id}/smaps_rollup`) +- Process status information (`/proc/{id}/status`) + +## Turn on debugging for instrumented functions + +EDOT can collect detailed diagnostics of arguments passed to instrumented functions. Use them to verify whether the data used by the instrumented application is correctly analyzed by the instrumentation code. 

To turn on debugging for instrumented functions, set the following environment variable:

```bash
ELASTIC_OTEL_DEBUG_PHP_HOOKS_ENABLED=true
```


## Turn on instrumentation of all the application code

For diagnostic purposes outside of production environments, EDOT allows instrumenting the entire code of your application. This allows tracking function calls throughout the processing of an entire request or script and provides better insight into the application's behavior.

To turn on instrumentation of all the application code, set the following environment variable:

```bash
ELASTIC_OTEL_DEBUG_INSTRUMENT_ALL=true
```

diff --git a/troubleshoot/ingest/opentelemetry/edot-sdks/python/index.md b/troubleshoot/ingest/opentelemetry/edot-sdks/python/index.md
new file mode 100644
index 0000000000..c59f206a39
--- /dev/null
+++ b/troubleshoot/ingest/opentelemetry/edot-sdks/python/index.md
@@ -0,0 +1,60 @@
---
navigation_title: EDOT Python
description: Troubleshoot issues with the EDOT Python Agent.
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_python: ga
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-sdk
---

# Troubleshooting the EDOT Python Agent

Use the information on this page to troubleshoot issues using EDOT Python.

If you need help and you're an existing Elastic customer with a support contract, create a ticket in the [Elastic Support portal](https://support.elastic.co/customers/s/login/). Other users can post in the [APM discuss forum](https://discuss.elastic.co/c/apm) or [open a GitHub issue](https://github.com/elastic/elastic-otel-python/issues).

As a first step, review the [supported technologies](opentelemetry://reference/edot-sdks/python/supported-technologies.md) to ensure your application is supported by the agent. Are you using a Python version that EDOT Python supports? Are the versions of your dependencies in the supported version range to be instrumented?

## General troubleshooting

Follow these recommended actions to make sure that EDOT Python is configured correctly.

### Debug and development modes

Most frameworks support a debug mode. This mode is intended for non-production environments and provides detailed error messages and logging of potentially sensitive data. Turning on instrumentation in debug mode is not advised, as it might pose privacy and security issues by recording sensitive data.

#### Django

Django applications running with the Django `runserver` must use the `--noreload` parameter to be instrumented with `opentelemetry-instrument`. You also need to set the `DJANGO_SETTINGS_MODULE` environment variable pointing to the application settings module.

#### FastAPI

FastAPI applications started with `fastapi dev` require the reloader to be turned off with `--no-reload` to be instrumented with `opentelemetry-instrument`.

#### Flask

Flask applications running in debug mode require the reloader to be turned off to be traced. Refer to [OpenTelemetry zero code documentation](https://opentelemetry.io/docs/zero-code/python/example/#instrumentation-while-debugging).

## Turn off EDOT

In the unlikely event EDOT Python causes disruptions to a production application, you can turn it off while you troubleshoot. To turn off the underlying OpenTelemetry SDK, set the `OTEL_SDK_DISABLED` environment variable to `true`.

If only a subset of instrumentations is causing disruptions, turn them off using the `OTEL_PYTHON_DISABLED_INSTRUMENTATIONS` environment variable.
The variable accepts a list of comma-separated instrumentations. Refer to [OpenTelemetry zero code documentation](https://opentelemetry.io/docs/zero-code/python/configuration/#disabling-specific-instrumentations). + +## Missing logs + +Activating the Python logging module auto-instrumentation with `OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true` calls the [logging.basicConfig](https://docs.python.org/3/library/logging.html#logging.basicConfig) method that makes your own application calls to it a no-op. The side effect of this is that you won't see your application logs in the console. If you are already shipping logs by other means, you don't need to turn this on. + +## Check stability of semantic conventions + +For some semantic conventions, like HTTP, there is a migration path, but the conversion to stable HTTP semantic conventions is not done yet for all the instrumentations. + +## Access or modification of application code + +EDOT Python is distributed as a Python package and so must be installed in the same environment as your application. Once it is available in the path, it can auto-instrument your application without changing the application code. diff --git a/troubleshoot/ingest/opentelemetry/index.md b/troubleshoot/ingest/opentelemetry/index.md new file mode 100644 index 0000000000..d5b4d272c9 --- /dev/null +++ b/troubleshoot/ingest/opentelemetry/index.md @@ -0,0 +1,18 @@ +--- +navigation_title: Elastic Distributions of OpenTelemetry (EDOT) +description: Troubleshoot EDOT issues using these guides. +applies_to: + stack: + serverless: + observability: +products: + - id: cloud-serverless + - id: observability + - id: edot-sdk +--- + +# Troubleshooting Elastic Distributions of OpenTelemetry (EDOT) +Find solutions to common issues in EDOT components and SDKs. + +- [EDOT Collector troubleshooting](/troubleshoot/ingest/opentelemetry/edot-collector/index.md) +- [EDOT SDKs troubleshooting](/troubleshoot/ingest/opentelemetry/edot-sdks/index.md) diff --git a/troubleshoot/ingest/opentelemetry/toc.yml b/troubleshoot/ingest/opentelemetry/toc.yml new file mode 100644 index 0000000000..5717cdf172 --- /dev/null +++ b/troubleshoot/ingest/opentelemetry/toc.yml @@ -0,0 +1,14 @@ +project: 'Elastic Distributions of OpenTelemetry (EDOT)' + +toc: + - file: index.md + - file: edot-collector/index.md + children: + - file: edot-collector/collector-oomkilled.md + - file: edot-sdks/index.md + children: + - file: edot-sdks/dotnet/index.md + - file: edot-sdks/java/index.md + - file: edot-sdks/nodejs/index.md + - file: edot-sdks/php/index.md + - file: edot-sdks/python/index.md \ No newline at end of file diff --git a/troubleshoot/toc.yml b/troubleshoot/toc.yml index dec9351f89..da48207bb5 100644 --- a/troubleshoot/toc.yml +++ b/troubleshoot/toc.yml @@ -143,6 +143,7 @@ toc: - file: ingest.md children: # - file: ingest/enterprise-search/crawls.md + - toc: ingest/opentelemetry - file: ingest/logstash.md children: - file: ingest/logstash/plugins.md