Move EDOT Troubleshooting docs #2035
Merged
+900
−0
2 commits
112 changes: 112 additions & 0 deletions
troubleshoot/ingest/opentelemetry/edot-collector/collector-oomkilled.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
--- | ||
navigation_title: Collector out of memory | ||
description: Diagnose and resolve out-of-memory issues in the EDOT Collector using Go’s Performance Profiler. | ||
applies_to: | ||
stack: | ||
serverless: | ||
observability: | ||
product: | ||
edot_collector: ga | ||
products: | ||
- id: cloud-serverless | ||
- id: observability | ||
- id: edot-collector | ||
--- | ||
|
||
# Troubleshoot an out-of-memory EDOT Collector | ||
|
||
If your EDOT Collector pods terminate with an `OOMKilled` status, this usually indicates sustained memory pressure or a memory leak caused by a regression or a bug. You can use the Performance Profiler (`pprof`) extension to collect and analyze memory profiles and identify the root cause of the issue. | ||
|
||
## Symptoms | ||
|
||
These symptoms typically indicate that the EDOT Collector is experiencing a memory-related failure: | ||
|
||
- EDOT Collector pod restarts with an `OOMKilled` status in Kubernetes. | ||
- Memory usage steadily increases before the crash. | ||
- The Collector's logs don't show clear errors before termination. | ||
|
||
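To confirm the first symptom, you can ask Kubernetes why each container in the pod was last terminated. The following is a minimal sketch, not part of the original guide: the function name `last_termination_reasons` is hypothetical, and the namespace is assumed to default to `default`.

```bash
# Print each container's name and the reason it was last terminated.
# Usage: last_termination_reasons <pod-name> [namespace]
# The function name is illustrative; the namespace defaults to "default".
last_termination_reasons() {
  local pod="$1" namespace="${2:-default}"
  kubectl get pod "$pod" --namespace "$namespace" \
    --output jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

A container that was stopped for exceeding its memory limit should report `OOMKilled` as its reason. The reason column is empty for containers that have not been terminated.
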
## Resolution | ||
|
||
Turn on runtime profiling with the `pprof` extension, then gather heap profiles from the affected pod: | ||
|
||
::::::{stepper} | ||
|
||
:::::{step} Enable `pprof` in the Collector | ||
|
||
Edit the EDOT Collector DaemonSet configuration and include the `pprof` extension: | ||
|
||
```yaml | ||
exporters: | ||
  ... | ||
processors: | ||
  ... | ||
receivers: | ||
  ... | ||
extensions: | ||
  pprof: | ||
|
||
service: | ||
  extensions: | ||
    - pprof | ||
    - ... | ||
  pipelines: | ||
    metrics: | ||
      receivers: [ ... ] | ||
      processors: [ ... ] | ||
      exporters: [ ... ] | ||
``` | ||
|
||
Restart the Collector after applying these changes. After the DaemonSet is redeployed, identify the pod that keeps restarting. | ||
::::: | ||
|
||
:::::{step} Access the affected pod and collect a heap dump | ||
|
||
When a pod starts exhibiting high memory usage or restarts due to OOM, run the following to enter a debug shell: | ||
|
||
```console | ||
kubectl debug -it <collector-pod-name> --image=ubuntu:latest | ||
``` | ||
|
||
In the debug container: | ||
|
||
```console | ||
apt update | ||
apt install -y curl | ||
curl http://localhost:1777/debug/pprof/heap > heap.out | ||
``` | ||
::::: | ||
|
||
:::::{step} Copy the heap file from the pod | ||
|
||
From your local machine, copy the heap file using: | ||
|
||
```bash | ||
kubectl cp <collector-pod-name>:heap.out ./heap.out -c <debug-container-name> | ||
``` | ||
::::{note} | ||
Replace `<debug-container-name>` with the name assigned to the debug container. Without the `-c` flag, Kubernetes will show the list of available containers. | ||
:::: | ||
::::: | ||
|
||
:::::{step} Convert the heap profile for analysis | ||
|
||
You can now generate a visual representation of the profile, for example a PNG image. This command requires a local Go installation and Graphviz: | ||
|
||
```bash | ||
go tool pprof -png heap.out > heap.png | ||
``` | ||
::::: | ||
:::::: | ||
|
||
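If Graphviz is not available, a text summary of the same profile can be enough to spot the largest consumers. The sketch below is an illustration rather than part of the original procedure: the function name `top_heap_consumers` and the default of 15 entries are assumptions, and `-sample_index=inuse_space` is chosen on the assumption that live memory is what matters for an OOM investigation.

```bash
# Print the functions holding the most live heap memory in a profile.
# Usage: top_heap_consumers <profile-file> [number-of-entries]
# The function name and the default of 15 entries are illustrative.
top_heap_consumers() {
  local profile="$1" entries="${2:-15}"
  go tool pprof -top -nodecount="$entries" -sample_index=inuse_space "$profile"
}
```

For example, `top_heap_consumers heap.out` prints a table sorted by in-use memory, which can be pasted into an issue or support ticket more easily than an image.
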
## Best practices | ||
|
||
To improve the effectiveness of memory diagnostics and reduce investigation time, consider the following: | ||
|
||
- Collect multiple heap profiles over time (for example, every few minutes) to observe memory trends before the crash. | ||
|
||
- Automate heap profile collection so that profiles are captured at regular intervals without manual intervention. | ||
|
||
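Both practices can be sketched with a small script. This is a hedged example under stated assumptions: the function names `collect_heap_profiles` and `heap_growth` and the `heap-NNN.out` file naming are hypothetical, and the script is assumed to run somewhere that can reach the `pprof` endpoint on `localhost:1777`, for example inside the debug container used earlier.

```bash
# Collect a series of heap profiles from the pprof endpoint.
# Usage: collect_heap_profiles <count> <interval-seconds> [base-url] [output-dir]
# Function names and the heap-NNN.out naming scheme are illustrative.
collect_heap_profiles() {
  local count="$1" interval="$2"
  local base_url="${3:-http://localhost:1777}"  # endpoint used earlier in this guide
  local out_dir="${4:-.}"
  local i=1 file

  mkdir -p "$out_dir"
  while [ "$i" -le "$count" ]; do
    # Zero-padded sequence numbers keep the files sorted in capture order.
    file="$(printf '%s/heap-%03d.out' "$out_dir" "$i")"
    if curl --silent --show-error --fail --output "$file" "${base_url}/debug/pprof/heap"; then
      echo "saved ${file}"
    else
      echo "failed to fetch profile ${i}" >&2
    fi
    i=$((i + 1))
    # Skip the sleep after the last capture so the function returns promptly.
    if [ "$i" -le "$count" ]; then sleep "$interval"; fi
  done
}

# Show which functions gained the most live memory between two profiles.
# Usage: heap_growth <earlier-profile> <later-profile>
heap_growth() {
  go tool pprof -top -sample_index=inuse_space -base "$1" "$2"
}
```

For example, `collect_heap_profiles 10 300` captures ten profiles five minutes apart. After copying the files to a machine with Go installed, `heap_growth heap-001.out heap-010.out` subtracts the first profile from the last, so steadily growing allocations stand out.
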
## Resources | ||
|
||
- [Go's pprof documentation](https://pkg.go.dev/net/http/pprof) | ||
- [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/#performance-profiler-pprof) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
--- | ||
navigation_title: EDOT Collector | ||
description: Troubleshooting common issues with the EDOT Collector. | ||
applies_to: | ||
stack: | ||
serverless: | ||
observability: | ||
products: | ||
- id: cloud-serverless | ||
- id: observability | ||
- id: edot-collector | ||
--- | ||
|
||
# Troubleshoot the EDOT Collector | ||
|
||
Perform these checks when troubleshooting common Collector issues: | ||
|
||
* Check logs: Review the Collector’s logs for error messages. | ||
* Validate configuration: Use the `validate` subcommand to test configurations. | ||
* Enable debug logging: Set the `level` option under `service.telemetry.logs` to `debug` in the Collector configuration for detailed logs. | ||
* Check service status: Ensure the Collector is running with `systemctl status <collector-service>` (Linux) or `tasklist` (Windows). | ||
* Test connectivity: Use `telnet <endpoint> <port>` or `curl` to verify backend availability. | ||
* Check open ports: Run `netstat -tulnp` or `lsof -i` to confirm the Collector is listening. | ||
* Monitor resource usage: Use `top` or `htop` (Linux) or Task Manager (Windows) to check CPU and memory usage. | ||
* Validate exporters: Ensure exporters are properly configured and reachable. | ||
* Verify pipelines: Use `otelctl diagnose` (if available) to check pipeline health. | ||
* Check permissions: Ensure the Collector has the right file and network permissions. | ||
* Review recent changes: Roll back recent config updates if the issue started after changes. | ||
|
||
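The connectivity check from the list above can be scripted with `curl`. The following is a minimal sketch under stated assumptions: the function name `check_endpoint` and the five-second timeout are hypothetical choices, and the endpoint URL is a placeholder you supply.

```bash
# Print the HTTP status code returned by an endpoint, or fail if it is unreachable.
# Usage: check_endpoint <url>
# The function name and the five-second timeout are illustrative.
check_endpoint() {
  local url="$1" code
  if code="$(curl --silent --output /dev/null --write-out '%{http_code}' --max-time 5 "$url")"; then
    echo "${url} responded with HTTP ${code}"
  else
    echo "${url} is unreachable" >&2
    return 1
  fi
}
```

Any HTTP response, even an error status such as 401, shows that the network path to the backend works. A failure points to DNS, firewall, or TLS problems instead.
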
For in-depth details on troubleshooting, refer to the [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/). |
197 changes: 197 additions & 0 deletions
troubleshoot/ingest/opentelemetry/edot-sdks/dotnet/index.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,197 @@ | ||
--- | ||
navigation_title: EDOT .NET | ||
description: Use the information in this section to troubleshoot common problems affecting the {{edot}} .NET. | ||
applies_to: | ||
stack: | ||
serverless: | ||
observability: | ||
product: | ||
edot_dotnet: ga | ||
products: | ||
- id: cloud-serverless | ||
- id: observability | ||
- id: edot-sdk | ||
--- | ||
|
||
# Troubleshooting the EDOT .NET SDK | ||
|
||
Use the information in this section to troubleshoot common problems. As a first step, make sure your stack is compatible with the [supported technologies](opentelemetry://reference/edot-sdks/dotnet/supported-technologies.md) for EDOT .NET and the OpenTelemetry SDK. | ||
|
||
If you have an Elastic support contract, create a ticket in the [Elastic Support portal](https://support.elastic.co/customers/s/login/). If you don't, post in the [APM discuss forum](https://discuss.elastic.co/c/apm) or [open a GitHub issue](https://github.com/elastic/elastic-otel-dotnet/issues). | ||
|
||
## Obtain EDOT .NET diagnostic logs | ||
|
||
For most problems, such as when you don't see data in your Elastic Observability backend, first check the EDOT .NET logs. These logs show initialization details and OpenTelemetry SDK events. If you don't see any warnings or errors in the EDOT .NET logs, switch the log level to `Trace` to investigate further. | ||
|
||
The {{edot}} .NET includes built-in diagnostic logging. You can direct logs to a file, STDOUT, or, in common scenarios, an `ILogger` instance. EDOT .NET also observes the built-in diagnostics events from the upstream OpenTelemetry SDK and includes those in its logging output. You can collect the log output and use it to diagnose issues locally during development or when working with Elastic support channels. | ||
|
||
## ASP.NET Core (generic host) logging integration | ||
|
||
When you build applications based on the generic host, such as those created by the [ASP.NET Core](https://learn.microsoft.com/aspnet/core/introduction-to-aspnet-core) and [worker service](https://learn.microsoft.com/dotnet/core/extensions/workers) templates, the {{edot}} .NET will try to automatically register with the built-in logging components when you use the `IHostApplicationBuilder.AddElasticOpenTelemetry` extension method to register EDOT .NET. | ||
|
||
```csharp | ||
var builder = WebApplication.CreateBuilder(args); | ||
builder.AddElasticOpenTelemetry(); | ||
``` | ||
|
||
In this scenario, EDOT .NET tries to access an available `ILoggerFactory` and create an `ILogger` that logs to the event category `Elastic.OpenTelemetry`. EDOT .NET registers this as the additional logger for its diagnostics unless you have already configured a user-provided `ILogger`. This ensures that EDOT .NET and OpenTelemetry SDK logs are written to your application's configured logging providers. In ASP.NET Core, this includes the console logging provider and results in logs such as the following: | ||
|
||
``` | ||
info: Elastic.OpenTelemetry[0] | ||
Elastic Distribution of OpenTelemetry (EDOT) .NET: 1.0.0 | ||
info: Elastic.OpenTelemetry[0] | ||
EDOT log file: <disabled> | ||
info: Microsoft.Hosting.Lifetime[14] | ||
Now listening on: https://localhost:7295 | ||
info: Microsoft.Hosting.Lifetime[14] | ||
Now listening on: http://localhost:5247 | ||
info: Microsoft.Hosting.Lifetime[0] | ||
Application started. Press Ctrl+C to shut down. | ||
info: Microsoft.Hosting.Lifetime[0] | ||
Hosting environment: Development | ||
``` | ||
|
||
In the preceding log output, informational level logging is enabled as the default for this application. You can control the output by configuring the log levels. | ||
|
||
### Configuring the log level | ||
|
||
You can [configure](https://learn.microsoft.com/en-us/dotnet/core/extensions/logging?tabs=command-line#configure-logging) logs sent to the integrated `Microsoft.Extensions.Logging` library in several ways. A common choice is to use the `appsettings.json` file to configure log-level filters for specific categories. | ||
|
||
```json | ||
{ | ||
  "Logging": { | ||
    "LogLevel": { | ||
      "Default": "Information", | ||
      "Microsoft.AspNetCore": "Warning", | ||
      "Elastic.OpenTelemetry": "Warning" | ||
    } | ||
  }, | ||
  "AllowedHosts": "*" | ||
} | ||
``` | ||
|
||
In the preceding configuration, `Elastic.OpenTelemetry` is filtered to emit only log entries with the `Warning` log level or a higher severity. This overrides the `Default` configuration of `Information`. | ||
|
||
## Enable global file logging | ||
|
||
Integrated logging is helpful because it requires little to no setup. However, the logging infrastructure is not present by default in some application types, such as console applications. For these cases, EDOT .NET also offers a global file logging feature, which is the easiest way to get diagnostics and debug information. You must enable file logging when you work with Elastic support, as trace logs will be requested. | ||
|
||
Specify at least one of the following environment variables to make sure that EDOT .NET logs to a file. | ||
|
||
`OTEL_LOG_LEVEL` _(optional)_: | ||
Set the log level at which the profiler should log. Valid values are: | ||
|
||
* trace | ||
* debug | ||
* information | ||
* warning | ||
* error | ||
* none | ||
|
||
The default value is `information`. More verbose log levels like `trace` and `debug` can affect the runtime performance of profiler auto instrumentation, so use them _only_ for diagnostics purposes. | ||
|
||
:::{note} | ||
If you don't explicitly set `ELASTIC_OTEL_LOG_TARGETS` to include `file`, global file logging is only enabled when you set `OTEL_LOG_LEVEL` to `trace` or `debug`. | ||
::: | ||
|
||
`OTEL_DOTNET_AUTO_LOG_DIRECTORY` _(optional)_: | ||
Set the directory in which to write log files. If you don't set this, the default is: | ||
|
||
* `%USERPROFILE%\AppData\Roaming\elastic\elastic-otel-dotnet` on Windows | ||
* `/var/log/elastic/elastic-otel-dotnet` on Linux | ||
* `~/Library/Application Support/elastic/elastic-otel-dotnet` on macOS | ||
|
||
> ::::{important} | ||
> Make sure the user account under which the profiler process runs has permission to write to the destination log directory. Specifically, when you run on IIS, ensure that the [AppPool identity](https://learn.microsoft.com/en-us/iis/manage/configuring-security/application-pool-identities) has write permissions in the target directory. | ||
> :::: | ||
|
||
`ELASTIC_OTEL_LOG_TARGETS` _(optional)_: | ||
A semicolon-separated list of targets for profiler logs. Valid values are: | ||
|
||
* file | ||
* stdout | ||
* none | ||
|
||
The default value is `file` if you set `OTEL_DOTNET_AUTO_LOG_DIRECTORY` or set `OTEL_LOG_LEVEL` to `trace` or `debug`. | ||
|
||
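On Linux or macOS, the three variables described above can be set together in the shell that starts your application. The sketch below is illustrative only: the function name `enable_edot_file_logging` is hypothetical, and it assumes you want trace-level output sent to both a file and STDOUT.

```bash
# Turn on trace-level file and console logging for EDOT .NET in the current shell.
# Usage: enable_edot_file_logging <log-directory>
# The function name is illustrative; the directory is supplied by the caller.
enable_edot_file_logging() {
  local log_dir="$1"
  # Create the directory up front so a missing path is not mistaken for a permissions problem.
  mkdir -p "$log_dir"
  export OTEL_LOG_LEVEL=trace
  export OTEL_DOTNET_AUTO_LOG_DIRECTORY="$log_dir"
  # Quote the value: an unquoted semicolon would end the shell command.
  export ELASTIC_OTEL_LOG_TARGETS="file;stdout"
}
```

After calling the function with a writable directory, start the application from the same shell so that it inherits the variables. Remember to unset them or lower the log level once you have finished diagnosing the problem, because `trace` logging can affect runtime performance.
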
## Advanced troubleshooting | ||
|
||
### Diagnosing initialization or bootstrap issues | ||
|
||
If EDOT .NET fails before fully bootstrapping its internal components, it won't generate a log file. In such circumstances, you can provide an additional logger for diagnostic purposes. Alternatively, you can enable the `STDOUT` log target. | ||
|
||
#### Providing an additional application logger | ||
|
||
You can provide an additional `ILogger` that EDOT .NET will use to log pre-bootstrap events by creating an instance of `ElasticOpenTelemetryOptions`. | ||
|
||
```csharp | ||
using Elastic.OpenTelemetry; | ||
using Microsoft.Extensions.Logging; | ||
using OpenTelemetry; | ||
|
||
using ILoggerFactory loggerFactory = LoggerFactory.Create(static builder => | ||
{ | ||
    builder | ||
        .AddFilter("Elastic.OpenTelemetry", LogLevel.Trace) | ||
        .AddConsole(); | ||
}); | ||
|
||
ILogger logger = loggerFactory.CreateLogger("EDOT"); | ||
|
||
var options = new ElasticOpenTelemetryOptions | ||
{ | ||
    AdditionalLogger = logger | ||
}; | ||
|
||
using var sdk = OpenTelemetrySdk.Create(builder => builder | ||
    .WithElasticDefaults(options)); | ||
``` | ||
|
||
This example adds the console logging provider, but you can include any provider here. To use this sample code, add a dependency on the `Microsoft.Extensions.Logging.Console` [NuGet package](https://www.nuget.org/packages/microsoft.extensions.logging.console). | ||
|
||
You create and configure an `ILoggerFactory`. In this example, you configure the `Elastic.OpenTelemetry` category to capture trace logs, which is the most verbose option. This is the best choice when you diagnose initialization issues. | ||
|
||
You use the `ILoggerFactory` to create an `ILogger`, which you then assign to the `ElasticOpenTelemetryOptions.AdditionalLogger` property. Once you pass the `ElasticOpenTelemetryOptions` into the `WithElasticDefaults` method, the provided logger can capture bootstrap logs. | ||
|
||
To simplify the preceding code, you can also configure the `ElasticOpenTelemetryOptions` with an `ILoggerFactory` instance that EDOT .NET can use to create its own logger. | ||
|
||
```csharp | ||
using var loggerFactory = LoggerFactory.Create(static builder => | ||
{ | ||
    builder | ||
        .AddFilter("Elastic.OpenTelemetry", LogLevel.Debug) | ||
        .AddConsole(); | ||
}); | ||
|
||
var options = new ElasticOpenTelemetryOptions | ||
{ | ||
    AdditionalLoggerFactory = loggerFactory | ||
}; | ||
|
||
using var sdk = OpenTelemetrySdk.Create(builder => builder | ||
    .WithElasticDefaults(options)); | ||
``` | ||
|
||
## Known issues | ||
|
||
The following known issues affect EDOT .NET. | ||
|
||
### Missing log records | ||
|
||
The upstream SDK currently does not [comply with the spec](https://github.com/open-telemetry/opentelemetry-dotnet/issues/4324) regarding the deduplication of attributes when exporting log records. When you create a log within multiple scopes, each scope may store information using the same logical key. In this situation, the exported data will have duplicated attributes. | ||
|
||
You are most likely to see this when you log in the scope of a request and enable the `OpenTelemetryLoggerOptions.IncludeScopes` option. ASP.NET Core adds the `RequestId` to multiple scopes. We recommend that you don't enable `IncludeScopes` until the SDK fixes this. When you use the EDOT Collector or the [{{motlp}}](opentelemetry://reference/motlp.md) in serverless, non-compliant log records will fail to be ingested. | ||
|
||
EDOT .NET currently emits a warning if it detects that you use `IncludeScopes` in ASP.NET Core scenarios. | ||
|
||
Duplicate attributes can also occur when you set `IncludeScopes` to `false`. For example, the following code reuses the same placeholder name, which results in duplicate attributes and the potential for lost log records. | ||
|
||
```csharp | ||
Logger.LogInformation("Eat your {fruit} {fruit} {fruit}!", "apple", "banana", "mango"); | ||
``` | ||
|
||
To avoid this scenario, make sure each placeholder uses a unique name. For example: | ||
|
||
```csharp | ||
Logger.LogInformation("Eat your {fruit1} {fruit2} {fruit3}!", "apple", "banana", "mango"); | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
--- | ||
navigation_title: EDOT SDKs | ||
description: Troubleshoot issues with the EDOT SDKs using these guides. | ||
applies_to: | ||
stack: | ||
serverless: | ||
observability: | ||
products: | ||
- id: cloud-serverless | ||
- id: observability | ||
- id: edot-sdk | ||
--- | ||
|
||
# Troubleshooting the EDOT SDKs | ||
|
||
Find solutions to common issues with EDOT SDKs. | ||
|
||
- [.NET](/troubleshoot/ingest/opentelemetry/edot-sdks/dotnet/index.md) | ||
- [Java](/troubleshoot/ingest/opentelemetry/edot-sdks/java/index.md) | ||
- [Node.js](/troubleshoot/ingest/opentelemetry/edot-sdks/nodejs/index.md) | ||
- [PHP](/troubleshoot/ingest/opentelemetry/edot-sdks/php/index.md) | ||
- [Python](/troubleshoot/ingest/opentelemetry/edot-sdks/python/index.md) |