7 changes: 7 additions & 0 deletions docset.yml
@@ -1,3 +1,3 @@
project: 'Elastic documentation'
max_toc_depth: 2

@@ -37,6 +37,11 @@
- ecs-logging-ruby
- eland
- elastic-serverless-forwarder
- elastic-otel-dotnet
- elastic-otel-java
- elastic-otel-node
- elastic-otel-php
- elastic-otel-python
- elasticsearch
- elasticsearch-hadoop
- elasticsearch-java
@@ -78,6 +83,8 @@
ece: "Elastic Cloud Enterprise"
eck: "Elastic Cloud on Kubernetes"
edot: "Elastic Distribution of OpenTelemetry"
motlp: "Elastic Cloud Managed OTLP Endpoint"
edot-cf: "EDOT Cloud Forwarder"
serverless-full: "Elastic Cloud Serverless"
serverless-short: "Serverless"
es-serverless: "Elasticsearch Serverless"
@@ -0,0 +1,112 @@
---
navigation_title: Collector out of memory
description: Diagnose and resolve out-of-memory issues in the EDOT Collector using Go’s Performance Profiler.
applies_to:
stack:
serverless:
observability:
product:
edot_collector: ga
products:
- id: cloud-serverless
- id: observability
- id: edot-collector
---

# Troubleshoot an out-of-memory EDOT Collector

If your EDOT Collector pods terminate with an `OOMKilled` status, this usually indicates sustained memory pressure or a memory leak introduced by a regression or bug. You can use the Performance Profiler (`pprof`) extension to collect and analyze memory profiles and identify the root cause of the issue.

## Symptoms

These symptoms typically indicate that the EDOT Collector is experiencing a memory-related failure:

- EDOT Collector pod restarts with an `OOMKilled` status in Kubernetes.
- Memory usage steadily increases before the crash.
- The Collector's logs don't show clear errors before termination.

## Resolution

Turn on runtime profiling using the `pprof` extension and then gather memory heap profiles from the affected pod:

::::::{stepper}

:::::{step} Enable `pprof` in the Collector

Edit the EDOT Collector DaemonSet configuration and include the `pprof` extension:

```yaml
exporters:
...
processors:
...
receivers:
...
extensions:
pprof:

service:
extensions:
- pprof
- ...
pipelines:
metrics:
receivers: [ ... ]
processors: [ ... ]
exporters: [ ... ]
```
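
An empty `pprof:` entry uses the extension's defaults. If you need a different port, you can set the endpoint explicitly. A minimal sketch, assuming the default port `1777`:

```yaml
extensions:
  pprof:
    # Default is localhost:1777; change only if the port is already in use.
    endpoint: localhost:1777
```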

Restart the Collector after applying these changes. When the DaemonSet is redeployed, identify the pod that keeps restarting.
:::::

:::::{step} Access the affected pod and collect a heap dump

When a pod starts exhibiting high memory usage or restarts due to OOM, run the following to enter a debug shell:

```console
kubectl debug -it <collector-pod-name> --image=ubuntu:latest
```

In the debug container:

```console
apt update
apt install -y curl
curl http://localhost:1777/debug/pprof/heap > heap.out
```
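
Optionally, capture a goroutine profile in the same session. Go's `net/http/pprof` handler exposes it alongside the heap profile, and it can help distinguish a memory leak from goroutine buildup:

```console
curl http://localhost:1777/debug/pprof/goroutine > goroutine.out
```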
:::::

:::::{step} Copy the heap file from the pod

From your local machine, copy the heap file using:

```bash
kubectl cp <collector-pod-name>:heap.out ./heap.out -c <debug-container-name>
```
::::{note}
Replace `<debug-container-name>` with the name assigned to the debug container. If you omit the `-c` flag, `kubectl` lists the available containers.
::::
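
If you don't know the debug container's name, you can read it from the pod spec. A sketch, assuming `kubectl debug` created it as an ephemeral container:

```console
kubectl get pod <collector-pod-name> -o jsonpath='{.spec.ephemeralContainers[*].name}'
```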
:::::

:::::{step} Convert the heap profile for analysis

You can now generate a visual representation of the heap profile, for example a PNG:

```bash
go tool pprof -png heap.out > heap.png
```
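
PNG rendering requires Graphviz to be installed. Alternatively, you can inspect the profile interactively using the standard `pprof` commands:

```console
go tool pprof heap.out
# At the (pprof) prompt:
#   top10         shows the ten heaviest functions by memory
#   list <func>   shows annotated source for a matching function
#   web           renders a call graph in the browser (requires Graphviz)
```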
:::::
::::::

## Best practices

To improve the effectiveness of memory diagnostics and reduce investigation time, consider the following:

- Collect multiple heap profiles over time (for example, every few minutes) to observe memory trends before the crash.

- Automate heap profile collection at intervals, as shown in the sketch below, so profiles are available even if the pod is killed unexpectedly.
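
A minimal sketch of such a loop, run from a debug shell inside the pod, assuming the default `pprof` endpoint and `curl` available:

```bash
# Capture a timestamped heap profile every two minutes.
while true; do
  curl -s http://localhost:1777/debug/pprof/heap > "heap-$(date +%s).out"
  sleep 120
done
```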

## Resources

- [Go's pprof documentation](https://pkg.go.dev/net/http/pprof)
- [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/#performance-profiler-pprof)
30 changes: 30 additions & 0 deletions troubleshoot/ingest/opentelemetry/edot-collector/index.md
@@ -0,0 +1,30 @@
---
navigation_title: EDOT Collector
description: Troubleshooting common issues with the EDOT Collector.
applies_to:
stack:
serverless:
observability:
products:
- id: cloud-serverless
- id: observability
- id: edot-collector
---

# Troubleshoot the EDOT Collector

Perform these checks when troubleshooting common Collector issues:

* Check logs: Review the Collector’s logs for error messages.
* Validate configuration: Use the `--dry-run` option to test configurations.
* Enable debug logging: Run the Collector with `--log-level=debug` for detailed logs.
* Check service status: Ensure the Collector is running with `systemctl status <collector-service>` (Linux) or `tasklist` (Windows).
* Test connectivity: Use `telnet <endpoint> <port>` or `curl` to verify backend availability (see the example after this list).
* Check open ports: Run `netstat -tulnp` or `lsof -i` to confirm the Collector is listening.
* Monitor resource usage: Use `top` or `htop` (Linux) or Task Manager (Windows) to check CPU and memory usage.
* Validate exporters: Ensure exporters are properly configured and reachable.
* Verify pipelines: Use `otelctl diagnose` (if available) to check pipeline health.
* Check permissions: Ensure the Collector has the right file and network permissions.
* Review recent changes: Roll back recent config updates if the issue started after changes.
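
For example, a quick connectivity and port check on Linux might look like the following sketch. The endpoint and port are placeholders for your backend; `4317` and `4318` are the default OTLP gRPC and HTTP ports:

```console
# Confirm the Collector is listening on the default OTLP ports.
netstat -tulnp | grep -E ':4317|:4318'

# Verify the backend endpoint is reachable.
curl -v https://<endpoint>:<port>
```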

For in-depth details on troubleshooting, refer to the [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/).
197 changes: 197 additions & 0 deletions troubleshoot/ingest/opentelemetry/edot-sdks/dotnet/index.md
@@ -0,0 +1,197 @@
---
navigation_title: EDOT .NET
description: Use the information in this section to troubleshoot common problems affecting the {{edot}} .NET.
applies_to:
stack:
serverless:
observability:
product:
edot_dotnet: ga
products:
- id: cloud-serverless
- id: observability
- id: edot-sdk
---

# Troubleshooting the EDOT .NET SDK

Use the information in this section to troubleshoot common problems. As a first step, make sure your stack is compatible with the [supported technologies](opentelemetry://reference/edot-sdks/dotnet/supported-technologies.md) for EDOT .NET and the OpenTelemetry SDK.

If you have an Elastic support contract, create a ticket in the [Elastic Support portal](https://support.elastic.co/customers/s/login/). If you don't, post in the [APM discuss forum](https://discuss.elastic.co/c/apm) or [open a GitHub issue](https://github.com/elastic/elastic-otel-dotnet/issues).

## Obtain EDOT .NET diagnostic logs

For most problems, such as when you don't see data in your Elastic Observability backend, first check the EDOT .NET logs. These logs show initialization details and OpenTelemetry SDK events. If you don't see any warnings or errors in the EDOT .NET logs, switch the log level to `Trace` to investigate further.

The {{edot}} .NET includes built-in diagnostic logging. You can direct logs to a file, STDOUT, or, in common scenarios, an `ILogger` instance. EDOT .NET also observes the built-in diagnostics events from the upstream OpenTelemetry SDK and includes those in its logging output. You can collect the log output and use it to diagnose issues locally during development or when working with Elastic support channels.

## ASP.NET Core (generic host) logging integration

When you build applications based on the generic host, such as those created by the [ASP.NET Core](https://learn.microsoft.com/aspnet/core/introduction-to-aspnet-core) and [worker service](https://learn.microsoft.com/dotnet/core/extensions/workers) templates, the {{edot}} .NET tries to register automatically with the built-in logging components when you call the `IHostApplicationBuilder.AddElasticOpenTelemetry` extension method:

```csharp
var builder = WebApplication.CreateBuilder(args);
builder.AddElasticOpenTelemetry();
```

In this scenario, EDOT .NET tries to access an available `ILoggerFactory` and create an `ILogger` that logs to the event category `Elastic.OpenTelemetry`. EDOT .NET registers this as the additional logger for its diagnostics unless you have already configured a user-provided `ILogger`. This ensures that EDOT .NET and OpenTelemetry SDK logs are written to your application's configured logging providers. In ASP.NET Core, this includes the console logging provider and results in logs such as the following:

```
info: Elastic.OpenTelemetry[0]
Elastic Distribution of OpenTelemetry (EDOT) .NET: 1.0.0
info: Elastic.OpenTelemetry[0]
EDOT log file: <disabled>
info: Microsoft.Hosting.Lifetime[14]
Now listening on: https://localhost:7295
info: Microsoft.Hosting.Lifetime[14]
Now listening on: http://localhost:5247
info: Microsoft.Hosting.Lifetime[0]
Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
Hosting environment: Development
```

In the preceding log output, informational level logging is enabled as the default for this application. You can control the output by configuring the log levels.

### Configuring the log level

You can [configure](https://learn.microsoft.com/en-us/dotnet/core/extensions/logging?tabs=command-line#configure-logging) logs sent to the integrated `Microsoft.Extensions.Logging` library in several ways. A common choice is to use the `appsettings.json` file to configure log-level filters for specific categories.

```json
{
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft.AspNetCore": "Warning",
"Elastic.OpenTelemetry": "Warning"
}
},
"AllowedHosts": "*"
}
```

In the preceding configuration, the `Elastic.OpenTelemetry` category only emits log entries at the `Warning` level or higher severity, overriding the `Default` level of `Information`.

## Enable global file logging

Integrated logging is helpful because it requires little to no setup. However, the logging infrastructure is not present by default in some application types, such as console applications. For those cases, EDOT .NET also offers a global file logging feature, which is the easiest way to get diagnostics and debug information. You must enable file logging when you work with Elastic support, as trace logs will be requested.

Specify at least one of the following environment variables to make sure that EDOT .NET logs to a file.

`OTEL_LOG_LEVEL` _(optional)_:
Set the log level at which the profiler logs. Valid values are:

* trace
* debug
* information
* warning
* error
* none

The default value is `information`. More verbose log levels like `trace` and `debug` can affect the runtime performance of profiler auto-instrumentation, so use them _only_ for diagnostic purposes.

:::{note}
If you don't explicitly set `ELASTIC_OTEL_LOG_TARGETS` to include `file`, global file logging is only enabled when the log level is `trace` or `debug`.
:::

`OTEL_DOTNET_AUTO_LOG_DIRECTORY` _(optional)_:
Set the directory in which to write log files. If you don't set this, the default is:

* `%USERPROFILE%\AppData\Roaming\elastic\elastic-otel-dotnet` on Windows
* `/var/log/elastic/elastic-otel-dotnet` on Linux
* `~/Library/Application Support/elastic/elastic-otel-dotnet` on macOS

::::{important}
Make sure the user account under which the profiler process runs has permission to write to the destination log directory. Specifically, when you run on IIS, ensure that the [AppPool identity](https://learn.microsoft.com/en-us/iis/manage/configuring-security/application-pool-identities) has write permissions in the target directory.
::::

`ELASTIC_OTEL_LOG_TARGETS` _(optional)_:
A semicolon-separated list of targets for profiler logs. Valid values are:

* file
* stdout
* none

The default value is `file` if you set `OTEL_DOTNET_AUTO_LOG_DIRECTORY` or set `OTEL_LOG_LEVEL` to `trace` or `debug`.
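
For example, to capture trace logs to a file in a custom directory on Linux or macOS, you might set the variables as follows (the directory path is a placeholder):

```console
export OTEL_LOG_LEVEL=trace
export OTEL_DOTNET_AUTO_LOG_DIRECTORY=/tmp/edot-dotnet-logs
export ELASTIC_OTEL_LOG_TARGETS=file
```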

## Advanced troubleshooting

### Diagnosing initialization or bootstrap issues

If EDOT .NET fails before fully bootstrapping its internal components, it won't generate a log file. In such circumstances, you can provide an additional logger for diagnostic purposes. Alternatively, you can enable the `stdout` log target.

#### Providing an additional application logger

You can provide an additional `ILogger` that EDOT .NET will use to log pre-bootstrap events by creating an instance of `ElasticOpenTelemetryOptions`.

```csharp
using Elastic.OpenTelemetry;
using Microsoft.Extensions.Logging;
using OpenTelemetry;

using ILoggerFactory loggerFactory = LoggerFactory.Create(static builder =>
{
builder
.AddFilter("Elastic.OpenTelemetry", LogLevel.Trace)
.AddConsole();
});

ILogger logger = loggerFactory.CreateLogger("EDOT");

var options = new ElasticOpenTelemetryOptions
{
AdditionalLogger = logger
};

using var sdk = OpenTelemetrySdk.Create(builder => builder
.WithElasticDefaults(options));
```

This example adds the console logging provider, but you can include any provider here. To use this sample code, add a dependency on the `Microsoft.Extensions.Logging.Console` [NuGet package](https://www.nuget.org/packages/microsoft.extensions.logging.console).

You create and configure an `ILoggerFactory`. In this example, you configure the `Elastic.OpenTelemetry` category to capture trace logs, which is the most verbose option. This is the best choice when you diagnose initialization issues.

You use the `ILoggerFactory` to create an `ILogger`, which you then assign to the `ElasticOpenTelemetryOptions.AdditionalLogger` property. Once you pass the `ElasticOpenTelemetryOptions` into the `WithElasticDefaults` method, the provided logger can capture bootstrap logs.

To simplify the preceding code, you can also configure the `ElasticOpenTelemetryOptions` with an `ILoggerFactory` instance that EDOT .NET can use to create its own logger.

```csharp
using var loggerFactory = LoggerFactory.Create(static builder =>
{
builder
.AddFilter("Elastic.OpenTelemetry", LogLevel.Debug)
.AddConsole();
});

var options = new ElasticOpenTelemetryOptions
{
AdditionalLoggerFactory = loggerFactory
};

using var sdk = OpenTelemetrySdk.Create(builder => builder
.WithElasticDefaults(options));
```

## Known issues

The following known issues affect EDOT .NET.

### Missing log records

The upstream SDK currently does not [comply with the spec](https://github.com/open-telemetry/opentelemetry-dotnet/issues/4324) regarding the deduplication of attributes when exporting log records. When you create a log within multiple scopes, each scope may store information using the same logical key. In this situation, the exported data will have duplicated attributes.

You are most likely to see this when you log in the scope of a request and enable the `OpenTelemetryLoggerOptions.IncludeScopes` option. ASP.NET Core adds the `RequestId` to multiple scopes. We recommend that you don't enable `IncludeScopes` until the SDK fixes this. When you use the EDOT Collector or the [{{motlp}}](opentelemetry://reference/motlp.md) in serverless, non-compliant log records will fail to be ingested.

EDOT .NET currently emits a warning if it detects that you use `IncludeScopes` in ASP.NET Core scenarios.

This can happen even when you set `IncludeScopes` to `false`. The following code also results in duplicate attributes and potentially lost log records:

```csharp
Logger.LogInformation("Eat your {fruit} {fruit} {fruit}!", "apple", "banana", "mango");
```

To avoid this scenario, make sure each placeholder uses a unique name. For example:

```csharp
Logger.LogInformation("Eat your {fruit1} {fruit2} {fruit3}!", "apple", "banana", "mango");
```
22 changes: 22 additions & 0 deletions troubleshoot/ingest/opentelemetry/edot-sdks/index.md
@@ -0,0 +1,22 @@
---
navigation_title: EDOT SDKs
description: Troubleshoot issues with the EDOT SDKs using these guides.
applies_to:
stack:
serverless:
observability:
products:
- id: cloud-serverless
- id: observability
- id: edot-sdk
---

# Troubleshooting the EDOT SDKs

Find solutions to common issues with EDOT SDKs.

- [.NET](/troubleshoot/ingest/opentelemetry/edot-sdks/dotnet/index.md)
- [Java](/troubleshoot/ingest/opentelemetry/edot-sdks/java/index.md)
- [Node.js](/troubleshoot/ingest/opentelemetry/edot-sdks/nodejs/index.md)
- [PHP](/troubleshoot/ingest/opentelemetry/edot-sdks/php/index.md)
- [Python](/troubleshoot/ingest/opentelemetry/edot-sdks/python/index.md)