---
title: Troubleshoot OpenTelemetry Collector Issues
linkTitle: 10. Troubleshoot OpenTelemetry Collector Issues
weight: 10
time: 20 minutes
---

In the previous section, we added the debug exporter to the collector configuration
and made it part of the pipeline for traces and logs. We saw the debug output
written to the agent collector logs as expected.

However, traces are no longer sent to o11y cloud. Let's figure out why and fix it.

## Review the Collector Config

Whenever a change to the collector config is made via a `values.yaml` file, it's helpful
to review the actual configuration applied to the collector by looking at the config map:

``` bash
kubectl describe cm splunk-otel-collector-otel-agent
```
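
If we prefer the raw YAML over the `describe` output, we can also pull the config
directly from the config map. A minimal sketch, assuming the agent config is stored
under the `relay` data key (verify the key name in your rendered config map) and
that `yq` is installed:

``` bash
# Print just the agent collector config stored in the config map.
# The data key is assumed to be "relay" here; verify it with:
#   kubectl get cm splunk-otel-collector-otel-agent -o yaml
kubectl get cm splunk-otel-collector-otel-agent \
  -o jsonpath='{.data.relay}' | yq
```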

Let's review the traces pipeline in the agent collector config. It should look
like this:

``` yaml
  pipelines:
    ...
    traces:
      exporters:
      - debug
      processors:
      - memory_limiter
      - k8sattributes
      - batch
      - resourcedetection
      - resource
      - resource/add_environment
      receivers:
      - otlp
      - jaeger
      - smartagent/signalfx-forwarder
      - zipkin
```

Do you see the problem? Only the debug exporter is included in the traces pipeline.
The `otlphttp` and `signalfx` exporters that were present in the configuration previously are gone.
This is why we no longer see traces in o11y cloud.

> How did we know which exporters were included before? To find out, we could
> have reverted our earlier customizations and then checked the config map to see
> what was in the traces pipeline originally. Alternatively, we can refer to the
> examples in the [GitHub repo for splunk-otel-collector-chart](https://github.com/signalfx/splunk-otel-collector-chart/blob/main/examples/default/rendered_manifests/configmap-agent.yaml),
> which show the default agent config used by the Helm chart.
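
Helm itself can also help with the comparison. For example, `helm get values` shows
only the values we supplied to the release, which makes it easier to spot which
sections our customizations override, and `helm template` renders the chart's
defaults locally (the `--set` placeholders below are assumptions, just enough to
let the chart render):

``` bash
# Show only the values we supplied to the release:
helm get values splunk-otel-collector

# Render the chart with placeholder values to inspect the default agent config:
helm template splunk-otel-collector-chart/splunk-otel-collector \
  --set clusterName=placeholder \
  --set splunkObservability.realm=us1 \
  --set splunkObservability.accessToken=placeholder
```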

## How did the otlphttp and signalfx exporters get removed?

Let's review the customizations we added to the `values.yaml` file:

``` yaml
...
agent:
  config:
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          exporters:
            - debug
        logs:
          exporters:
            - debug
          processors:
            - memory_limiter
            - batch
            - resourcedetection
            - resource
          receivers:
            - otlp
```

When we applied the `values.yaml` file to the collector using `helm upgrade`, the
custom configuration was merged with the previous collector configuration.
During this merge, maps are combined key by key, but sections of the YAML
configuration that contain lists, such as the list of exporters in the pipeline
section, are replaced entirely by what we included in the `values.yaml` file
(which was only the debug exporter).
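
To make the merge behavior concrete, here is a simplified illustration (abbreviated,
not the chart's full config):

``` yaml
# Default chart config (abbreviated):
service:
  pipelines:
    traces:
      exporters: [otlphttp, signalfx]
---
# Our values.yaml override:
service:
  pipelines:
    traces:
      exporters: [debug]
---
# Merged result: the exporters list is replaced wholesale, not appended to
service:
  pipelines:
    traces:
      exporters: [debug]
```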

## Let's Fix the Issue

When customizing an existing pipeline, we need to fully redefine that part of the configuration.
Our `values.yaml` file should therefore be updated as follows:

``` yaml
splunkObservability:
  realm: us1
  accessToken: ***
  infrastructureMonitoringEventsEnabled: true
clusterName: $INSTANCE-cluster
environment: otel-$INSTANCE
agent:
  config:
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          exporters:
            - otlphttp
            - signalfx
            - debug
        logs:
          exporters:
            - debug
          processors:
            - memory_limiter
            - batch
            - resourcedetection
            - resource
          receivers:
            - otlp
```

Let's apply the changes:

``` bash
helm upgrade splunk-otel-collector -f values.yaml \
splunk-otel-collector-chart/splunk-otel-collector
```
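
To sanity-check the rendered output before (or after) applying it, `helm upgrade`
also supports a dry run; for example:

``` bash
# Render the manifests without applying them, then peek at the traces pipeline:
helm upgrade splunk-otel-collector -f values.yaml \
  splunk-otel-collector-chart/splunk-otel-collector \
  --dry-run | grep -A 6 'traces:'
```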

Then check the agent config map again:

``` bash
kubectl describe cm splunk-otel-collector-otel-agent
```

This time, we should see the full list of exporters in the traces pipeline:

``` yaml
  pipelines:
    ...
    traces:
      exporters:
      - otlphttp
      - signalfx
      - debug
      processors:
      ...
```

## Reviewing the Log Output

The **Splunk Distribution of OpenTelemetry .NET** automatically exports logs enriched with trace context
from applications that use `Microsoft.Extensions.Logging` for logging (which our sample app does).

Application logs are enriched with tracing metadata and then exported to a local instance of
the OpenTelemetry Collector in OTLP format.
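
As a point of reference, the log record we're about to see could be produced by an
ordinary `ILogger` call in the controller, roughly like this (a hypothetical sketch,
not the sample app's exact code):

``` cs
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Logging;

public class HelloWorldController : ControllerBase
{
    private readonly ILogger<HelloWorldController> _logger;

    public HelloWorldController(ILogger<HelloWorldController> logger)
    {
        _logger = logger;
    }

    [HttpGet("/hello/{name?}")]
    public string Hello(string name = "World")
    {
        // A structured log template; the instrumentation attaches the active
        // Trace ID and Span ID to the exported record automatically.
        _logger.LogInformation("/hello endpoint invoked by {name}", name);
        return $"Hello, {name}!";
    }
}
```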

Let's take a closer look at the logs captured by the debug exporter to confirm that's happening.
To tail the collector logs, we can use the following command:

``` bash
kubectl logs -l component=otel-collector-agent -f
```
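
Once we're tailing the logs, we can use curl to generate some more traffic. For
example (the URL below is an assumption; adjust the host, port, and path to
however the helloworld service is exposed in your environment):

``` bash
# Generate a request so the app emits a log record (hypothetical URL):
curl "http://localhost:8080/hello/Kubernetes"
```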

We should then see something like the following:

```
2024-12-20T21:56:30.858Z info Logs {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-12-20T21:56:30.858Z info ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
     -> splunk.distro.version: Str(1.8.0)
     -> telemetry.distro.name: Str(splunk-otel-dotnet)
     -> telemetry.distro.version: Str(1.8.0)
     -> os.type: Str(linux)
     -> os.description: Str(Debian GNU/Linux 12 (bookworm))
     -> os.build_id: Str(6.8.0-1021-aws)
     -> os.name: Str(Debian GNU/Linux)
     -> os.version: Str(12)
     -> host.name: Str(derek-1)
     -> process.owner: Str(app)
     -> process.pid: Int(1)
     -> process.runtime.description: Str(.NET 8.0.11)
     -> process.runtime.name: Str(.NET)
     -> process.runtime.version: Str(8.0.11)
     -> container.id: Str(5bee5b8f56f4b29f230ffdd183d0367c050872fefd9049822c1ab2aa662ba242)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.language: Str(dotnet)
     -> telemetry.sdk.version: Str(1.9.0)
     -> service.name: Str(helloworld)
     -> deployment.environment: Str(otel-derek-1)
     -> k8s.node.name: Str(derek-1)
     -> k8s.cluster.name: Str(derek-1-cluster)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope HelloWorldController
LogRecord #0
ObservedTimestamp: 2024-12-20 21:56:28.486804 +0000 UTC
Timestamp: 2024-12-20 21:56:28.486804 +0000 UTC
SeverityText: Information
SeverityNumber: Info(9)
Body: Str(/hello endpoint invoked by {name})
Attributes:
     -> name: Str(Kubernetes)
Trace ID: 78db97a12b942c0252d7438d6b045447
Span ID: 5e9158aa42f96db3
Flags: 1
	{"kind": "exporter", "data_type": "logs", "name": "debug"}
```

In this example, we can see that the Trace ID and Span ID were automatically written to the log output
by the OpenTelemetry .NET instrumentation. This allows us to correlate logs with traces in
Splunk Observability Cloud.

Remember, though, that if we deploy the OpenTelemetry collector in a K8s cluster using Helm
and include the log collection option, the collector uses the Filelog receiver
to automatically capture any container logs.

This would result in duplicate logs being captured for our application. How do we avoid this?

## Avoiding Duplicate Logs in K8s

To avoid capturing duplicate logs, we have two options:

1. We can set the `OTEL_LOGS_EXPORTER` environment variable to `none`, telling the Splunk Distribution of OpenTelemetry .NET not to export logs to the collector using OTLP.
2. We can manage log ingestion using annotations.

### Option 1

Setting the `OTEL_LOGS_EXPORTER` environment variable to `none` is straightforward.
However, the Trace ID and Span ID are then no longer written to the stdout logs
generated by the application, which would prevent us from correlating logs with traces.
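
For example, the variable could be set directly on the container in our
`deployment.yaml` (a sketch; container details abbreviated):

``` yaml
    spec:
      containers:
        - name: helloworld
          env:
            - name: OTEL_LOGS_EXPORTER
              value: "none"
```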

To restore the correlation, we could define a custom logger, such as the example defined in
`/home/splunk/workshop/docker-k8s-otel/helloworld/SplunkTelemetryConfigurator.cs`.

We could include this in our application by updating the `Program.cs` file as follows:

``` cs
using SplunkTelemetry;
using Microsoft.Extensions.Logging.Console;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

SplunkTelemetryConfigurator.ConfigureLogger(builder.Logging);

var app = builder.Build();

app.MapControllers();

app.Run();
```
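
The contents of that file aren't shown here, but a logger configuration along the
following lines would accomplish the goal: emit structured console logs that carry
the active trace context. This is a hypothetical sketch, not the workshop's actual
file:

``` cs
using Microsoft.Extensions.Logging;

namespace SplunkTelemetry
{
    public static class SplunkTelemetryConfigurator
    {
        public static void ConfigureLogger(ILoggingBuilder logging)
        {
            // Replace the default console logger with a JSON console logger
            // so each record is a single structured line on stdout.
            logging.ClearProviders();
            logging.AddJsonConsole(options =>
            {
                // Scopes carry the trace context added below.
                options.IncludeScopes = true;
            });

            // Attach the current Trace ID and Span ID to every log record
            // so stdout logs can still be correlated with traces.
            logging.Configure(factoryOptions =>
                factoryOptions.ActivityTrackingOptions =
                    ActivityTrackingOptions.TraceId | ActivityTrackingOptions.SpanId);
        }
    }
}
```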

### Option 2

Option 2 requires updating the deployment manifest for the application
to include an annotation. In our case, we would edit the `deployment.yaml` file to add the
`splunk.com/exclude` annotation as follows:

``` yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld
spec:
  selector:
    matchLabels:
      app: helloworld
  replicas: 1
  template:
    metadata:
      labels:
        app: helloworld
      annotations:
        splunk.com/exclude: "true"
    spec:
      containers:
        ...
```
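
After updating the manifest, we can apply it and verify that the annotation is
present on the running pod, e.g.:

``` bash
kubectl apply -f deployment.yaml

# Confirm the annotation landed on the pod:
kubectl get pods -l app=helloworld \
  -o jsonpath="{.items[0].metadata.annotations['splunk\.com/exclude']}"
```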

Please refer to [Managing Log Ingestion by Using Annotations](https://docs.splunk.com/observability/en/gdi/opentelemetry/collector-kubernetes/kubernetes-config-logs.html#manage-log-ingestion-using-annotations)
for further details on this option.