@wingy3181 wingy3181 commented Sep 21, 2025

Symptom

Splunk o11y/SignalFX service map fragmentation — producers and consumers appear as disconnected services.

Impact

Broken traces between services mean user requests cannot be followed end-to-end.
This creates blind spots in our telemetry, increases mean time to detect and resolve incidents (MTTR), and undermines root-cause analysis because errors and latency appear in isolation rather than as a connected path.
The result is slower incident response. In practical terms, on-call engineers spend more time stitching logs and guesses together, customer impact lasts longer, and we risk shipping regressions because we can’t prove where time is spent and where errors occur across the request path.

Other solutions

Splunk support provided the following snippet to manually instrument the consumer and correlate the traces.
It worked, but it requires manual instrumentation. In testing, the changes in this PR seem to have the same effect.

+    const messageBody = record.body;
+    const messageAttributes = record.messageAttributes;
+    console.log('--- SQS Message Received ---');
+    console.log('Message Body:', messageBody);
+
+    if (Object.keys(messageAttributes).length > 0) {
+      console.log('Message Attributes (Headers):');
+      for (const key in messageAttributes) {
+        const attribute = messageAttributes[key];
+        console.log(`  ${key}: ${attribute.stringValue} (DataType: ${attribute.dataType})`);
+
+        if (key === 'traceparent') {
+          const traceparent = attribute.stringValue ?? '';
+          const [version, traceId, spanId, traceFlags] = traceparent.split('-');
+          const ctx = propagation.extract(context.active(), { traceparent });
+          const spanContext = trace.getSpan(ctx)?.spanContext();
+          //trace.setSpanContext(ctx);
+          // const tracer = trace.getTracer(process.env.OTEL_SERVICE_NAME);
+          const tracer = trace.getTracer('Cdk101MessagingStack-ConsumerFunction40CB859D-kj2v75P9EOd0');
+          const span = tracer.startSpan("bv-consumer-function", undefined, ctx);
+          console.log('  Extracted Trace Context:', { traceId, spanId, traceFlags });
+          console.log('  Extracted Span Context:', spanContext);
+          span.end();
+        }
+      }
+    } else {
+      console.log('No Message Attributes (Headers) found.');
+    }
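
For reviewers unfamiliar with what the snippet above is pulling out of the message, here is a minimal, dependency-free sketch of the two steps it relies on: turning SQS message attributes into a plain carrier object, and validating a W3C `traceparent` value instead of blindly calling `split('-')`. The names `SqsMessageAttribute`, `TraceParent`, `carrierFromSqsAttributes`, and `parseTraceparent` are hypothetical helpers made up for illustration. They are not part of this PR, the OpenTelemetry API, or the Splunk distribution, and the sketch only accepts the four-field `traceparent` layout.

```typescript
// Hypothetical shape of one SQS message attribute as seen in a Lambda
// record (only the fields used in the snippet above).
interface SqsMessageAttribute {
  stringValue?: string;
  dataType: string;
}

// Parsed fields of a W3C `traceparent` value.
interface TraceParent {
  version: string;
  traceId: string;
  spanId: string;
  traceFlags: string;
  sampled: boolean;
}

// version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
const TRACEPARENT_RE =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Collect string-valued attributes into a flat carrier with lowercase keys,
// the kind of object a propagator's extract step can read from.
function carrierFromSqsAttributes(
  attrs: Record<string, SqsMessageAttribute> | undefined
): Record<string, string> {
  const carrier: Record<string, string> = {};
  for (const [key, attr] of Object.entries(attrs ?? {})) {
    if (attr.stringValue !== undefined) {
      carrier[key.toLowerCase()] = attr.stringValue;
    }
  }
  return carrier;
}

// Return the parsed fields, or undefined when the value is malformed.
function parseTraceparent(header: string): TraceParent | undefined {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return undefined;
  const [, version, traceId, spanId, traceFlags] = match;
  // Version "ff" and all-zero ids are invalid per the Trace Context spec.
  if (version === 'ff') return undefined;
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  // The lowest bit of the flags byte is the "sampled" flag.
  const sampled = (parseInt(traceFlags, 16) & 0x01) === 1;
  return { version, traceId, spanId, traceFlags, sampled };
}
```

In a handler this would be used roughly as `parseTraceparent(carrierFromSqsAttributes(record.messageAttributes).traceparent ?? '')`. An `undefined` result means the producer did not inject a usable context, which is one way the service map ends up fragmented.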

I have also attempted to use OTEL_PROPAGATORS and added xray, but this did not work.

@wingy3181 wingy3181 requested review from a team as code owners September 21, 2025 23:30

github-actions bot commented Sep 21, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@wingy3181
Author

I have read the CLA Document and I hereby sign the CLA

@wingy3181
Author

recheck

srv-gh-o11y-gdi-cla added a commit to splunk/cla-agreement that referenced this pull request Sep 21, 2025
@johnbley
Collaborator

@seemk Your thoughts on how to proceed with this? I'll work on some general cleanup of the PRs for this component in the meanwhile...
