Skip to content

Conversation

@abhilash-sivan
Copy link
Contributor

@abhilash-sivan abhilash-sivan commented Nov 21, 2025

refs https://jsw.ibm.com/browse/INSTA-63749

Why

By doing this refactor now, we avoid duplicating stack-handling logic across different instrumentations (DB, messaging, protocol, etc.). This makes future changes easy, more consistent, and more maintainable.

What

This PR is the first step toward implementing the “Error only stack trace filtering" feature. It lays the foundation by centralising the logic around how we set span stack traces, specifically:

  • Bringing code for span.stack and span.data.technology.error into a common utility on error cases.

Making it easier to apply future filtering or removal of stacks based on configuration or environment variables.

Tasks detailed:

  • Extracts a common function to handle span stack-trace setting and custom replacements.

  • Adds logic to overwrite the span’s stack with span.technology.error.stack when relevant.

  • Leaves common fn open for later conditional filtering or removal based on config / environment variables.

  • Refactors existing code to call this new function where needed (partial — instrumentation update is not yet complete).

Future Plan (Next PRs)

I plan to split the work into 2–3 follow-up PRs:

  1. Instrumentation Updates (current PR)

Update all existing instrumentations (DB, messaging, protocol, etc.) to use this common stack-setting function.

Add at least one test per category (DB / messaging / protocol) to ensure consistent behavior.

  1. Feature Implementation

Introduce an environment variable / configuration option (INSTANA_STACK_TRACE) to select between: none, error, all.

Implement conditional logic commonly based on this config:

  • None → do not collect or send stack traces - remove span.stack

  • Error only → if there is an error, overwrite stack; if not, drop span.stack

  • All → always collect; if there’s an error, overwrite; otherwise, leave as-is

Add/update tests to cover these modes.

Task list:

  • Make the setError common across all instrumentation.
  • v8 string to stack trace conversion + 1 test
  • refactor getErrodDetails logic to always return string for span.data.tech.error

Optional

  • Check FUP instr and do not collect stack trace – for performance optimisation.

After implementation:

  • Public doc update

@abhilash-sivan abhilash-sivan changed the title feat: added error stack replacement logic refactor: added error stack replacement logic Nov 21, 2025
@abhilash-sivan abhilash-sivan force-pushed the feat-stack-strace-filter branch from 625de88 to ecb81b7 Compare November 21, 2025 12:53
// Note: Instead of 'pg', we could've passed exports.spanName if they were the same,
// We can’t use spanName here because for this instr the span name is
// "postgres", but the data is stored under span.data.pg.
tracingUtil.setErrorStack(span, error, 'pg');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exports.spanName


if (technology && span.data[technology]) {
// for some cases like http, it is already set with custom values and no need to overwrite the message
span.data[technology].error = span.data[technology].error || exports.getErrorDetails(error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO a refactoring PR is needed before this PR to make that behavior equal everywhere:

from

data = { error:

to

span.data.x.error = ...

Then you can remove this extra check here. Its too confusing.

if (error && error.stack) {
// no need to consider length for error cases, we can send the whole stack trace as per design
// TODO: It will be recorded as string, revisit to change structure
span.stack = error.stack;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to consider length for error cases

Why? 🤔 I think we have to call a new implementation of getErrorDetails

return String(err.stack).substring(0, 500);

And remove the previous impl.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also

return String(err.stack).substring(0, 500);

500 will be replaced with the config in the last PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration we are adding controls the span.stack length, which should support values like 10 (default) or up to 25 (future implementation plan in final PR).
This setting is meant to limit the number of stack frames, so it applies to an array, not a string.

The current logic for span.data.tech.error uses:

String(err.stack).substring(0, 500)

This truncates a string, which cannot be aligned with a stack-frame limit because the data types don’t match. To enforce a limit on the number of frames (10 or 25), we would need getErrorDetails or a replacement function to return stack traces as an array, which requires custom logic to parse V8 stack output into the expected format. Applying a "25-frame limit" to a string is meaningless and would cut off valid information.

The reason the current implementation returns a string is because it must fall back to err.message when no err.stack exists—which is valid—and err.message is naturally a string. Thus, the return type must stay as a string.

We could apply the limit if error.stack were already an array, but as noted, that wouldn't work consistently because of the fallback case.

So we leave this core logic as untouched.

A possible approach for span.stack is:

span.stack = arrayFormatted(error.stack)

Where arrayFormatted() [need to come up with a good name] converts the V8 stack string into an array so we can safely apply the limit. Meanwhile, span.data.tech.error can continue storing the full, stringified error details, truncated to 500 characters (which seems reasonable).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also considering

Tracers MAY consider the whole generated stack, disregarding the stack-trace-length configuration, when reporting the span.stack field for erroneous EXIT spans.

limitting is optional in this scenario

Copy link
Contributor

@kirrg001 kirrg001 Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed:

In this PR we keep substring 500 on error stack.

An upcoming PR will change error stack to frames

span.stack = arrayFormatted(error.stack)

We should not release before we have the frame limit config applied on error stack as well because customers do already use stack trace length config (applies to span.stack!).

Recommendation: create a parent branch (feat-stack-trace) and merge the single changes into it.

kind: constants.EXIT
});
span.b = { s: 1 };
span.stack = tracingUtil.getStackTrace(instrumentedAccessFunction);
Copy link
Contributor

@kirrg001 kirrg001 Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generate/Create only those stack traces which align strictly with the customer's settings

I am missing an action item in your plan for this rule.

IMO we have to move

span.stack = tracingUtil.getStackTrace(instrumentedAccessFunction);

to the success case.

In the fn getStackTrace will then either generate the stack trace or not - based on the customer config (last PR).

}

return cls.ns.runAndReturn(() => {
const span = cls.startSpan({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not create stack traces at all for spans which are already filtered out

I am missing an action item for this rule as well.
How do we achieve that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants