-
Notifications
You must be signed in to change notification settings - Fork 39
refactor: added error stack replacement logic #2167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
625de88 to
ecb81b7
Compare
| // Note: Instead of 'pg', we could've passed exports.spanName if they were the same, | ||
| // We can’t use spanName here because for this instr the span name is | ||
| // "postgres", but the data is stored under span.data.pg. | ||
| tracingUtil.setErrorStack(span, error, 'pg'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exports.spanName
|
|
||
| if (technology && span.data[technology]) { | ||
| // for some cases like http, it is already set with custom values and no need to overwrite the message | ||
| span.data[technology].error = span.data[technology].error || exports.getErrorDetails(error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO a refactoring PR is needed before this PR to make that behavior equal everywhere:
from
data = { error:
to
span.data.x.error = ...
Then you can remove this extra check here. Its too confusing.
| if (error && error.stack) { | ||
| // no need to consider length for error cases, we can send the whole stack trace as per design | ||
| // TODO: It will be recorded as string, revisit to change structure | ||
| span.stack = error.stack; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to consider length for error cases
Why? 🤔 I think we have to call a new implementation of getErrorDetails
return String(err.stack).substring(0, 500);
And remove the previous impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also
return String(err.stack).substring(0, 500);
500 will be replaced with the config in the last PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configuration we are adding controls the span.stack length, which should support values like 10 (default) or up to 25 (future implementation plan in final PR).
This setting is meant to limit the number of stack frames, so it applies to an array, not a string.
The current logic for span.data.tech.error uses:
String(err.stack).substring(0, 500)
This truncates a string, which cannot be aligned with a stack-frame limit because the data types don’t match. To enforce a limit on the number of frames (10 or 25), we would need getErrorDetails or a replacement function to return stack traces as an array, which requires custom logic to parse V8 stack output into the expected format. Applying a "25-frame limit" to a string is meaningless and would cut off valid information.
The reason the current implementation returns a string is because it must fall back to err.message when no err.stack exists—which is valid—and err.message is naturally a string. Thus, the return type must stay as a string.
We could apply the limit if error.stack were already an array, but as noted, that wouldn't work consistently because of the fallback case.
So we leave this core logic as untouched.
A possible approach for span.stack is:
span.stack = arrayFormatted(error.stack)
Where arrayFormatted() [need to come up with a good name] converts the V8 stack string into an array so we can safely apply the limit. Meanwhile, span.data.tech.error can continue storing the full, stringified error details, truncated to 500 characters (which seems reasonable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also considering
Tracers MAY consider the whole generated stack, disregarding the
stack-trace-lengthconfiguration, when reporting thespan.stackfield for erroneous EXIT spans.
limitting is optional in this scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed:
In this PR we keep substring 500 on error stack.
An upcoming PR will change error stack to frames
span.stack = arrayFormatted(error.stack)
We should not release before we have the frame limit config applied on error stack as well because customers do already use stack trace length config (applies to span.stack!).
Recommendation: create a parent branch (feat-stack-trace) and merge the single changes into it.
| kind: constants.EXIT | ||
| }); | ||
| span.b = { s: 1 }; | ||
| span.stack = tracingUtil.getStackTrace(instrumentedAccessFunction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generate/Create only those stack traces which align strictly with the customer's settings
I am missing an action item in your plan for this rule.
IMO we have to move
span.stack = tracingUtil.getStackTrace(instrumentedAccessFunction);
to the success case.
In the fn getStackTrace will then either generate the stack trace or not - based on the customer config (last PR).
| } | ||
|
|
||
| return cls.ns.runAndReturn(() => { | ||
| const span = cls.startSpan({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not create stack traces at all for spans which are already filtered out
I am missing an action item for this rule as well.
How do we achieve that?
refs https://jsw.ibm.com/browse/INSTA-63749
Why
By doing this refactor now, we avoid duplicating stack-handling logic across different instrumentations (DB, messaging, protocol, etc.). This makes future changes easy, more consistent, and more maintainable.
What
This PR is the first step toward implementing the “Error only stack trace filtering" feature. It lays the foundation by centralising the logic around how we set span stack traces, specifically:
span.stackandspan.data.technology.errorinto a common utility on error cases.Making it easier to apply future filtering or removal of stacks based on configuration or environment variables.
Tasks detailed:
Extracts a common function to handle span stack-trace setting and custom replacements.
Adds logic to overwrite the span’s stack with span.technology.error.stack when relevant.
Leaves common fn open for later conditional filtering or removal based on config / environment variables.
Refactors existing code to call this new function where needed (partial — instrumentation update is not yet complete).
Future Plan (Next PRs)
I plan to split the work into 2–3 follow-up PRs:
Update all existing instrumentations (DB, messaging, protocol, etc.) to use this common stack-setting function.
Add at least one test per category (DB / messaging / protocol) to ensure consistent behavior.
Introduce an environment variable / configuration option (INSTANA_STACK_TRACE) to select between: none, error, all.
Implement conditional logic commonly based on this config:
None → do not collect or send stack traces - remove span.stack
Error only → if there is an error, overwrite stack; if not, drop span.stack
All → always collect; if there’s an error, overwrite; otherwise, leave as-is
Add/update tests to cover these modes.
Task list:
Optional
After implementation: