feat: Extract sampling priority from tracer and apply to new lambda span #539
Conversation
duncanista left a comment:
LGTM – make sure to manually test in a Lambda with an inferred span, double hop inferred span, and cold start. Check in APM that those traces also get dropped.
Tested and works with:

Sampling priority is not propagated in the single/double hop trace propagation cases; we'd need to implement that in a separate PR. The Go agent logic that handles this is in this file: https://github.com/DataDog/datadog-agent/blob/4b5c8b9270fe4626702db6d66298a060176251d0/pkg/serverless/trace/propagation/extractor.go

They're also dropped in APM.
What does this PR do?
How sampling works normally (for host-based apps, not Lambda):

The tracer decides whether or not to sample a trace based on the `DD_TRACE_SAMPLING_RULES` (new) or `DD_TRACE_SAMPLING_RATE` (deprecated) env vars. The tracer then sets the `_sampling_priority_v1` metric to the sampling priority on some or all spans: if the priority is <= 0, the trace should be dropped; otherwise it should be sampled. The agent reads the sampling priority from this metric and decides whether to drop or keep the trace based on it and other factors, e.g. whether an error occurred or other special rules.
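For illustration, here's a minimal Rust sketch of that keep/drop decision (simplified, hypothetical types; the real agent types differ):

```rust
use std::collections::HashMap;

/// Simplified stand-in for a span: just the numeric metrics map
/// where the tracer stores the sampling priority.
struct Span {
    metrics: HashMap<String, f64>,
}

/// Keep/drop decision from the `_sampling_priority_v1` metric:
/// <= 0 (USER_REJECT = -1, AUTO_REJECT = 0) means drop, >= 1
/// (AUTO_KEEP = 1, USER_KEEP = 2) means keep. Spans without the
/// metric are kept here, matching "send everything" behavior.
fn should_keep(span: &Span) -> bool {
    span.metrics
        .get("_sampling_priority_v1")
        .map_or(true, |p| *p > 0.0)
}

fn main() {
    let mut metrics = HashMap::new();
    metrics.insert("_sampling_priority_v1".to_string(), -1.0);
    let span = Span { metrics };
    assert!(!should_keep(&span)); // USER_REJECT: dropped
}
```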
How sampling works in Lambda
Since the serverless agent drops the lambda span received from the tracer and creates a new lambda span, we instead get the sampling priority from the request headers of the `/lambda/end-invocation` request. `libdatadog` sets the chunk priority in `normalize`: https://github.com/DataDog/libdatadog/blob/c6ad4ff82852ed6d39fd618a602020b6ba623ed6/trace-normalization/src/normalizer.rs#L87-L99
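Conceptually, the extraction-and-apply flow looks like the following sketch. The helper and types are hypothetical stand-ins, and the header name is assumed here to be Datadog's standard `x-datadog-sampling-priority` propagation header:

```rust
use std::collections::HashMap;

/// Hypothetical helper mirroring the flow described above: read the
/// sampling priority from the /lambda/end-invocation request headers
/// and stamp it onto the new lambda span's metrics.
fn apply_priority_from_headers(
    headers: &HashMap<String, String>,
    lambda_span_metrics: &mut HashMap<String, f64>,
) {
    if let Some(priority) = headers
        .get("x-datadog-sampling-priority")
        .and_then(|v| v.parse::<f64>().ok())
    {
        lambda_span_metrics.insert("_sampling_priority_v1".to_string(), priority);
    }
}

fn main() {
    let mut headers = HashMap::new();
    headers.insert("x-datadog-sampling-priority".to_string(), "-1".to_string());
    let mut metrics = HashMap::new();
    apply_priority_from_headers(&headers, &mut metrics);
    assert_eq!(metrics.get("_sampling_priority_v1"), Some(&-1.0));
}
```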
The complex sampling logic exists in the main `datadog-agent` and in the backend, but not in the serverless Go agent. Therefore, in Lambda (Python, Node, and the serverless Go agent), we have historically just sent all traces to the backend and let it decide whether to sample or drop them.

This change allows env vars like `DD_TRACE_SAMPLING_RULES` and `DD_TRACE_SAMPLING_RATE` to work with Bottlecap for universally instrumented runtimes.
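For example, with a rule like the following set on the function, a universally instrumented runtime on Bottlecap should now drop all traces for that service. The rule value is illustrative; `DD_TRACE_SAMPLING_RULES` takes a JSON array of rules:

```
DD_TRACE_SAMPLING_RULES='[{"service": "my-service", "sample_rate": 0.0}]'
```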
Describe how you validated your changes

Manual testing. Prior to this PR, sampling rules worked in Node and Python with Bottlecap, but not in Java, .NET, or Golang.

After these changes, sampling rules work in Java and Golang, and Node and Python still work as expected. As for .NET, this PR is one fix, but it turns out the .NET tracer does not send the correct sampling priority header on the `/lambda/end-invocation` request; that needs to be fixed separately.

I also added unit tests.
Additional Notes
In the future, we could implement the complex sampling logic that exists in the main agent to take some load off the backend (this would be done in libdatadog). This includes rules like: drop traces with negative priority, but don't drop traces that contain errors.
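A rough sketch of the kind of rule described above, under assumed simplified types (not the agent's actual signature):

```rust
/// Decide whether to keep a trace: drop on priority <= 0
/// (per the rule described earlier), but always keep traces
/// that contain an error. Simplified stand-in for the real logic.
fn keep_trace(priority: i64, has_error: bool) -> bool {
    has_error || priority > 0
}

fn main() {
    assert!(!keep_trace(-1, false)); // USER_REJECT, no error: drop
    assert!(keep_trace(-1, true));   // an error overrides the drop
    assert!(keep_trace(2, false));   // USER_KEEP: keep
}
```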
There are many types of sampling rules; here's the list, in order of priority. This PR covers cases 1-4 and 6, but not case 5: