
Conversation

@nhulston
Contributor

@nhulston nhulston commented Feb 4, 2025

What does this PR do?

How sampling works normally (for host-based apps, not Lambda):
The tracer decides whether to sample a trace based on the DD_TRACE_SAMPLING_RULES (new) or DD_TRACE_SAMPLING_RATE (deprecated) env vars. It then sets the _sampling_priority_v1 metric on some or all spans to the chosen sampling priority. If the priority is <= 0, the trace should be dropped; otherwise it should be kept.

The agent then reads the sampling priority from this metric and decides whether to drop or keep the trace based on that priority and other factors, e.g. whether an error occurred or other special rules apply.
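For illustration, here's a rough Rust sketch of that decision; the function names are hypothetical, and only the `_sampling_priority_v1` metric key comes from the description above:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the rule described above; only the metric name
// `_sampling_priority_v1` is taken from the PR description.
fn sampling_priority(span_metrics: &HashMap<String, f64>) -> Option<i64> {
    span_metrics.get("_sampling_priority_v1").map(|p| *p as i64)
}

fn tracer_requested_drop(span_metrics: &HashMap<String, f64>) -> bool {
    // A priority <= 0 means the tracer decided the trace should be dropped.
    matches!(sampling_priority(span_metrics), Some(p) if p <= 0)
}
```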

How sampling works in Lambda
Since the serverless agent drops the lambda span received from the tracer and creates a new lambda span, we instead read the sampling priority from the headers of the /lambda/end-invocation request.
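A minimal sketch of that extraction, assuming the priority arrives in the standard `x-datadog-sampling-priority` propagation header; the header name and function are illustrative, not taken from this PR's code:

```rust
use std::collections::HashMap;

// Illustrative only: parse the sampling priority from the end-invocation
// request headers; returns None if the header is absent or malformed.
fn priority_from_end_invocation_headers(headers: &HashMap<String, String>) -> Option<i64> {
    headers
        .get("x-datadog-sampling-priority")
        .and_then(|value| value.trim().parse::<i64>().ok())
}
```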

The complex sampling logic exists in the main datadog-agent and in the backend, but not in the serverless Go agent. Therefore, in Lambda (Python, Node, and the serverless Go agent), we have historically just sent all traces to the backend and let it decide whether to sample or drop them.

This allows env variables like DD_TRACE_SAMPLING_RULES and DD_TRACE_SAMPLING_RATE to work with Bottlecap for universally instrumented runtimes.

Describe how you validated your changes

Manual testing. Prior to this PR, sampling rules worked in Node+Python, but not in Java/.NET/Golang with Bottlecap.

After these changes, sampling rules now work in Java and Golang, and Node and Python continue to work as expected. As for .NET, this PR is part of the fix, but it turns out the .NET tracer does not send the correct sampling priority header on the /lambda/end-invocation request; that needs to be fixed separately.

I also added unit tests:

  • cargo test test_update_span_context_with_sampling_priority
  • cargo test test_update_span_context_with_invalid_priority
  • cargo test test_update_span_context_no_sampling_priority

Additional Notes

In the future, we could implement the complex sampling logic that exists in the main agent to take some load off the backend (this would be done in libdatadog). This consists of rules like dropping traces with negative priority while keeping traces that contain errors, etc.
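As a rough sketch of what that future logic might look like (illustrative only, not part of this PR or of libdatadog):

```rust
// Illustrative only: keep error traces regardless of priority; otherwise
// honor the tracer's drop decision (priority <= 0 means drop).
fn should_drop(sampling_priority: i64, trace_has_error: bool) -> bool {
    if trace_has_error {
        return false;
    }
    sampling_priority <= 0
}
```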

There are many types of sampling rules; here's the list in order of priority. This PR covers cases 1-4 and 6, but not 5 (see the sketch after this list for how the precedence could be resolved):

  1. remote sampling rules
  2. local sampling rules
  3. remote global sampling rate
  4. local global sampling rate
  5. sampling rates from the agent (max traces per second)
  6. if nothing else, rate defaults to 100% (keep all traces)
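To make the precedence concrete, a hypothetical sketch of how the first configured source could win; all names here are illustrative and not taken from the agent or libdatadog code:

```rust
// Illustrative only: take the first configured rate in the precedence order above.
fn effective_sample_rate(
    remote_rule_rate: Option<f64>,
    local_rule_rate: Option<f64>,
    remote_global_rate: Option<f64>,
    local_global_rate: Option<f64>,
    agent_rate: Option<f64>,
) -> f64 {
    remote_rule_rate
        .or(local_rule_rate)
        .or(remote_global_rate)
        .or(local_global_rate)
        .or(agent_rate)
        // 6. if nothing else is set, keep all traces
        .unwrap_or(1.0)
}
```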

@nhulston nhulston changed the title [feat] Extract sampling priority from tracer and drop when sampling priority <= 0 feat: Extract sampling priority from tracer and drop when sampling priority <= 0 Feb 4, 2025
@nhulston nhulston force-pushed the nicholas.hulston/implement-sampling-priority branch from 1e7d896 to c0e8de6 on February 4, 2025 18:29
@nhulston nhulston marked this pull request as ready for review February 4, 2025 18:49
@nhulston nhulston requested a review from a team as a code owner February 4, 2025 18:49
@nhulston nhulston marked this pull request as draft February 5, 2025 18:22
@nhulston nhulston changed the title feat: Extract sampling priority from tracer and drop when sampling priority <= 0 feat: Extract sampling priority from tracer and apply to new lambda span Feb 7, 2025
@nhulston nhulston marked this pull request as ready for review February 7, 2025 20:13
@nhulston nhulston requested a review from duncanista February 7, 2025 20:14
Contributor

@duncanista duncanista left a comment


LGTM – make sure to manually test in a Lambda with an inferred span, double hop inferred span, and cold start. Check in APM that those traces also get dropped.

@nhulston
Contributor Author

Tested and works with:

  • Cold start
  • Child span
  • Inferred span with no trace context

Sampling priority is not propagated through single/double hop trace propagation cases; we'd need to implement that in a separate PR. The Go agent logic that handles that is in this file: https://github.com/DataDog/datadog-agent/blob/4b5c8b9270fe4626702db6d66298a060176251d0/pkg/serverless/trace/propagation/extractor.go

They're also dropped in APM.

@nhulston nhulston closed this Feb 10, 2025
@nhulston nhulston reopened this Feb 10, 2025
@nhulston nhulston merged commit e02939e into main Feb 10, 2025
23 checks passed
@nhulston nhulston deleted the nicholas.hulston/implement-sampling-priority branch February 10, 2025 17:38