[AGNTLOG-468] chore(datadog): add trace sampling #1110

andrewqian2001datadog · 2026-01-20T15:21:26Z

Summary

This PR adds probabilistic and error sampling for OTLP traces, matching the Agent's behavior.

Change Type

Bug fix
New feature
Non-functional (chore, refactoring, docs)
Performance

How did you test this PR?

Correctness tests, manual testing.

Build ADP and run it with sampling enabled:

DD_DATA_PLANE_OTLP_ENABLED=true \
        DD_DATA_PLANE_OTLP_PROXY_ENABLED=false \
        DD_OTLP_CONFIG='{}' \
        DD_APM_CONFIG__PROBABILISTIC_SAMPLER__ENABLED=true \
        DD_APM_CONFIG__PROBABILISTIC_SAMPLER__SAMPLING_PERCENTAGE=50 \
        RUST_LOG=info \
        make run-adp-standalone

Send traces and verify that roughly DD_APM_CONFIG__PROBABILISTIC_SAMPLER__SAMPLING_PERCENTAGE percent of them are kept, with the rest missing from the trace explorer (about half at the 50% setting above).
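
For readers who want to reason about what the manual check should show, here is a minimal sketch of a hash-threshold probabilistic sampler. It is not the code in this PR: the `ProbabilisticSampler` type, the `NUM_BUCKETS` constant, and the choice of FNV-1a over the trace ID are assumptions made for illustration. The point is only that the keep/drop decision is a deterministic function of the trace ID, so the kept fraction converges on the configured percentage.

```rust
/// Number of hash buckets used to scale the percentage (an assumption for this sketch).
const NUM_BUCKETS: u32 = 0x4000;

/// 32-bit FNV-1a over a byte slice.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for b in bytes {
        hash ^= u32::from(*b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Hypothetical sampler: keeps a trace when its hashed ID falls below a threshold.
struct ProbabilisticSampler {
    threshold: u32,
}

impl ProbabilisticSampler {
    /// `sampling_percentage` is the share of traces to keep, from 0 to 100.
    fn new(sampling_percentage: f64) -> Self {
        let pct = sampling_percentage.clamp(0.0, 100.0);
        let threshold = (pct / 100.0 * f64::from(NUM_BUCKETS)) as u32;
        Self { threshold }
    }

    /// Same trace ID always yields the same decision.
    fn keep(&self, trace_id: u128) -> bool {
        let bucket = fnv1a_32(&trace_id.to_be_bytes()) & (NUM_BUCKETS - 1);
        bucket < self.threshold
    }
}

/// Fraction of trace IDs in `0..n` that the sampler keeps.
fn kept_fraction(sampler: &ProbabilisticSampler, n: u128) -> f64 {
    let kept = (0..n).filter(|id| sampler.keep(*id)).count();
    kept as f64 / n as f64
}

fn main() {
    let sampler = ProbabilisticSampler::new(50.0);
    println!("kept fraction: {:.3}", kept_fraction(&sampler, 100_000));
}
```

Because the decision depends only on the trace ID, every component that sees the same trace agrees on whether to keep it, which is what makes the "about half missing" check meaningful.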

References

pr-commenter · 2026-01-20T15:27:06Z

Binary Size Analysis (Agent Data Plane)

Target: 85eb93a (baseline) vs bce63b0 (comparison) diff
Baseline Size: 361.64 MiB
Comparison Size: 362.62 MiB
Size Change: +997.68 KiB (+0.27%)
Pass/Fail Threshold: +5%
Result: PASSED ✅
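
The pass/fail arithmetic above can be reproduced with a small sketch. The function names are hypothetical, and whether the real check uses a strict or inclusive comparison against the threshold is an assumption here.

```rust
/// Relative size change of `comparison` versus `baseline`, in percent.
fn size_change_percent(baseline_mib: f64, comparison_mib: f64) -> f64 {
    (comparison_mib - baseline_mib) / baseline_mib * 100.0
}

/// Passes while growth stays below the threshold (assumed strict comparison).
fn size_check_passes(change_percent: f64, threshold_percent: f64) -> bool {
    change_percent < threshold_percent
}

fn main() {
    // Baseline and comparison sizes taken from the report above.
    let change = size_change_percent(361.64, 362.62);
    println!("{change:+.2}% -> passed: {}", size_check_passes(change, 5.0));
}
```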

Changes by Module

Module	File Size	Symbols
[debug sections]	+866.15 KiB	7
tokio	+38.34 KiB	710
saluki_components::transforms::trace_sampler	+37.12 KiB	22
anyhow	+30.81 KiB	397
saluki_components::common::datadog	+22.80 KiB	77
&mut serde_json	-21.08 KiB	28
serde_json	+19.77 KiB	56
hyper	-19.43 KiB	187
http_body_util	-18.28 KiB	63
rustls	-17.55 KiB	19
hashbrown	+16.70 KiB	55
tokio_rustls	+15.96 KiB	32
std	-15.06 KiB	140
h2	-14.94 KiB	229
saluki_components::destinations::prometheus	+14.64 KiB	6
tonic	+13.85 KiB	136
quick_cache	+13.13 KiB	19
saluki_common::cache::Cache<K,V,W,H>	-12.96 KiB	10
serde	+11.67 KiB	23
futures_util	+11.51 KiB	23

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.4%  +444Ki  [ = ]       0    [section .debug_loc]
  +0.4%  +227Ki  [ = ]       0    [section .debug_str]
  +0.1%  +173Ki  [ = ]       0    [section .debug_info]
  +1.1%  +136Ki  +0.9% +74.5Ki    [10770 Others]
  +0.1% +26.7Ki  [ = ]       0    [section .debug_line]
  [NEW] +16.5Ki  [NEW] +16.3Ki    _<saluki_components::transforms::trace_sampler::TraceSampler as saluki_core::components::transforms::Transform>::run::_{{closure}}::h236850543abc2256
  +991% +15.5Ki +11e2% +15.5Ki    saluki_components::common::datadog::io::TransactionForwarder<B>::from_config::hbef5e89b3b2d6bee
  [NEW] +11.7Ki  [NEW] +11.6Ki    h2::proto::streams::streams::Streams<B,P>::poll_complete::h69ea31998f5c3306
  +551% +11.2Ki  +604% +11.2Ki    _<saluki_components::sources::otlp::logs::translator::OtlpLogsTranslator as core::iter::traits::iterator::Iterator>::next::h86123bab2ef768a6
  [NEW] +11.0Ki  [NEW] +10.9Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h5ee73936d390c4c7
  [NEW] +10.9Ki  [NEW] +10.7Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::h3840418e10f9e8c3
  +411% +10.7Ki  +452% +10.7Ki    _<saluki_components::destinations::prometheus::PrometheusConfiguration as saluki_core::components::destinations::builder::DestinationBuilder>::build::_{{closure}}::h6ff07873a332029b
  +448% +10.3Ki  +474% +10.3Ki    _<core::pin::Pin<P> as core::future::future::Future>::poll::h82bd0783ea53d677
 -56.8% -10.5Ki -57.1% -10.5Ki    figment::figment::Figment::extract::h6b2f4f24781c070b
  [DEL] -11.2Ki  [DEL] -11.1Ki    h2::server::Connection<T,B>::poll_closed::hb168443a782f39ca
  [DEL] -12.0Ki  [DEL] -11.9Ki    saluki_components::sources::otlp::logs::transform::transform_log_record::hbaf9b216dde5032a
  [DEL] -12.1Ki  [DEL] -12.0Ki    h2::server::Connection<T,B>::poll_closed::hf493801a8b14506d
  -0.1% -12.9Ki  [ = ]       0    [section .debug_ranges]
 -69.8% -14.1Ki -70.2% -14.1Ki    h2::proto::connection::DynConnection<B>::recv_frame::hbe7dfa76bf469e9e
 -73.8% -17.4Ki -74.1% -17.4Ki    h2::proto::connection::DynConnection<B>::recv_frame::hff5cbc80b37e5383
  [DEL] -18.0Ki  [DEL] -17.9Ki    h2::proto::connection::DynConnection<B>::recv_frame::hf52c1fc561321a83
  +0.3%  +997Ki  +0.3% +76.8Ki    TOTAL

pr-commenter · 2026-01-20T15:56:28Z

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: 0aea9563-bc09-4a86-b38d-af2aef8127ab

Baseline: 85eb93a
Comparison: bce63b0
Diff

❌ Experiments with retried target crashes

This is a critical error. One or more replicates failed with a non-zero exit code. These replicates may have been retried. See Replicate Execution Details for more information.

quality_gates_rss_dsd_ultraheavy

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	otlp_ingest_logs_adp	memory utilization	+2.13	[+1.84, +2.42]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_ultraheavy	ingress throughput	+0.01	[-0.07, +0.09]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_throughput	ingress throughput	+0.00	[-0.05, +0.05]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_throughput	ingress throughput	+0.00	[-0.05, +0.06]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_throughput	ingress throughput	+0.00	[-0.02, +0.02]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_throughput	ingress throughput	-0.00	[-0.19, +0.19]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_heavy	memory utilization	-0.02	[-0.16, +0.12]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_low	memory utilization	-0.05	[-0.18, +0.08]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_medium	memory utilization	-0.08	[-0.24, +0.09]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_idle	memory utilization	-0.26	[-0.28, -0.23]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_throughput	ingress throughput	-2.38	[-2.51, -2.24]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_adp	memory utilization	-2.51	[-2.67, -2.34]	1

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	links
✅	quality_gates_rss_dsd_heavy	memory_usage	10/10	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_low	memory_usage	10/10	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_medium	memory_usage	10/10	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_ultraheavy	memory_usage	10/10	(metrics) (profiles) (logs)
✅	quality_gates_rss_idle	memory_usage	10/10	(metrics) (profiles) (logs)

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we flag a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
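
The three criteria above can be sketched as a single predicate. The `Experiment` struct and its field names are hypothetical; the 5.00% tolerance is the value stated in this report.

```rust
/// One row of the change-detection table (hypothetical shape for this sketch).
struct Experiment {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    erratic: bool,
}

/// Effect size tolerance from the report: |Δ mean %| ≥ 5.00%.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// True only when all three criteria hold.
fn is_regression(e: &Experiment) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}

fn main() {
    // otlp_ingest_metrics_adp from the table above: CI excludes zero, but |Δ| < 5%.
    let row = Experiment { delta_mean_pct: -2.51, ci_low: -2.67, ci_high: -2.34, erratic: false };
    println!("regression: {}", is_regression(&row));
}
```

This is why rows such as otlp_ingest_metrics_adp (-2.51, CI [-2.67, -2.34]) are reported as no significant change: the interval excludes zero, but the effect is below the tolerance.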

Replicate Execution Details

We run multiple replicates for each experiment/variant. However, we allow replicates to be automatically retried if there are any failures, up to 8 times, at which point the replicate is marked dead and we are unable to run analysis for the entire experiment. We call each of these attempts at running replicates a replicate execution. This section lists all replicate executions that failed due to the target crashing or being OOM-killed.

Note: In the tables below we bucket failures by experiment, variant, and failure type. For each bucket we list the replicate indexes that failed, annotated with how many times that replicate failed with the given failure mode. In the example below, the baseline variant of the experiment named experiment_with_failures had two replicates that failed due to OOM kills: replicate 0 failed 8 executions and replicate 1 failed 6 executions, both with the same failure mode.

Experiment	Variant	Replicates	Failure	Logs	Debug Dashboard
experiment_with_failures	baseline	0 (x8) 1 (x6)	Oom killed		Debug Dashboard

The debug dashboard links will take you to a debugging dashboard specifically designed to investigate replicate execution failures.

❌ Retried Normal Replicate Execution Failures (non-profiling)

Experiment	Variant	Replicates	Failure	Debug Dashboard
quality_gates_rss_dsd_ultraheavy	comparison	0	Failed to shutdown when requested	Debug Dashboard

tobz · 2026-01-21T14:02:47Z

lib/saluki-components/src/transforms/trace_sampler/mod.rs

+}
+
+#[async_trait]
+impl Transform for TraceSampler {


Mostly saying this out loud so that I don't forget, but we'll likely want to end up making this a synchronous transform so that we can avoid the cost of message passing.
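
To make the trade-off concrete, here is a rough sketch contrasting the two shapes. It does not use saluki's actual traits: `Trace`, `SyncStage`, and `KeepErrors` are hypothetical stand-ins, and a thread with `std::sync::mpsc` channels stands in for an asynchronous component with its own input and output queues.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a trace event (hypothetical; not the saluki-core type).
#[derive(Debug, Clone, PartialEq)]
struct Trace {
    id: u64,
    has_error: bool,
}

/// Hypothetical synchronous stage: mutates a batch in place on the caller's task.
trait SyncStage {
    fn process(&mut self, batch: &mut Vec<Trace>);
}

/// Toy sampler that keeps only traces containing an error.
struct KeepErrors;

impl SyncStage for KeepErrors {
    fn process(&mut self, batch: &mut Vec<Trace>) {
        batch.retain(|t| t.has_error);
    }
}

/// Message-passing shape: the batch crosses a channel to a worker and back.
fn run_via_channel(batch: Vec<Trace>) -> Vec<Trace> {
    let (tx_in, rx_in) = mpsc::channel::<Vec<Trace>>();
    let (tx_out, rx_out) = mpsc::channel::<Vec<Trace>>();
    let worker = thread::spawn(move || {
        let mut stage = KeepErrors;
        for mut b in rx_in {
            stage.process(&mut b);
            tx_out.send(b).expect("receiver alive");
        }
    });
    tx_in.send(batch).expect("worker alive");
    drop(tx_in); // close the channel so the worker loop ends
    let out = rx_out.recv().expect("worker produced output");
    worker.join().expect("worker panicked");
    out
}

/// Synchronous shape: the stage is called inline, with no channel hop.
fn run_inline(mut batch: Vec<Trace>) -> Vec<Trace> {
    let mut stage = KeepErrors;
    stage.process(&mut batch);
    batch
}

fn main() {
    let batch = vec![
        Trace { id: 1, has_error: false },
        Trace { id: 2, has_error: true },
    ];
    assert_eq!(run_via_channel(batch.clone()), run_inline(batch));
}
```

Both paths produce the same output; the inline version simply skips the two queue hops per batch, which is the cost the comment refers to.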

rayz · 2026-01-23T15:06:10Z

lib/saluki-components/src/transforms/trace_sampler/mod.rs

+
+/// Configuration for the trace sampler transform.
+#[derive(Debug, Deserialize)]
+pub struct TraceSamplerConfiguration {


These configurations won't parse properly. They need to be in https://github.com/DataDog/saluki/blob/main/lib/saluki-components/src/common/datadog/apm.rs#L35
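
As an illustration of why the nesting matters, here is a sketch of how the env vars from the test instructions would map onto a nested structure. It assumes the convention that the `DD_` prefix is stripped and `__` separates nesting levels; the `ApmConfig` and `ProbabilisticSampler` structs below are hypothetical mirrors, not the definitions in apm.rs.

```rust
/// Hypothetical mirror of the nested sampler settings.
#[derive(Debug, Default, PartialEq)]
struct ProbabilisticSampler {
    enabled: bool,
    sampling_percentage: f64,
}

/// Hypothetical mirror of the `apm_config` section.
#[derive(Debug, Default, PartialEq)]
struct ApmConfig {
    probabilistic_sampler: ProbabilisticSampler,
}

/// Splits `DD_A__B__C` into `["a", "b", "c"]` (assumed convention).
fn env_to_key_path(name: &str) -> Option<Vec<String>> {
    let rest = name.strip_prefix("DD_")?;
    Some(rest.split("__").map(|s| s.to_ascii_lowercase()).collect())
}

/// Applies one env var; returns false if the key path or value is not recognized.
fn apply(cfg: &mut ApmConfig, name: &str, value: &str) -> bool {
    let Some(path) = env_to_key_path(name) else { return false };
    let path: Vec<&str> = path.iter().map(String::as_str).collect();
    match path.as_slice() {
        ["apm_config", "probabilistic_sampler", "enabled"] => match value.parse() {
            Ok(v) => {
                cfg.probabilistic_sampler.enabled = v;
                true
            }
            Err(_) => false,
        },
        ["apm_config", "probabilistic_sampler", "sampling_percentage"] => match value.parse() {
            Ok(v) => {
                cfg.probabilistic_sampler.sampling_percentage = v;
                true
            }
            Err(_) => false,
        },
        _ => false,
    }
}

fn main() {
    let mut cfg = ApmConfig::default();
    apply(&mut cfg, "DD_APM_CONFIG__PROBABILISTIC_SAMPLER__ENABLED", "true");
    apply(&mut cfg, "DD_APM_CONFIG__PROBABILISTIC_SAMPLER__SAMPLING_PERCENTAGE", "50");
    println!("{cfg:?}");
}
```

A configuration struct that expects these fields at the top level would never see them under this convention, since the values live under `apm_config.probabilistic_sampler`.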

andrewqian2001datadog changed the title ~~Andrewq/add trace sampling~~ (WIP) Andrewq/add trace sampling Jan 20, 2026

github-actions bot added the area/core and area/components labels Jan 20, 2026

tobz requested changes Jan 21, 2026

andrewqian2001datadog changed the title ~~(WIP) Andrewq/add trace sampling~~ [AGNTLOG-468] chore(datadog): add trace sampling Jan 21, 2026

andrewqian2001datadog added 11 commits January 21, 2026 10:36

modify the Trace event to include a TraceSampling field, which stores metadata from the sampling transform component

47f9942

add the probabilistic sampler

704d1b1

add the error sampler

b3979ab

add the core sampler

55cdb76

add the score sampler

e921c3a

add setter for trace spans

b6dac83

add trace_sampler to mod.rs

de45c6d

checks for user set sampling in the translator

30b0625

add the trace sampler transform component

8017f4c

add helper class to compute signature

a32499f

modified the trace encoder to use sampling info from the transform

df45987

andrewqian2001datadog force-pushed the andrewq/add-trace-sampling branch from 942206a to df45987 Compare January 21, 2026 15:53

andrewqian2001datadog added 2 commits January 21, 2026 11:12

make check clippy + fmt

f5b45ed

wire up component

76ab423

andrewqian2001datadog marked this pull request as ready for review January 22, 2026 21:44

andrewqian2001datadog requested a review from a team as a code owner January 22, 2026 21:44

andrewqian2001datadog added 4 commits January 22, 2026 17:43

remove default macro which sets probabilistic sampling to false

321b7f7

from tag from span which is already added to trace chunk

df98d8f

make fmt

740b8f7

remove tag added during translation process

ecdb386

rayz requested changes Jan 23, 2026

andrewqian2001datadog added 2 commits January 23, 2026 10:08

disable tag when probabilistic sampler isn't being used

91254c7

revert change

43a889d

andrewqian2001datadog added 2 commits January 23, 2026 10:40

use apm config

cd1d216

make check-clippy

67c2985

tobz requested changes Jan 23, 2026

andrewqian2001datadog added 9 commits January 23, 2026 14:00

address comments

d17c04f

fix missing doc, fix bugs

633b43a

address comments part two

9c20678

remove dead code, move code only used in test

34f0114

remove more dead code

934411a

formatting

fdc04fd

Merge remote-tracking branch 'origin/main' into andrewq/add-trace-sampling

731883c

move missing functionality to docs of trace_sampler

2a77b75

set probabilistic sampler to false by default

bce63b0
