Skip to content

Commit 5f2e83a

Browse files
authored
[SVLS-6907] feat: Support span dedup (#939)
## TL;DR - Support deduping spans by span_id - which fixes trace stats overcount issue for Node.js ## Problem Node.js tracer sometimes sends duplicate traces to the extension, causing over-count of trace stats. In our tests, the over-count is usually 1–4 for 50, 500 or 5000 invocations. This only happens when the "default" flushing strategy (which uses the "continuous" strategy) is used, and doesn't happen when the "end" strategy is used. ## Cause of problem I think it's similar to a known problem of continuous flushing. - The known problem is, when Lambda runtime is frozen, the connection between **extension** and **DD endpoint** can time out, causing data flush failure. - In the trace stats case, the problem is, when Lambda runtime is frozen, the connection between **tracer** and **extension** can time out. Extension receives the request from tracer, then freezes before sending a response, causing the tracer's request to time out, which makes the tracer resend the trace. This doesn't happen with the END flush strategy because in that case, after the Lambda handler finishes, extension still needs to flush the data and doesn't freeze so fast, and it has enough time to respond to tracer. ## Testing ### Steps - Build a test extension layer - Run e2e tests, including: - Install it on various Lambda functions - Invoke these functions with various traffic pattern - Check the trace stats result ### Result **Before:** - Over-count happens for (1) Node.js runtime + (2) "Sampling" test, which uses the default flush strategy - (vs expected 50) <img width="1336" height="277" alt="Screenshot 2025-11-20 at 9 30 00 PM" src="https://github.com/user-attachments/assets/3bf33c26-8996-43e4-8df5-7313247926b3" /> **After:** - No over-count (vs expected 5000) <img width="1447" height="421" alt="Screenshot 2025-11-20 at 9 25 32 PM" src="https://github.com/user-attachments/assets/21576483-dbdc-499f-acc8-1801d3927b0b" /> ## Options considered 1. **At most once**: Disable retry in tracer, at least for Lambda. 2. **At least once** - 2.1 Do nothing. Call out this as a known limitation. - 2.2 Treat tracer as VIP. Before calling /next, make sure tracer's requests have been responded. This may cause regression on invocation duration, especially when volume is high. 3. **Exactly once**: Implement dedup in extension, by trace_id. This PR chooses 3 because it's the easiest. ## Notes Thanks @astuyve @rochdev @purple4reina for discussion. Thanks Cursor for writing most of the code. The under-count issue will be addressed separately. Related issue: #688
1 parent 6094949 commit 5f2e83a

File tree

7 files changed

+408
-16
lines changed

7 files changed

+408
-16
lines changed

bottlecap/Cargo.lock

Lines changed: 8 additions & 8 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

bottlecap/Cargo.toml

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -61,12 +61,12 @@ indexmap = {version = "2.11.0", default-features = false}
6161
# be found in the clippy.toml file adjacent to this Cargo.toml.
6262
datadog-protos = { version = "0.1.0", default-features = false, git = "https://github.com/DataDog/saluki/", rev = "c89b58e5784b985819baf11f13f7d35876741222"}
6363
ddsketch-agent = { version = "0.1.0", default-features = false, git = "https://github.com/DataDog/saluki/", rev = "c89b58e5784b985819baf11f13f7d35876741222"}
64-
libdd-common = { git = "https://github.com/DataDog/libdatadog", rev = "c98467eb286c61b4483b5af5a33b268a55ccc6ff" }
65-
libdd-trace-protobuf = { git = "https://github.com/DataDog/libdatadog", rev = "c98467eb286c61b4483b5af5a33b268a55ccc6ff" }
66-
libdd-trace-utils = { git = "https://github.com/DataDog/libdatadog", rev = "c98467eb286c61b4483b5af5a33b268a55ccc6ff" , features = ["mini_agent"] }
67-
libdd-trace-normalization = { git = "https://github.com/DataDog/libdatadog", rev = "c98467eb286c61b4483b5af5a33b268a55ccc6ff" }
68-
datadog-trace-obfuscation = { git = "https://github.com/DataDog/libdatadog", rev = "c98467eb286c61b4483b5af5a33b268a55ccc6ff" }
69-
libdd-trace-stats = { git = "https://github.com/DataDog/libdatadog", rev = "c98467eb286c61b4483b5af5a33b268a55ccc6ff" }
64+
libdd-common = { git = "https://github.com/DataDog/libdatadog", rev = "7540423559548ce049c5424599d28ee1731378e8" }
65+
libdd-trace-protobuf = { git = "https://github.com/DataDog/libdatadog", rev = "7540423559548ce049c5424599d28ee1731378e8" }
66+
libdd-trace-utils = { git = "https://github.com/DataDog/libdatadog", rev = "7540423559548ce049c5424599d28ee1731378e8" , features = ["mini_agent"] }
67+
libdd-trace-normalization = { git = "https://github.com/DataDog/libdatadog", rev = "7540423559548ce049c5424599d28ee1731378e8" }
68+
datadog-trace-obfuscation = { git = "https://github.com/DataDog/libdatadog", rev = "7540423559548ce049c5424599d28ee1731378e8" }
69+
libdd-trace-stats = { git = "https://github.com/DataDog/libdatadog", rev = "7540423559548ce049c5424599d28ee1731378e8" }
7070
dogstatsd = { git = "https://github.com/DataDog/serverless-components", rev = "502f005c56b8d51dee95424a9c1404df46e2aae4", default-features = false }
7171
datadog-fips = { git = "https://github.com/DataDog/serverless-components", rev = "502f005c56b8d51dee95424a9c1404df46e2aae4", default-features = false }
7272
libddwaf = { version = "1.28.1", git = "https://github.com/DataDog/libddwaf-rust", rev = "d1534a158d976bd4f747bf9fcc58e0712d2d17fc", default-features = false, features = ["serde"] }

bottlecap/src/bin/bottlecap/main.rs

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,7 @@ use bottlecap::{
6262
propagation::DatadogCompositePropagator,
6363
proxy_aggregator,
6464
proxy_flusher::Flusher as ProxyFlusher,
65+
span_dedup_service,
6566
stats_aggregator::StatsAggregator,
6667
stats_concentrator_service::{StatsConcentratorHandle, StatsConcentratorService},
6768
stats_flusher::{self, StatsFlusher},
@@ -1085,6 +1086,9 @@ fn start_trace_agent(
10851086
obfuscation_config: Arc::new(obfuscation_config),
10861087
});
10871088

1089+
let (span_dedup_service, span_dedup_handle) = span_dedup_service::DedupService::new();
1090+
tokio::spawn(span_dedup_service.run());
1091+
10881092
// Proxy
10891093
let proxy_aggregator = Arc::new(TokioMutex::new(proxy_aggregator::Aggregator::default()));
10901094
let proxy_flusher = Arc::new(ProxyFlusher::new(
@@ -1105,6 +1109,7 @@ fn start_trace_agent(
11051109
appsec_processor,
11061110
Arc::clone(tags_provider),
11071111
stats_concentrator_handle.clone(),
1112+
span_dedup_handle,
11081113
);
11091114
let trace_agent_channel = trace_agent.get_sender_copy();
11101115
let shutdown_token = trace_agent.shutdown_token();

bottlecap/src/traces/mod.rs

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,8 @@ pub mod context;
55
pub mod propagation;
66
pub mod proxy_aggregator;
77
pub mod proxy_flusher;
8+
pub mod span_dedup;
9+
pub mod span_dedup_service;
810
pub mod span_pointers;
911
pub mod stats_aggregator;
1012
pub mod stats_concentrator_service;

bottlecap/src/traces/span_dedup.rs

Lines changed: 185 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,185 @@
1+
// Copyright 2023-Present Datadog, Inc. https://www.datadoghq.com/
2+
// SPDX-License-Identifier: Apache-2.0
3+
4+
use std::collections::{HashSet, VecDeque};
5+
6+
const DEFAULT_CAPACITY: usize = 100;
7+
8+
/// A key for span deduplication consisting of `trace_id` and `span_id`.
9+
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
10+
pub struct DedupKey {
11+
pub trace_id: u64,
12+
pub span_id: u64,
13+
}
14+
15+
impl DedupKey {
16+
/// Creates a new `SpanDedupKey`.
17+
#[must_use]
18+
pub fn new(trace_id: u64, span_id: u64) -> Self {
19+
Self { trace_id, span_id }
20+
}
21+
}
22+
23+
/// A deduplicator that maintains a fixed-size cache of recently seen span keys.
24+
/// When the capacity is reached, the oldest key is evicted (FIFO).
25+
pub struct Deduper {
26+
/// Set for O(1) existence checks
27+
seen: HashSet<DedupKey>,
28+
/// Queue to maintain insertion order for FIFO eviction
29+
order: VecDeque<DedupKey>,
30+
/// Maximum number of keys to keep
31+
capacity: usize,
32+
}
33+
34+
impl Default for Deduper {
35+
fn default() -> Self {
36+
Self::new(DEFAULT_CAPACITY)
37+
}
38+
}
39+
40+
impl Deduper {
41+
/// Creates a new `Deduper` with the specified capacity.
42+
///
43+
/// # Arguments
44+
///
45+
/// * `capacity` - Maximum number of IDs to track
46+
///
47+
/// # Examples
48+
///
49+
/// ```
50+
/// use bottlecap::traces::span_dedup::Deduper;
51+
///
52+
/// let deduper = Deduper::new(100);
53+
/// ```
54+
#[must_use]
55+
pub fn new(capacity: usize) -> Self {
56+
Self {
57+
seen: HashSet::with_capacity(capacity),
58+
order: VecDeque::with_capacity(capacity),
59+
capacity,
60+
}
61+
}
62+
63+
/// Checks if a span key exists and adds it if it doesn't.
64+
///
65+
/// # Arguments
66+
///
67+
/// * `key` - The span key to check and potentially add
68+
///
69+
/// # Returns
70+
///
71+
/// `true` if the key was added (didn't exist before), `false` if it already existed
72+
///
73+
/// # Examples
74+
///
75+
/// ```
76+
/// use bottlecap::traces::span_dedup::{Deduper, DedupKey};
77+
///
78+
/// let mut deduper = Deduper::default();
79+
/// let key = DedupKey::new(123, 456);
80+
/// assert!(deduper.check_and_add(key)); // Returns true, key was added
81+
/// assert!(!deduper.check_and_add(key)); // Returns false, key already exists
82+
/// ```
83+
pub fn check_and_add(&mut self, key: DedupKey) -> bool {
84+
// If the key already exists, return false
85+
if self.seen.contains(&key) {
86+
return false;
87+
}
88+
89+
// If we're at capacity, evict the oldest entry
90+
if self.order.len() >= self.capacity {
91+
if let Some(oldest) = self.order.pop_front() {
92+
self.seen.remove(&oldest);
93+
}
94+
}
95+
96+
// Add the new key
97+
self.seen.insert(key);
98+
self.order.push_back(key);
99+
100+
// Return true to indicate the key was added
101+
true
102+
}
103+
}
104+
105+
#[cfg(test)]
106+
mod tests {
107+
use super::*;
108+
109+
#[test]
110+
fn test_new_deduper() {
111+
let deduper = Deduper::new(10);
112+
assert_eq!(deduper.seen.len(), 0);
113+
}
114+
115+
#[test]
116+
fn test_default_deduper() {
117+
let deduper = Deduper::default();
118+
assert_eq!(deduper.capacity, DEFAULT_CAPACITY);
119+
}
120+
121+
#[test]
122+
fn test_check_and_add() {
123+
let mut deduper = Deduper::new(3);
124+
125+
let key1 = DedupKey::new(100, 123);
126+
let key2 = DedupKey::new(100, 456);
127+
128+
// First call should return true (key was added)
129+
assert!(deduper.check_and_add(key1));
130+
131+
// Second call should return false (key already exists)
132+
assert!(!deduper.check_and_add(key1));
133+
134+
// Different key should return true
135+
assert!(deduper.check_and_add(key2));
136+
137+
// Calling again should return false
138+
assert!(!deduper.check_and_add(key2));
139+
}
140+
141+
#[test]
142+
fn test_check_and_add_with_eviction() {
143+
let mut deduper = Deduper::new(2);
144+
145+
let key1 = DedupKey::new(1, 10);
146+
let key2 = DedupKey::new(2, 20);
147+
let key3 = DedupKey::new(3, 30);
148+
149+
assert!(deduper.check_and_add(key1));
150+
assert!(deduper.check_and_add(key2));
151+
152+
// Adding 3rd should evict key1
153+
assert!(deduper.check_and_add(key3));
154+
155+
// Now key1 should be addable again (was evicted)
156+
assert!(deduper.check_and_add(key1));
157+
158+
// But key2 should have been evicted
159+
assert!(deduper.check_and_add(key2));
160+
}
161+
162+
#[test]
163+
fn test_same_trace_id_different_span_id() {
164+
let mut deduper = Deduper::new(3);
165+
166+
let key1 = DedupKey::new(100, 1);
167+
let key2 = DedupKey::new(100, 2);
168+
169+
// Both should be added even though they have the same trace_id
170+
assert!(deduper.check_and_add(key1));
171+
assert!(deduper.check_and_add(key2));
172+
}
173+
174+
#[test]
175+
fn test_same_span_id_different_trace_id() {
176+
let mut deduper = Deduper::new(3);
177+
178+
let key1 = DedupKey::new(1, 100);
179+
let key2 = DedupKey::new(2, 100);
180+
181+
// Both should be added even though they have the same span_id
182+
assert!(deduper.check_and_add(key1));
183+
assert!(deduper.check_and_add(key2));
184+
}
185+
}

0 commit comments

Comments
 (0)