ThreadTag Conflicts Cause Data Contamination
Problem Description
When multiple ThreadTag rules with the same thread_id and key exist in the Ruleset, they are all applied to StackTrace metadata during profiling. However, during pprof encoding, a HashMap is used to store labels, which silently overwrites duplicate keys, causing profiling data from one span to be incorrectly labeled with another span's tag.
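As a standalone illustration (not the crate's actual code), the sketch below shows the HashMap behaviour at the heart of the problem: a second insert for the same key silently replaces the first value.

use std::collections::HashMap;

fn main() {
    // Two tags collected for the same thread and the same key, with different values.
    let tags = vec![("span_id", "span_a"), ("span_id", "span_b")];

    let mut labels = HashMap::new();
    for (key, value) in &tags {
        // The second insert for "span_id" silently replaces the first.
        labels.insert(*key, *value);
    }

    assert_eq!(labels.len(), 1);
    assert_eq!(labels["span_id"], "span_b"); // "span_a" is lost without any warning
}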
Root Cause
The issue stems from an inconsistency between data collection and encoding:
1. Data Collection (src/backend/ruleset.rs):
- Uses HashSet<Rule>, where Rule is the complete tuple (thread_id, Tag{key, value})
- Allows multiple rules like:
  ThreadTag(100, Tag{key:"span_id", value:"span_a"})
  ThreadTag(100, Tag{key:"span_id", value:"span_b"})
- Both rules match the same thread_id and are applied to the metadata
2. Pprof Encoding (src/encode/pprof.rs):
let mut labels = HashMap::new();
for l in &stacktrace.metadata.tags {
    let k = b.add_string(&l.key);
    let v = b.add_string(&l.value);
    labels.insert(k, v); // ⚠️ Duplicate keys are silently overwritten
}
for l in &report.metadata.tags {
    let k = b.add_string(&l.key);
    let v = b.add_string(&l.value);
    labels.insert(k, v); // ⚠️ Duplicate keys are silently overwritten
}

The HashMap only retains the last value for each key, discarding earlier values without any warning. This is the fundamental issue: the pprof format does not allow duplicate label keys with different values.
Steps to Reproduce
import pyroscope
import threading
pyroscope.configure(application_name="test", server_address="http://localhost:4040")
thread_id = threading.get_ident()
# Step 1: Add first tag
pyroscope.add_thread_tag(thread_id, "span_id", "span_a")
# ... profiling data collected with span_a ...
# Step 2: Forget to remove, or remove fails
# pyroscope.remove_thread_tag(thread_id, "span_id", "span_a") # Skipped
# Step 3: Add new tag
pyroscope.add_thread_tag(thread_id, "span_id", "span_b")
# ... more profiling data collected ...
# Result: Data from the span_a period is labeled as span_b in the pprof output

Expected Behavior
Each (thread_id, key) combination should have at most one value at any given time to match the pprof encoding constraints.
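A hypothetical invariant check (the helper name and signature below are illustrative, not existing code) makes the constraint concrete:

use std::collections::HashSet;

// Returns true if any (thread_id, key) pair carries more than one value.
fn has_conflicts(thread_tags: &[(u64, String, String)]) -> bool {
    let mut seen: HashSet<(u64, &str)> = HashSet::new();
    thread_tags
        .iter()
        .any(|(thread_id, key, _value)| !seen.insert((*thread_id, key.as_str())))
}

fn main() {
    let tags = vec![
        (100, "span_id".to_string(), "span_a".to_string()),
        (100, "span_id".to_string(), "span_b".to_string()),
    ];
    assert!(has_conflicts(&tags)); // violates the expected behavior described above
}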
Proposed Solutions
Solution 1: Use HashMap for ThreadTags (Recommended)
Replace HashSet<Rule> with a structure that enforces uniqueness by (thread_id, key):
#[derive(Debug, Default, Clone)]
pub struct Ruleset {
    pub global_tags: Arc<Mutex<HashSet<Tag>>>,
    pub thread_tags: Arc<Mutex<HashMap<(u64, String), Tag>>>,
}

Benefits:
- O(1) automatic deduplication - inherently prevents conflicts
- Natural key-value semantics - matches how tags are actually used
- Consistency with pprof encoding - aligns data structure with output format
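A rough sketch of the semantics this would give (the ThreadTags wrapper and its methods are illustrative only, not the proposed API): a second write to the same (thread_id, key) simply replaces the previous tag.

use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Tag {
    key: String,
    value: String,
}

#[derive(Debug, Default)]
struct ThreadTags {
    inner: HashMap<(u64, String), Tag>,
}

impl ThreadTags {
    // Inserting a tag for an existing (thread_id, key) overwrites the old value.
    fn set(&mut self, thread_id: u64, key: &str, value: &str) {
        self.inner.insert(
            (thread_id, key.to_string()),
            Tag { key: key.to_string(), value: value.to_string() },
        );
    }

    fn get(&self, thread_id: u64, key: &str) -> Option<&Tag> {
        self.inner.get(&(thread_id, key.to_string()))
    }
}

fn main() {
    let mut tags = ThreadTags::default();
    tags.set(100, "span_id", "span_a");
    tags.set(100, "span_id", "span_b"); // replaces span_a, no conflict remains

    assert_eq!(tags.inner.len(), 1);
    assert_eq!(tags.get(100, "span_id").unwrap().value, "span_b");
}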
Solution 2: Auto-remove Conflicts in add_rule
Modify add_rule to remove conflicting entries before insertion:
pub fn add_rule(&self, rule: Rule) -> Result<bool> {
    let mut rules = self.rules.lock()?;
    if let Rule::ThreadTag(thread_id, ref tag) = rule {
        // Drop any existing ThreadTag for the same (thread_id, key) before inserting.
        rules.retain(|existing_rule| match existing_rule {
            Rule::ThreadTag(existing_thread_id, existing_tag) => {
                existing_thread_id != &thread_id || existing_tag.key != tag.key
            }
            _ => true,
        });
    }
    let insert = rules.insert(rule);
    Ok(insert)
}

Time complexity: O(n), where n is the number of rules.
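For completeness, a self-contained sketch of how Solution 2 behaves end to end (simplified: no Arc/Mutex or Result, and a single-variant Rule, so the shapes here are assumptions rather than the crate's definitions):

use std::collections::HashSet;

#[derive(Debug, Hash, PartialEq, Eq)]
struct Tag {
    key: String,
    value: String,
}

#[derive(Debug, Hash, PartialEq, Eq)]
enum Rule {
    ThreadTag(u64, Tag),
}

#[derive(Default)]
struct Ruleset {
    rules: HashSet<Rule>,
}

impl Ruleset {
    fn add_rule(&mut self, rule: Rule) -> bool {
        // Drop any existing ThreadTag for the same (thread_id, key) first.
        if let Rule::ThreadTag(thread_id, ref tag) = rule {
            self.rules.retain(|existing| match existing {
                Rule::ThreadTag(existing_id, existing_tag) => {
                    *existing_id != thread_id || existing_tag.key != tag.key
                }
            });
        }
        self.rules.insert(rule)
    }
}

fn main() {
    let mut ruleset = Ruleset::default();
    let tag = |v: &str| Tag { key: "span_id".into(), value: v.into() };

    ruleset.add_rule(Rule::ThreadTag(100, tag("span_a")));
    ruleset.add_rule(Rule::ThreadTag(100, tag("span_b")));

    // Only the most recent rule for (100, "span_id") survives.
    assert_eq!(ruleset.rules.len(), 1);
}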
Question for Maintainers
I'm curious about the original design decision: Why use HashSet<Rule> (where Rule contains the complete Tag) instead of a structure like HashMap<(u64, String), String> for thread tags?
Was there a specific use case that required allowing multiple values for the same (thread_id, key) combination? Understanding the rationale would help ensure any fix doesn't break intended functionality.
Impact
- Severity: High - Causes incorrect attribution of profiling data
- Affected scenarios: Thread reuse, OpenTelemetry integration, long-running threads with sequential operations
- Workaround: Manually call remove_thread_tag before every add_thread_tag, but this is error-prone
Environment
- pyroscope-rs version: tag 'python-0.8.11', but I believe the logic is the same in the latest version
- Integration: OpenTelemetry SpanProcessor (Python FFI)
- Related files: src/backend/ruleset.rs, src/encode/pprof.rs, pyroscope_ffi/python/lib/src/lib.rs