fix(tracer-flare): fix design of TracerFlareManager #1186
base: main
Conversation
Codecov Report ❌
Additional details and impacted files:
@@ Coverage Diff @@
## main #1186 +/- ##
==========================================
- Coverage 71.85% 71.76% -0.09%
==========================================
Files 356 356
Lines 56685 56680 -5
==========================================
- Hits 40731 40679 -52
- Misses 15954 16001 +47
RemoteConfigData::TracerFlareConfig(agent_config) => {
    if agent_config.name.starts_with("flare-log-level.") {
        if let Some(log_level) = &agent_config.config.log_level {
            if let State::Collecting { log_level: _ } = tracer_flare.state {
So what is supposed to happen in the SDK if a flare is already in progress when we receive an agent task to start a new one? We just pass it along, and with it pass along the responsibility to the SDK to know what to do. Shouldn't this crate encapsulate that logic?
If we receive two tasks in the same listener batch, we only return the first one. 👍
If we receive two tasks in different listener batches, we pass both to the caller, and they either ignore the second one or stop the current flare and start a new one. This opens the potential for inconsistent behavior between the SDKs. If we receive a task to start a flare while one is already in progress, why don't we just ignore it? That's what dd-trace-py does today.
I don't have an opinion on what the correct behavior should be, but I do think we should try to encapsulate whatever we can and put the minimum burden on the SDK to make decisions.
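A minimal sketch of what that encapsulation could look like, assuming the `collecting` flag and `ReturnAction::Start` variant shown elsewhere in this diff (`on_start_task` is a hypothetical helper, not an existing API):

```rust
// Hypothetical sketch: swallow a start task while a flare is already in
// progress, so individual SDKs never have to make this decision.
impl TracerFlareManager {
    fn on_start_task(&mut self, log_level: LogLevel) -> Option<ReturnAction> {
        if self.collecting {
            // A flare is already being collected: ignore the new task,
            // mirroring what dd-trace-py does today.
            return None;
        }
        self.collecting = true;
        Some(ReturnAction::Start(log_level))
    }
}
```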
Benchmarks
Comparison
Benchmark execution time: 2025-10-07 21:12:56
Comparing candidate commit d1b051f in PR branch.
Found 2 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics.
scenario: credit_card/is_card_number/378282246310005
Candidate benchmark details (Groups 1–15) omitted; baseline omitted due to size.
Artifact Size Benchmark Report (tables omitted) for: aarch64-alpine-linux-musl, aarch64-unknown-linux-gnu, libdatadog-x64-windows, libdatadog-x86-windows, x86_64-alpine-linux-musl, x86_64-unknown-linux-gnu.
}

#[cfg(test)]
mod tests {
I don't think there is test coverage for multi-file scenarios? So the batch processing and priority logic in check_remote_config_file isn't being tested?
I was thinking of testing this directly in the integration tests, since it would need to set up a Listener (to use the function that handles the batch of RC files), as I already did.
Some nits about docs and naming, but LGTM.
///
/// Key responsibilities:
/// - Parsing remote configuration files to determine flare actions (Set log level, Send flare)
/// - Storing agent task metadata needed for flare transmission
Isn't this outdated, since you send the task in the action now?
/// managing the lifecycle of flare collection and transmission. It operates in two modes:
///
/// - **Basic mode**: Stores agent URL and language configuration for flare operations
/// - **Remote config mode**: Listens to remote configuration updates to automatically trigger flare
nit: I find the name "Remote config mode" a bit confusing, as both modes rely on remote config. Maybe "no-listener" and "listener" modes would be clearer?
/// Enum that holds the different log levels possible
#[derive(Debug, PartialEq)]
You should make explicit that we rely on Ord to choose which level to keep, and therefore the order of the enum variants should not change.
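For illustration, a self-contained sketch of why the variant order matters with a derived Ord (variant names are illustrative, not necessarily the crate's actual ones):

```rust
use std::cmp::min;

// Deriving Ord makes earlier variants compare as smaller, so the
// declaration order *is* the priority order.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

fn main() {
    // Keeping the more verbose of two requested levels relies entirely
    // on the declaration order above; reordering the variants would
    // silently change which level wins.
    let kept = min(LogLevel::Info, LogLevel::Debug);
    assert_eq!(kept, LogLevel::Debug);
    println!("{:?}", kept);
}
```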
pub enum ReturnAction {
    /// If AGENT_CONFIG received with the right properties.
    Start(LogLevel),
    /// If AGENT_TASK received with the right properties.
Rather than describing "when" this action is returned, you should explain "what" the tracer should do when receiving it.
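For example, the variant docs could describe the expected tracer behavior instead (the `Send` payload type here is a guess based on the discussion above, not confirmed by the diff):

```rust
pub enum ReturnAction {
    /// The tracer should start collecting logs at the given level.
    Start(LogLevel),
    /// The tracer should zip the collected logs and send the flare,
    /// using the agent task attached to this action.
    Send(AgentTaskFile),
}
```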
/// * `FlareError(msg)` - If something fails.
pub fn handle_remote_config_file(
    file: RemoteConfigFile,
    tracer_flare: &mut TracerFlareManager,
Why is this not a method of the tracer flare manager?
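A sketch of the suggested shape, assuming the `Set`/`Send` variant names and `collecting` flag used in the snippet further down; this is one possible refactor, not the PR's implementation:

```rust
// Sketch: making the free function a method keeps the state update
// (the `collecting` flag) inside the manager itself.
impl TracerFlareManager {
    pub fn handle_remote_config_file(
        &mut self,
        file: RemoteConfigFile,
    ) -> Result<ReturnAction, FlareError> {
        let action = check_remote_config_file(file)?;
        match &action {
            ReturnAction::Set(_) => self.collecting = true,
            ReturnAction::Send(_) => self.collecting = false,
            _ => {}
        }
        Ok(action)
    }
}
```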
    file: RemoteConfigFile,
    tracer_flare: &mut TracerFlareManager,
) -> Result<ReturnAction, FlareError> {
fn check_remote_config_file(file: RemoteConfigFile) -> Result<ReturnAction, FlareError> {
nit: I find the names handle_remote_config_file and check_remote_config_file a bit confusing; maybe this could be parse_remote_config_file, or an implementation of TryFrom<RemoteConfigFile> on ReturnAction.
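A sketch of the TryFrom variant of that suggestion, reusing the existing parsing function:

```rust
// Sketch: expressing the parse as a conversion, so callers can write
// `ReturnAction::try_from(file)?` instead of calling a free function.
impl TryFrom<RemoteConfigFile> for ReturnAction {
    type Error = FlareError;

    fn try_from(file: RemoteConfigFile) -> Result<Self, Self::Error> {
        check_remote_config_file(file)
    }
}
```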
if let ReturnAction::Set(_) = state {
    tracer_flare.collecting = true;
} else if let ReturnAction::Send(_) = state {
    tracer_flare.collecting = false;
Should Unset also set collecting to false?
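If it should, a match over the action might make the intent explicit (assuming `Unset` is a unit variant, which the diff does not show):

```rust
// Sketch: treating Unset like Send for the collecting flag.
match &state {
    ReturnAction::Set(_) => tracer_flare.collecting = true,
    ReturnAction::Send(_) | ReturnAction::Unset => tracer_flare.collecting = false,
    _ => {}
}
```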
}

#[cfg(test)]
mod tests {
It would be nice to have a test for priority, as it can also be used as a reference for other implementations.
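A hypothetical shape for such a test, assuming the derived ordering discussed above (variant names illustrative):

```rust
#[cfg(test)]
mod priority_tests {
    use super::*;

    // Guards the Ord-based priority: reordering the enum variants would
    // break these assertions, which is exactly the point of the test.
    #[test]
    fn log_level_priority_follows_declaration_order() {
        assert!(LogLevel::Trace < LogLevel::Debug);
        assert!(LogLevel::Debug < LogLevel::Error);
    }
}
```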
What does this PR do?
- Fix the use of the AGENT_CONFIG product in the tracer flare design.
- Remove the state field.
- Add a log_level argument to the zip_and_send function so the language can tell us which log level is active when creating the flare.
- Return a list of actions to perform from the function that listens to the RC (see the sketch below).