
Add DD_DYNAMIC_INSTRUMENTATION_CAPTURE_TIMEOUT environment variable #5386

Open: p-datadog wants to merge 3 commits into master from di-circuit-breaker-env



@p-datadog (Member) commented Feb 20, 2026

What does this PR do?

Adds environment variable support for the Dynamic Instrumentation circuit breaker's max_processing_time setting, using DD_DYNAMIC_INSTRUMENTATION_CAPTURE_TIMEOUT, the environment variable name the Java tracer already uses.

Motivation:

The circuit breaker's max_processing_time threshold was only configurable programmatically via c.dynamic_instrumentation.internal.max_processing_time, which created challenges for system testing:

  1. Cannot test circuit breaker behavior in system-tests: there is no way to set the threshold to 0 to force immediate probe disabling.
  2. Cannot disable the circuit breaker for capture limit tests: snapshot capture limit tests perform expensive serialization, which can trip the circuit breaker unintentionally and cause test failures.
  3. Cannot configure the threshold in containerized environments: system tests run tracers in Docker containers, where programmatic configuration is not feasible.

This PR adds environment variable support with:

  • Cross-language consistency: uses DD_DYNAMIC_INSTRUMENTATION_CAPTURE_TIMEOUT (matching Java), so teams working across several languages can find it easily
  • Millisecond units: the environment variable accepts values in milliseconds (e.g., 200) and converts them to seconds internally (0.2)
  • Lower default: reduces the default from 500ms to 200ms, so DI processing may take up to 100% of a typical 200ms Ruby request (versus 50% of the same request for Java's 100ms timeout)

Change log entry

N/A. This is an internal-only configuration change: the max_processing_time setting lives under the c.dynamic_instrumentation.internal.* namespace, which is explicitly documented as "for internal Datadog use only" in the configuration file. No customer-facing documentation or release notes entry is needed.

Additional Notes:

Cross-language alignment:

  • Java: DD_DYNAMIC_INSTRUMENTATION_CAPTURE_TIMEOUT=100 (100ms, version A, tracks snapshot capture only)
  • Ruby: DD_DYNAMIC_INSTRUMENTATION_CAPTURE_TIMEOUT=200 (200ms, version B, tracks full DI processing time: entry + exit)
  • Node: DD_DYNAMIC_INSTRUMENTATION_CAPTURE_TIMEOUT_MS=15 (15ms; the name carries a _MS suffix)

Ruby reuses Java's environment variable name for consistency, even though the implementation semantics differ slightly (Ruby tracks full processing time vs Java tracking just capture).
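To make the "full DI processing time" semantics concrete, here is a minimal sketch of a per-probe breaker that sums entry and exit processing time and compares the total to max_processing_time. The class ProbeCircuitBreaker and its methods are hypothetical and are not the code in lib/datadog/di/instrumenter.rb. The sketch assumes only what this PR states (threshold in seconds, nil disables the breaker, 0.0 trips it after the first execution) and, as a simplification, disables the probe on the first invocation that exceeds the threshold; the real breaker may aggregate differently.

```ruby
# Hypothetical per-probe circuit breaker; not the dd-trace-rb implementation.
class ProbeCircuitBreaker
  attr_reader :max_processing_time

  # max_processing_time: threshold in seconds, or nil to disable the breaker.
  def initialize(max_processing_time)
    @max_processing_time = max_processing_time
    @probe_enabled = true
  end

  def probe_enabled?
    @probe_enabled
  end

  # Records one invocation's DI overhead (entry + exit processing, in seconds).
  # The instrumented method's own run time is not included. Disables the probe
  # once that overhead exceeds the threshold.
  def record(entry_seconds, exit_seconds)
    return unless @probe_enabled
    return if max_processing_time.nil? # breaker disabled via nil

    @probe_enabled = false if entry_seconds + exit_seconds > max_processing_time
  end
end

breaker = ProbeCircuitBreaker.new(0.2) # 200ms default
breaker.record(0.05, 0.1)              # 150ms of DI work: under the threshold
breaker.probe_enabled?                 # => true
breaker.record(0.15, 0.1)              # 250ms of DI work: over the threshold
breaker.probe_enabled?                 # => false
```

With a Java-style breaker, only the snapshot capture portion would be passed in, which is why the same nominal timeout is stricter for Ruby.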

Configuration versioning:

Ruby uses version B in supported-configurations.json because its default value differs from Java's version A (200ms vs 100ms). As a slower language, Ruby likely needs a higher timeout in practice to provide the same bounding behavior as Java. This cross-language versioning is documented in docs/AccessEnvironmentVariables.md.

Environment variable behavior:

  • Empty string "" → nil (circuit breaker disabled)
  • Negative values (e.g., -1, -999) → nil (circuit breaker disabled)
  • "0" → 0.0 (trips immediately after the first execution)
  • Positive values in milliseconds (e.g., "200") → converted to seconds (0.2)
  • Not set → 0.2 seconds (200ms default)

The custom env_parser handles:

  1. Converting milliseconds to seconds (user-friendly input)
  2. Treating an empty string as nil instead of 0.0 (which is what Ruby's String#to_f returns for "")
  3. Converting negative values to nil for convenient disabling
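The parsing rules above can be sketched as a standalone Ruby method. The names capture_timeout_from_env, ENV_VAR and DEFAULT_MAX_PROCESSING_TIME are hypothetical, and this is not the env_parser in lib/datadog/di/configuration/settings.rb; it assumes the raw value arrives as a String and that an unset variable falls back to the 200ms default.

```ruby
ENV_VAR = 'DD_DYNAMIC_INSTRUMENTATION_CAPTURE_TIMEOUT'
DEFAULT_MAX_PROCESSING_TIME = 0.2 # seconds (200ms)

# Hypothetical helper, not the env_parser in settings.rb.
# Returns the circuit breaker threshold in seconds, or nil (breaker disabled).
def capture_timeout_from_env(env = ENV)
  # Variable not set: fall back to the 200ms default.
  return DEFAULT_MAX_PROCESSING_TIME unless env.key?(ENV_VAR)

  raw = env[ENV_VAR]
  # "".to_f would be 0.0 (trip immediately), so treat empty as disabled.
  return nil if raw.empty?

  millis = raw.to_f
  # Negative values are a convenient way to disable the breaker.
  return nil if millis.negative?

  millis / 1000.0 # milliseconds -> seconds
end

capture_timeout_from_env({})                   # => 0.2
capture_timeout_from_env({ ENV_VAR => '' })    # => nil
capture_timeout_from_env({ ENV_VAR => '-1' })  # => nil
capture_timeout_from_env({ ENV_VAR => '0' })   # => 0.0
capture_timeout_from_env({ ENV_VAR => '200' }) # => 0.2
```

Non-numeric input is not covered by the behavior list; in this sketch String#to_f would turn it into 0.0, so the breaker would trip immediately.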

Default value rationale:

  • Previous default: 500ms (250% overhead on typical 200ms Ruby request)
  • New default: 200ms (100% overhead on typical 200ms Ruby request)
  • Based on web response time research showing typical Ruby apps respond in 200-400ms
  • More aggressive than before but still lenient compared to Java's 100ms (50% overhead)

How to test the change?

Unit tests were added covering:

  • Programmatic configuration (3 test cases)
  • Environment variable configuration (8 test cases)
  • Millisecond to second conversion
  • Special value handling (empty string, negative values, zero)

Related:

  • System-tests PR: system-tests#XXXX
  • Circuit breaker implementation: lib/datadog/di/instrumenter.rb:586-597
  • Settings documentation: lib/datadog/di/configuration/settings.rb:228-251

Unicorn Enterprises added 2 commits February 20, 2026 16:35
@p-datadog p-datadog requested a review from a team as a code owner February 20, 2026 21:50
github-actions bot commented Feb 20, 2026

👋 Hey @p-datadog, please fill in the "Change log entry" section in the pull request description.

If the changes need to be included in CHANGELOG.md, you can state it this way:

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like this:

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2026-02-20 22:20:28 UTC

pr-commenter bot commented Feb 20, 2026

Benchmarks

Benchmark execution time: 2026-02-20 22:49:46

Comparing candidate commit 502346c in PR branch di-circuit-breaker-env with baseline commit 91be22f in branch master.

Found 0 performance improvements and 0 performance regressions. Performance is the same for 44 metrics; 2 metrics are unstable.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
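The decision rule described above can be sketched in a few lines. This illustrates the rule as the explanation states it, not the benchmarking platform's code; classify_change is a hypothetical helper, and the sketch assumes the threshold is a symmetric percentage band around 0%, the CI bounds are already computed, and the metric is an execution time (so a positive difference is worse).

```ruby
# Classifies a CI on the relative difference of means (percent, candidate vs.
# baseline) for an execution-time metric, where lower is better.
def classify_change(ci_lower, ci_upper, threshold)
  if ci_lower > threshold
    :worse   # whole CI lies above +threshold
  elsif ci_upper < -threshold
    :better  # whole CI lies below -threshold
  else
    :same    # CI overlaps the threshold band
  end
end

classify_change(-0.6, 1.2, 1.0) # => :same (first diagram, assuming a 1% threshold)
classify_change(1.3, 3.1, 1.0)  # => :worse (second diagram)
```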
