Skip to content

Conversation

daniel-mohedano
Copy link
Contributor

@daniel-mohedano daniel-mohedano commented Jul 22, 2025

What Does This Do

Implements Test Optimization's Failed Test Replay using Live Debugger's Exception Replay. When the feature is enabled and a test is retried due to Auto Test Retries, Exception Replay's logic will create a probe for the exception thrown (in the case of the test probably an assertion error, but not limited to it). When the test is retried, the probe captures debugging information if the exception is encountered again, creating a snapshot of the variables. If the snapshot is captured, it is send as a log to Datadog. The following modifications were made to Exception Replay's original implementation:

  • Exception Replay is enabled if Failed Test Replay is enabled by the user.
    • This is done through the DebuggerConfigBridge. It handles deferred updates, so no order dependency on startup is introduced between the products that want to use Live Debugger's features.
    • The existing configuration update logic used with Remote Config now also goes through the same system.
  • DefaultExceptionDebugger was modified to support Failed Test Replay. If working on Failed Test Replay mode (if CiVisibility is enabled) it will:
    • Instrument Errors, which were previously ignored.
    • Ignore the max number of exception per second limit.
    • Ignore the exception capturing cooldown.
    • Apply the instrumentation synchronously. Failed test retries can be performed in rapid succession and the async approach to the instrumentation meant that most of the times the instrumentation was not performed before the next test failure. This has also been added as a separate configuration to support it in regular Exception Replay.
  • Adds a product field to snapshots, populated with test_optimization if Failed Test Replay was marked as active. This allows us to have the option of not billing customers for logs generated by the product.
  • Removed Live Debugger's dependency on Remote Config being enabled for its configuration to be initialized.
  • Exception Replay now supports Agentless mode. For now this is tied with CiVisibility agentless mode. If DD_CIVISIBILITY_AGENTLESS_ENABLED is set, Live Debugger's logic for Exception Replay will use the logs API instead of the agent's.
  • DebuggerSink now flushes on closing to avoid snapshots not being sent on test session finish.

Additional changes:

  • Refactored BackendApiFactory.Intake to a standalone Intake, given that it is useful in order to compute agentless mode URLs.
  • Updated libraries capabilities to add failed_test_replay in test frameworks that support Auto Test Retries.
  • Other changes related to adding di_enabled to the Settings response and telemetry.

Validation:

  • MavenSmokeTest now has an additional test for Failed Test Replay, validating the feature when build system instrumentation is present.
  • Implemented JUnitConsoleSmokeTest to validate the feature in headless mode. This test should ensure that the ordering dependency between CiVisibility's system and Live Debugger's is always accounted for.
  • Both smoke tests also validate:
    • Tests that do not have an Auto Test Retries execution strategy will not have probes installed.
    • Snapshot data is captured for all test retries and not limited to the first one.

Motivation

Test Optimization wants to improve the support for Failed Test Replay, implementing it in additional languages apart from JS.

Contributor Checklist

Jira ticket: SDTEST-2242

@daniel-mohedano daniel-mohedano added type: enhancement Enhancements and improvements tag: do not merge Do not merge changes comp: ci visibility Continuous Integration Visibility comp: debugger Dynamic Instrumentation labels Jul 22, 2025
@pr-commenter
Copy link

pr-commenter bot commented Jul 23, 2025

Debugger benchmarks

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
ci_job_date 1756298476 1756298821
end_time 2025-08-27T12:42:38 2025-08-27T12:48:23
git_branch master daniel.mohedano/failed-test-replay
git_commit_sha 8799a82 73bf81d
start_time 2025-08-27T12:41:17 2025-08-27T12:47:02
See matching parameters
Baseline Candidate
ci_job_id 1101030799 1101030799
ci_pipeline_id 74844364 74844364
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
git_commit_date 1756297828 1756297828

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics.

See unchanged results
scenario Δ mean agg_http_req_duration_min Δ mean agg_http_req_duration_p50 Δ mean agg_http_req_duration_p75 Δ mean agg_http_req_duration_p99 Δ mean throughput
scenario:noprobe unstable
[-18.701µs; +33.719µs] or [-6.731%; +12.137%]
unstable
[-28.180µs; +47.274µs] or [-8.854%; +14.852%]
unstable
[-50096.731ns; +50059.591ns] or [-14.577%; +14.566%]
unstable
[-138.110µs; +129.907µs] or [-13.970%; +13.140%]
same
scenario:basic same same same unstable
[-102.868µs; +73.448µs] or [-13.752%; +9.819%]
unstable
[-209.640op/s; +67.392op/s] or [-7.757%; +2.494%]
scenario:loop same same same same same
Request duration reports for reports
gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (318.291 µs) : 290, 346
.   : milestone, 318,
basic (277.609 µs) : 272, 284
.   : milestone, 278,
loop (8.962 ms) : 8956, 8967
.   : milestone, 8962,
section candidate
noprobe (327.838 µs) : 285, 370
.   : milestone, 328,
basic (279.083 µs) : 274, 285
.   : milestone, 279,
loop (8.965 ms) : 8961, 8970
.   : milestone, 8965,
Loading
  • baseline results
Scenario Request median duration [CI 0.99]
noprobe 318.291 µs [290.309 µs, 346.274 µs]
basic 277.609 µs [271.64 µs, 283.578 µs]
loop 8.962 ms [8.956 ms, 8.967 ms]
  • candidate results
Scenario Request median duration [CI 0.99]
noprobe 327.838 µs [285.468 µs, 370.209 µs]
basic 279.083 µs [273.529 µs, 284.638 µs]
loop 8.965 ms [8.961 ms, 8.97 ms]

@pr-commenter
Copy link

pr-commenter bot commented Jul 23, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1756292489 1756297828
git_commit_sha 8799a82 73bf81d
release_version 1.53.0-SNAPSHOT~8799a82b0a 1.51.0-SNAPSHOT~73bf81d739
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1756299672 1756299672
ci_job_id 1101030786 1101030786
ci_pipeline_id 74844364 74844364
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-1cyke420 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-1cyke420 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 0 performance improvements and 3 performance regressions! Performance is the same for 43 metrics, 13 unstable metrics.

scenario Δ mean execution_time candidate mean execution_time baseline mean execution_time
scenario:startup:insecure-bank:iast:Debugger worse
[+4.528ms; +4.992ms] or [+78.445%; +86.473%]
10.532ms 5.772ms
scenario:startup:insecure-bank:tracing:Debugger worse
[+4.335ms; +4.780ms] or [+71.423%; +78.750%]
10.628ms 6.070ms
scenario:startup:petclinic:tracing:Debugger worse
[+4.178ms; +4.410ms] or [+68.073%; +71.857%]
10.431ms 6.137ms
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1050188
Total [baseline] (8.606 s) : 0, 8605640
Agent [candidate] (1.064 s) : 0, 1064171
Total [candidate] (8.662 s) : 0, 8662366
section iast
Agent [baseline] (1.178 s) : 0, 1177621
Total [baseline] (9.293 s) : 0, 9292592
Agent [candidate] (1.186 s) : 0, 1186098
Total [candidate] (9.362 s) : 0, 9362160
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.05 s -
Agent iast 1.178 s 127.433 ms (12.1%)
Total tracing 8.606 s -
Total iast 9.293 s 686.951 ms (8.0%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.064 s -
Agent iast 1.186 s 121.927 ms (11.5%)
Total tracing 8.662 s -
Total iast 9.362 s 699.794 ms (8.1%)
gantt
    title insecure-bank - break down per module: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.465 ms) : 0, 1465
crashtracking [candidate] (1.474 ms) : 0, 1474
BytebuddyAgent [baseline] (734.93 ms) : 0, 734930
BytebuddyAgent [candidate] (742.135 ms) : 0, 742135
GlobalTracer [baseline] (243.508 ms) : 0, 243508
GlobalTracer [candidate] (245.692 ms) : 0, 245692
AppSec [baseline] (30.252 ms) : 0, 30252
AppSec [candidate] (30.627 ms) : 0, 30627
Debugger [baseline] (6.07 ms) : 0, 6070
Debugger [candidate] (10.628 ms) : 0, 10628
Remote Config [baseline] (666.182 µs) : 0, 666
Remote Config [candidate] (665.419 µs) : 0, 665
Telemetry [baseline] (12.237 ms) : 0, 12237
Telemetry [candidate] (11.807 ms) : 0, 11807
section iast
crashtracking [baseline] (1.45 ms) : 0, 1450
crashtracking [candidate] (1.452 ms) : 0, 1452
BytebuddyAgent [baseline] (849.608 ms) : 0, 849608
BytebuddyAgent [candidate] (851.268 ms) : 0, 851268
GlobalTracer [baseline] (232.783 ms) : 0, 232783
GlobalTracer [candidate] (234.179 ms) : 0, 234179
IAST [baseline] (32.178 ms) : 0, 32178
IAST [candidate] (28.818 ms) : 0, 28818
AppSec [baseline] (26.021 ms) : 0, 26021
AppSec [candidate] (29.639 ms) : 0, 29639
Debugger [baseline] (5.772 ms) : 0, 5772
Debugger [candidate] (10.532 ms) : 0, 10532
Remote Config [baseline] (597.95 µs) : 0, 598
Remote Config [candidate] (617.479 µs) : 0, 617
Telemetry [baseline] (8.277 ms) : 0, 8277
Telemetry [candidate] (8.627 ms) : 0, 8627
Loading
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.058 s) : 0, 1057586
Total [baseline] (10.735 s) : 0, 10735405
Agent [candidate] (1.049 s) : 0, 1049315
Total [candidate] (10.702 s) : 0, 10701996
section appsec
Agent [baseline] (1.225 s) : 0, 1224843
Total [baseline] (10.757 s) : 0, 10757042
Agent [candidate] (1.223 s) : 0, 1222995
Total [candidate] (10.78 s) : 0, 10779911
section iast
Agent [baseline] (1.188 s) : 0, 1188404
Total [baseline] (10.929 s) : 0, 10929415
Agent [candidate] (1.186 s) : 0, 1185707
Total [candidate] (10.99 s) : 0, 10989619
section profiling
Agent [baseline] (1.211 s) : 0, 1210737
Total [baseline] (10.991 s) : 0, 10991045
Agent [candidate] (1.206 s) : 0, 1206037
Total [candidate] (10.823 s) : 0, 10822684
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.058 s -
Agent appsec 1.225 s 167.256 ms (15.8%)
Agent iast 1.188 s 130.818 ms (12.4%)
Agent profiling 1.211 s 153.15 ms (14.5%)
Total tracing 10.735 s -
Total appsec 10.757 s 21.637 ms (0.2%)
Total iast 10.929 s 194.01 ms (1.8%)
Total profiling 10.991 s 255.639 ms (2.4%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.049 s -
Agent appsec 1.223 s 173.68 ms (16.6%)
Agent iast 1.186 s 136.392 ms (13.0%)
Agent profiling 1.206 s 156.722 ms (14.9%)
Total tracing 10.702 s -
Total appsec 10.78 s 77.915 ms (0.7%)
Total iast 10.99 s 287.623 ms (2.7%)
Total profiling 10.823 s 120.688 ms (1.1%)
gantt
    title petclinic - break down per module: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.494 ms) : 0, 1494
crashtracking [candidate] (1.444 ms) : 0, 1444
BytebuddyAgent [baseline] (738.638 ms) : 0, 738638
BytebuddyAgent [candidate] (730.375 ms) : 0, 730375
GlobalTracer [baseline] (244.622 ms) : 0, 244622
GlobalTracer [candidate] (241.67 ms) : 0, 241670
AppSec [baseline] (30.377 ms) : 0, 30377
AppSec [candidate] (29.904 ms) : 0, 29904
Debugger [baseline] (6.137 ms) : 0, 6137
Debugger [candidate] (10.431 ms) : 0, 10431
Remote Config [baseline] (679.583 µs) : 0, 680
Remote Config [candidate] (661.986 µs) : 0, 662
Telemetry [baseline] (14.409 ms) : 0, 14409
Telemetry [candidate] (13.823 ms) : 0, 13823
section appsec
crashtracking [baseline] (1.471 ms) : 0, 1471
crashtracking [candidate] (1.453 ms) : 0, 1453
BytebuddyAgent [baseline] (756.709 ms) : 0, 756709
BytebuddyAgent [candidate] (755.406 ms) : 0, 755406
GlobalTracer [baseline] (235.346 ms) : 0, 235346
GlobalTracer [candidate] (234.943 ms) : 0, 234943
AppSec [baseline] (169.262 ms) : 0, 169262
AppSec [candidate] (171.001 ms) : 0, 171001
Debugger [baseline] (7.387 ms) : 0, 7387
Debugger [candidate] (5.675 ms) : 0, 5675
Remote Config [baseline] (649.44 µs) : 0, 649
Remote Config [candidate] (626.672 µs) : 0, 627
Telemetry [baseline] (9.295 ms) : 0, 9295
Telemetry [candidate] (9.235 ms) : 0, 9235
IAST [baseline] (23.603 ms) : 0, 23603
IAST [candidate] (23.522 ms) : 0, 23522
section iast
crashtracking [baseline] (1.467 ms) : 0, 1467
crashtracking [candidate] (1.456 ms) : 0, 1456
BytebuddyAgent [baseline] (858.102 ms) : 0, 858102
BytebuddyAgent [candidate] (852.624 ms) : 0, 852624
GlobalTracer [baseline] (235.257 ms) : 0, 235257
GlobalTracer [candidate] (234.035 ms) : 0, 234035
AppSec [baseline] (25.346 ms) : 0, 25346
AppSec [candidate] (26.85 ms) : 0, 26850
Debugger [baseline] (6.662 ms) : 0, 6662
Debugger [candidate] (10.346 ms) : 0, 10346
Remote Config [baseline] (624.801 µs) : 0, 625
Remote Config [candidate] (610.509 µs) : 0, 611
Telemetry [baseline] (8.389 ms) : 0, 8389
Telemetry [candidate] (8.338 ms) : 0, 8338
IAST [baseline] (31.431 ms) : 0, 31431
IAST [candidate] (30.396 ms) : 0, 30396
section profiling
ProfilingAgent [baseline] (107.855 ms) : 0, 107855
ProfilingAgent [candidate] (108.147 ms) : 0, 108147
crashtracking [baseline] (1.466 ms) : 0, 1466
crashtracking [candidate] (1.466 ms) : 0, 1466
BytebuddyAgent [baseline] (772.66 ms) : 0, 772660
BytebuddyAgent [candidate] (768.607 ms) : 0, 768607
GlobalTracer [baseline] (224.528 ms) : 0, 224528
GlobalTracer [candidate] (223.615 ms) : 0, 223615
AppSec [baseline] (30.356 ms) : 0, 30356
AppSec [candidate] (30.223 ms) : 0, 30223
Debugger [baseline] (7.092 ms) : 0, 7092
Debugger [candidate] (7.65 ms) : 0, 7650
Remote Config [baseline] (722.503 µs) : 0, 723
Remote Config [candidate] (712.032 µs) : 0, 712
Telemetry [baseline] (15.689 ms) : 0, 15689
Telemetry [candidate] (15.5 ms) : 0, 15500
Profiling [baseline] (108.51 ms) : 0, 108510
Profiling [candidate] (108.811 ms) : 0, 108811
Loading

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1756292489 1756297828
git_commit_sha 8799a82 73bf81d
release_version 1.53.0-SNAPSHOT~8799a82b0a 1.51.0-SNAPSHOT~73bf81d739
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1756299332 1756299332
ci_job_id 1101030788 1101030788
ci_pipeline_id 74844364 74844364
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-hslf38kn 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-hslf38kn 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 2 performance regressions! Performance is the same for 10 metrics, 12 unstable metrics.

scenario Δ mean http_req_duration Δ mean throughput candidate mean http_req_duration candidate mean throughput baseline mean http_req_duration baseline mean throughput
scenario:load:insecure-bank:profiling:high_load worse
[+714.933µs; +1030.308µs] or [+8.313%; +11.980%]
unstable
[-115.385op/s; +17.135op/s] or [-21.405%; +3.179%]
9.473ms 489.938op/s 8.600ms 539.062op/s
scenario:load:petclinic:profiling:high_load worse
[+1.352ms; +2.369ms] or [+2.776%; +4.866%]
unstable
[-10.744op/s; +3.719op/s] or [-11.180%; +3.869%]
50.540ms 92.588op/s 48.680ms 96.100op/s
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.357 ms) : 4308, 4406
.   : milestone, 4357,
iast (9.395 ms) : 9239, 9551
.   : milestone, 9395,
iast_FULL (14.029 ms) : 13751, 14308
.   : milestone, 14029,
iast_GLOBAL (10.842 ms) : 10652, 11032
.   : milestone, 10842,
profiling (8.6 ms) : 8461, 8740
.   : milestone, 8600,
tracing (7.704 ms) : 7592, 7815
.   : milestone, 7704,
section candidate
no_agent (4.284 ms) : 4236, 4332
.   : milestone, 4284,
iast (9.408 ms) : 9241, 9576
.   : milestone, 9408,
iast_FULL (14.245 ms) : 13966, 14524
.   : milestone, 14245,
iast_GLOBAL (10.56 ms) : 10373, 10747
.   : milestone, 10560,
profiling (9.473 ms) : 9320, 9626
.   : milestone, 9473,
tracing (7.928 ms) : 7805, 8050
.   : milestone, 7928,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.357 ms [4.308 ms, 4.406 ms] -
iast 9.395 ms [9.239 ms, 9.551 ms] 5.038 ms (115.6%)
iast_FULL 14.029 ms [13.751 ms, 14.308 ms] 9.672 ms (222.0%)
iast_GLOBAL 10.842 ms [10.652 ms, 11.032 ms] 6.485 ms (148.8%)
profiling 8.6 ms [8.461 ms, 8.74 ms] 4.243 ms (97.4%)
tracing 7.704 ms [7.592 ms, 7.815 ms] 3.347 ms (76.8%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.284 ms [4.236 ms, 4.332 ms] -
iast 9.408 ms [9.241 ms, 9.576 ms] 5.125 ms (119.6%)
iast_FULL 14.245 ms [13.966 ms, 14.524 ms] 9.962 ms (232.5%)
iast_GLOBAL 10.56 ms [10.373 ms, 10.747 ms] 6.276 ms (146.5%)
profiling 9.473 ms [9.32 ms, 9.626 ms] 5.189 ms (121.1%)
tracing 7.928 ms [7.805 ms, 8.05 ms] 3.644 ms (85.1%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
    dateFormat X
    axisFormat %s
section baseline
no_agent (37.614 ms) : 37301, 37926
.   : milestone, 37614,
appsec (46.813 ms) : 46401, 47225
.   : milestone, 46813,
code_origins (45.922 ms) : 45511, 46333
.   : milestone, 45922,
iast (44.177 ms) : 43784, 44570
.   : milestone, 44177,
profiling (48.68 ms) : 48219, 49141
.   : milestone, 48680,
tracing (43.94 ms) : 43572, 44307
.   : milestone, 43940,
section candidate
no_agent (36.926 ms) : 36630, 37223
.   : milestone, 36926,
appsec (47.834 ms) : 47409, 48260
.   : milestone, 47834,
code_origins (45.743 ms) : 45362, 46124
.   : milestone, 45743,
iast (43.033 ms) : 42659, 43407
.   : milestone, 43033,
profiling (50.54 ms) : 50056, 51024
.   : milestone, 50540,
tracing (44.88 ms) : 44497, 45263
.   : milestone, 44880,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 37.614 ms [37.301 ms, 37.926 ms] -
appsec 46.813 ms [46.401 ms, 47.225 ms] 9.2 ms (24.5%)
code_origins 45.922 ms [45.511 ms, 46.333 ms] 8.308 ms (22.1%)
iast 44.177 ms [43.784 ms, 44.57 ms] 6.563 ms (17.4%)
profiling 48.68 ms [48.219 ms, 49.141 ms] 11.066 ms (29.4%)
tracing 43.94 ms [43.572 ms, 44.307 ms] 6.326 ms (16.8%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 36.926 ms [36.63 ms, 37.223 ms] -
appsec 47.834 ms [47.409 ms, 48.26 ms] 10.908 ms (29.5%)
code_origins 45.743 ms [45.362 ms, 46.124 ms] 8.817 ms (23.9%)
iast 43.033 ms [42.659 ms, 43.407 ms] 6.107 ms (16.5%)
profiling 50.54 ms [50.056 ms, 51.024 ms] 13.614 ms (36.9%)
tracing 44.88 ms [44.497 ms, 45.263 ms] 7.954 ms (21.5%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1756292489 1756297828
git_commit_sha 8799a82 73bf81d
release_version 1.53.0-SNAPSHOT~8799a82b0a 1.51.0-SNAPSHOT~73bf81d739
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1756299886 1756299886
ci_job_id 1101030790 1101030790
ci_pipeline_id 74844364 74844364
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-qelu8213 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-qelu8213 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.574 s) : 15574000, 15574000
.   : milestone, 15574000,
appsec (14.75 s) : 14750000, 14750000
.   : milestone, 14750000,
iast (18.665 s) : 18665000, 18665000
.   : milestone, 18665000,
iast_GLOBAL (17.781 s) : 17781000, 17781000
.   : milestone, 17781000,
profiling (15.431 s) : 15431000, 15431000
.   : milestone, 15431000,
tracing (15.043 s) : 15043000, 15043000
.   : milestone, 15043000,
section candidate
no_agent (14.991 s) : 14991000, 14991000
.   : milestone, 14991000,
appsec (14.903 s) : 14903000, 14903000
.   : milestone, 14903000,
iast (18.503 s) : 18503000, 18503000
.   : milestone, 18503000,
iast_GLOBAL (18.19 s) : 18190000, 18190000
.   : milestone, 18190000,
profiling (15.769 s) : 15769000, 15769000
.   : milestone, 15769000,
tracing (14.926 s) : 14926000, 14926000
.   : milestone, 14926000,
Loading
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.574 s [15.574 s, 15.574 s] -
appsec 14.75 s [14.75 s, 14.75 s] -824.0 ms (-5.3%)
iast 18.665 s [18.665 s, 18.665 s] 3.091 s (19.8%)
iast_GLOBAL 17.781 s [17.781 s, 17.781 s] 2.207 s (14.2%)
profiling 15.431 s [15.431 s, 15.431 s] -143.0 ms (-0.9%)
tracing 15.043 s [15.043 s, 15.043 s] -531.0 ms (-3.4%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 14.991 s [14.991 s, 14.991 s] -
appsec 14.903 s [14.903 s, 14.903 s] -88.0 ms (-0.6%)
iast 18.503 s [18.503 s, 18.503 s] 3.512 s (23.4%)
iast_GLOBAL 18.19 s [18.19 s, 18.19 s] 3.199 s (21.3%)
profiling 15.769 s [15.769 s, 15.769 s] 778.0 ms (5.2%)
tracing 14.926 s [14.926 s, 14.926 s] -65.0 ms (-0.4%)
Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.468 ms) : 1456, 1479
.   : milestone, 1468,
appsec (3.595 ms) : 3382, 3809
.   : milestone, 3595,
iast (2.209 ms) : 2145, 2273
.   : milestone, 2209,
iast_GLOBAL (2.243 ms) : 2179, 2307
.   : milestone, 2243,
profiling (2.053 ms) : 2001, 2104
.   : milestone, 2053,
tracing (2.02 ms) : 1970, 2069
.   : milestone, 2020,
section candidate
no_agent (1.473 ms) : 1461, 1485
.   : milestone, 1473,
appsec (3.649 ms) : 3432, 3867
.   : milestone, 3649,
iast (2.206 ms) : 2142, 2270
.   : milestone, 2206,
iast_GLOBAL (2.242 ms) : 2178, 2305
.   : milestone, 2242,
profiling (2.048 ms) : 1997, 2100
.   : milestone, 2048,
tracing (2.009 ms) : 1960, 2058
.   : milestone, 2009,
Loading
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.468 ms [1.456 ms, 1.479 ms] -
appsec 3.595 ms [3.382 ms, 3.809 ms] 2.127 ms (144.9%)
iast 2.209 ms [2.145 ms, 2.273 ms] 741.725 µs (50.5%)
iast_GLOBAL 2.243 ms [2.179 ms, 2.307 ms] 775.032 µs (52.8%)
profiling 2.053 ms [2.001 ms, 2.104 ms] 584.922 µs (39.9%)
tracing 2.02 ms [1.97 ms, 2.069 ms] 551.817 µs (37.6%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.473 ms [1.461 ms, 1.485 ms] -
appsec 3.649 ms [3.432 ms, 3.867 ms] 2.176 ms (147.7%)
iast 2.206 ms [2.142 ms, 2.27 ms] 732.709 µs (49.7%)
iast_GLOBAL 2.242 ms [2.178 ms, 2.305 ms] 768.563 µs (52.2%)
profiling 2.048 ms [1.997 ms, 2.1 ms] 575.477 µs (39.1%)
tracing 2.009 ms [1.96 ms, 2.058 ms] 536.029 µs (36.4%)

Copy link

datadog-official bot commented Aug 11, 2025

🎯 Code Coverage
Patch Coverage: 45.58%
Total Coverage: 57.59% (+0.01%)

View detailed report

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 73bf81d | Docs | Was this helpful? Give us feedback!

@daniel-mohedano daniel-mohedano changed the title Failed Test Replay Implement Failed Test Replay Aug 12, 2025
@daniel-mohedano daniel-mohedano removed the tag: do not merge Do not merge changes label Aug 13, 2025
@daniel-mohedano daniel-mohedano marked this pull request as ready for review August 20, 2025 10:07
@daniel-mohedano daniel-mohedano requested review from a team as code owners August 20, 2025 10:07
@daniel-mohedano daniel-mohedano requested review from shatzi, Mariovido, bric3 and PerfectSlayer and removed request for a team August 20, 2025 10:07
return getFinalDebuggerBaseUrl() + "/debugger/v1/input";
if (Strings.isNotBlank(dynamicInstrumentationSnapshotUrl)) {
return dynamicInstrumentationSnapshotUrl;
} else if (isCiVisibilityFailedTestReplayActive() && isCiVisibilityAgentlessEnabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this condition should be isCiVisibilityAgentlessEnabled alone: if we're running in agentless mode chances are there's no agent to connect to. I understand that debugger wasn't working in agentless before these changes, but why not fix it while we're at it

@@ -1044,6 +1051,7 @@ public static String getHostName() {
private final boolean DBMTracePreparedStatements;

private final boolean dynamicInstrumentationEnabled;
private final String dynamicInstrumentationSnapshotUrl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed for our testing or for customer set ups where a custom URL (a proxy?) needs to be used to communicate with DD?

Copy link
Contributor Author

@daniel-mohedano daniel-mohedano Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was taken from #9186. We don't need it for our own testing given that we use CIVISIBILITY_AGENTLESS_URL, but it will be useful to let customers use a custom URL for the Exception Replay feature in general.

@@ -1029,6 +1034,8 @@ public static String getHostName() {
private final String gitPullRequestBaseBranch;
private final String gitPullRequestBaseBranchSha;
private final String gitCommitHeadSha;
private final boolean ciVisibilityFailedTestReplayEnabled;
private boolean ciVisibilityFailedTestReplayActive = false; // propagates setting to DI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be volatile if we choose to keep it :)

Having a mutable field in config doesn't quite fit how it is currently used: all the other fields are immutable (with the exception of the two that are just lazily initialized).
Also, having "test replay enabled" and "test replay active" next to each other is quite confusing.

I wonder if instead of setting this field we could harness datadog.trace.bootstrap.debugger.DebuggerContext#updateConfig.

An even better way would be to not call the debugger API directly, but try to invoke the remote config mechanism. This should be more robust: we avoid coupling Test Optimization to Debugger, and we make use of the centralized config updates logic that should (hopefully) take care of all the pitfalls for us

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a way to call datadog.remoteconfig.state.ProductState#apply programmatically (seems like it's only being called by the remote config poller when we receive the config from the backend), but I don't think adding it is impossible. We can discuss this with the core tracer team

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my conversations with Live Debugger team, they wanted to move away from having their products only available when remote config is enabled, which is why originally I didn't take it into account. But we could technically limit FTR to be used only when remote config is enabled (and actually it is only needed in headless mode). Let's discuss the approach 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, let's see if we can separate "dynamic config" from "remote config" and add some means of programmatically controlling the former

if (t instanceof Error) {
if (LOGGER.isDebugEnabled()) {
LOGGER.debug("Skip handling error: {}", t.toString());
if (isFailedTestReplayActive) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can get away without propagating this flag, and just check the test strategy

@@ -181,6 +183,10 @@ public void onTestStart(
}

if (testExecutionHistory != null) {
if (testExecutionHistory instanceof RetryUntilSuccessful) {
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how correct it is to be checking this on a per-test basis: if ATR is enabled in the backend, then every test is subject to auto-retry with the exception of attempt-to-fixes and new tests (as respective execution policies have higher priority than ATR). But do we really not want to enable exception replay for these two as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion I think it might be beneficial for all retry mechanisms. Attempt to fix could be a bit more dangerous with its 20 retries regarding the overhead FTR could introduce, but given that right now we only support the manual flow, it shouldn't be too big of an issue. I made the changes to limit it to ATR after the Guild meeting in order to align it with JS' implementation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, let's add a dedicated method to TestExecutionPolicy that'll determine whether FTR is enabled for a given test

@@ -181,6 +183,10 @@ public void onTestStart(
}

if (testExecutionHistory != null) {
if (testExecutionHistory instanceof RetryUntilSuccessful) {
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries
test.setTag(DDTags.TEST_STRATEGY, RetryReason.atr.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use datadog.trace.civisibility.domain.TestImpl#context to store data that you don't want to send to the backend. It is more idiomatic than adding/removing tags. As a nice side effect, the context is propagated to children spans, so if a test makes an HTTP request and the exception happens inside the child HTTP span, the context will be there as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can we call testExecutionHistory.currentExecutionRetryReason() and just store the result of that? Doing instanceof is breaking encapsulation.

Retry reason will be null for the initial test run, but IIUC we don't apply exception replay to the first run anyway, right?

Copy link
Contributor Author

@daniel-mohedano daniel-mohedano Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we also need the information during the first execution in order to create the Exception Replay probe. The flow is:

  1. First run fails, probe is created and exception instrumented
  2. Test is retried, fails again, the context information from the probe is captured and sent.

Maybe a testExecutionHistory.retryReason() could have been a better approach to avoid the breaking of encapsulation. But if we're able to propagate this information either through the test context or by accessing the test strategy I agree it is a much cleaner approach.

@@ -632,6 +632,7 @@ public void execute() {
}

maybeStartAppSec(scoClass, sco);
// start civisibility before debugger to enable Failed Test Replay correctly in headless mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we manage to plug into remote config as described in the other comment, there should be no ordering dependency 🤞

@@ -180,6 +181,18 @@ private Map<String, String> getPropertiesPropagatedToChildProcess(
Strings.propertyNameToSystemPropertyName(CiVisibilityConfig.TEST_MANAGEMENT_ENABLED),
Boolean.toString(executionSettings.getTestManagementSettings().isEnabled()));

propagatedSystemProperties.put(
Strings.propertyNameToSystemPropertyName(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a dedicated setting for this? I wonder if it can be derived from DebuggerConfig.EXCEPTION_REPLAY_ENABLED && CI_VISIBILITY_ENABLED

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could cause problems because we use the ..._ENABLED settings as kill switches in datadog.trace.civisibility.config.ExecutionSettingsFactoryImpl#doCreate. So, although it would work when propagating the settings to the child process, in the parent process EXCEPTION_REPLAY_ENABLED==false and CI_VISIBILITY_ENABLED==true would mean that FTR wouldn't be enabled (even if it was marked as enabled by the backend)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline let's enable TO-specific debugger behaviour whenever exception replay is enabled and test optimization is enabled

@@ -108,7 +139,11 @@ public void handleException(Throwable t, AgentSpan span) {
exceptionProbeManager.createProbesForException(
throwable.getStackTrace(), chainedExceptionIdx);
if (creationResult.probesCreated > 0) {
AgentTaskScheduler.INSTANCE.execute(() -> applyExceptionConfiguration(fingerprint));
if (isFailedTestReplayActive || !applyConfigAsync) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of applyConfigAsync and the corresponding config field/methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The applyConfigAsync config was also added from #9186, to let Exception Replay apply the instrumentation synchronously even without Failed Test Replay

@Override
public void beforeSuiteEnd() {
LOGGER.debug("CiVisibility BeforeSuiteEnd fired, flushing sink");
sink.lowRateFlush(sink);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think debugger is already doing this asynchronously at a scheduled interval.
If I'm not mistaken, we just need to make sure whatever's left in the sink is flushed in com.datadog.debugger.sink.DebuggerSink#stop that is called from the shutdown hook.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp: ci visibility Continuous Integration Visibility comp: debugger Dynamic Instrumentation type: enhancement Enhancements and improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants