Implement Failed Test Replay #9214

daniel-mohedano · 2025-07-22T14:22:37Z

What Does This Do

Implements Test Optimization's Failed Test Replay using Live Debugger's Exception Replay. When the feature is enabled and a test is retried due to Auto Test Retries, Exception Replay's logic creates a probe for the thrown exception (for a test this is most likely an assertion error, but it is not limited to that). When the test is retried, the probe captures debugging information if the exception is encountered again, creating a snapshot of the variables. If a snapshot is captured, it is sent as a log to Datadog. The following modifications were made to Exception Replay's original implementation:

Exception Replay is enabled if Failed Test Replay is enabled by the user.
- This is done through the DebuggerConfigBridge. It handles deferred updates, so no order dependency on startup is introduced between the products that want to use Live Debugger's features.
- The existing configuration update logic used with Remote Config now also goes through the same system.
DefaultExceptionDebugger was modified to support Failed Test Replay. When running in Failed Test Replay mode (that is, when CiVisibility is enabled), it will:
- Instrument Errors, which were previously ignored.
- Ignore the limit on the maximum number of exceptions per second.
- Ignore the exception capturing cooldown.
- Apply the instrumentation synchronously. Failed test retries can be performed in rapid succession, and with the async approach the instrumentation was usually not applied before the next test failure. This has also been added as a separate configuration so that regular Exception Replay can use it.
Adds a product field to snapshots, populated with test_optimization if Failed Test Replay was marked as active. This allows us to have the option of not billing customers for logs generated by the product.
Removed Live Debugger's dependency on Remote Config being enabled for its configuration to be initialized.
Exception Replay now supports Agentless mode. For now this is tied to CiVisibility agentless mode. If DD_CIVISIBILITY_AGENTLESS_ENABLED is set, Live Debugger's logic for Exception Replay will use the logs API instead of the agent's.
DebuggerSink now flushes on closing, so that snapshots are not lost when the test session finishes.
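
To illustrate the deferred-update idea behind DebuggerConfigBridge, here is a minimal sketch. It assumes a hypothetical `DeferredConfigBridge` class with `update` and `setConsumer` methods; these names and the generic update type are illustrative only and are not taken from the PR. Updates that arrive before the debugger has registered are buffered and replayed in order once it does, which is why the startup order of the two products should not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers updates until a consumer registers, then replays them in order. */
final class DeferredConfigBridge<U> {
  private final List<U> pending = new ArrayList<>();
  private Consumer<U> consumer; // guarded by "this"; null until the debugger is ready

  /** Called by a product (for example Test Optimization) that wants to change debugger config. */
  synchronized void update(U update) {
    if (consumer == null) {
      pending.add(update); // debugger not started yet: defer the update
    } else {
      consumer.accept(update);
    }
  }

  /** Called by the debugger once it is ready to receive updates. */
  synchronized void setConsumer(Consumer<U> consumer) {
    this.consumer = consumer;
    for (U update : pending) {
      consumer.accept(update); // replay deferred updates in arrival order
    }
    pending.clear();
  }
}
```

With this shape, a caller can invoke `update(...)` at any point during startup; whether it is applied immediately or on registration is hidden behind the bridge.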

Additional changes:

Refactored BackendApiFactory.Intake to a standalone Intake, given that it is useful in order to compute agentless mode URLs.
Updated libraries capabilities to add failed_test_replay in test frameworks that support Auto Test Retries.
Other changes related to adding di_enabled to the Settings response and telemetry.

Validation:

MavenSmokeTest now has an additional test for Failed Test Replay, validating the feature when build system instrumentation is present.
Implemented JUnitConsoleSmokeTest to validate the feature in headless mode. This test should ensure that the ordering dependency between CiVisibility's system and Live Debugger's is always accounted for.
Both smoke tests also validate:
- Tests that do not have an Auto Test Retries execution strategy will not have probes installed.
- Snapshot data is captured for all test retries and not limited to the first one.

Motivation

Test Optimization wants to improve the support for Failed Test Replay, implementing it in additional languages apart from JS.

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any useful labels
Don't use close, fix or any linking keywords when referencing an issue.
Use solves instead, and assign the PR milestone to the issue
Update the CODEOWNERS file on source file addition, move, or deletion
Update the public documentation in case of new configuration flag or behavior

Jira ticket: SDTEST-2242

pr-commenter · 2025-07-23T14:41:30Z

Debugger benchmarks

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
ci_job_date	1756298476	1756298821
end_time	2025-08-27T12:42:38	2025-08-27T12:48:23
git_branch	master	daniel.mohedano/failed-test-replay
git_commit_sha	`8799a82`	`73bf81d`
start_time	2025-08-27T12:41:17	2025-08-27T12:47:02

See matching parameters

	Baseline	Candidate
ci_job_id	1101030799	1101030799
ci_pipeline_id	74844364	74844364
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
git_commit_date	1756297828	1756297828

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics.

See unchanged results

scenario	Δ mean agg_http_req_duration_min	Δ mean agg_http_req_duration_p50	Δ mean agg_http_req_duration_p75	Δ mean agg_http_req_duration_p99	Δ mean throughput
scenario:noprobe	unstable [-18.701µs; +33.719µs] or [-6.731%; +12.137%]	unstable [-28.180µs; +47.274µs] or [-8.854%; +14.852%]	unstable [-50096.731ns; +50059.591ns] or [-14.577%; +14.566%]	unstable [-138.110µs; +129.907µs] or [-13.970%; +13.140%]	same
scenario:basic	same	same	same	unstable [-102.868µs; +73.448µs] or [-13.752%; +9.819%]	unstable [-209.640op/s; +67.392op/s] or [-7.757%; +2.494%]
scenario:loop	same	same	same	same	same

Request duration reports for reports

gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (318.291 µs) : 290, 346
.   : milestone, 318,
basic (277.609 µs) : 272, 284
.   : milestone, 278,
loop (8.962 ms) : 8956, 8967
.   : milestone, 8962,
section candidate
noprobe (327.838 µs) : 285, 370
.   : milestone, 328,
basic (279.083 µs) : 274, 285
.   : milestone, 279,
loop (8.965 ms) : 8961, 8970
.   : milestone, 8965,

baseline results

Scenario	Request median duration [CI 0.99]
noprobe	318.291 µs [290.309 µs, 346.274 µs]
basic	277.609 µs [271.64 µs, 283.578 µs]
loop	8.962 ms [8.956 ms, 8.967 ms]

candidate results

Scenario	Request median duration [CI 0.99]
noprobe	327.838 µs [285.468 µs, 370.209 µs]
basic	279.083 µs [273.529 µs, 284.638 µs]
loop	8.965 ms [8.961 ms, 8.97 ms]

pr-commenter · 2025-07-23T14:55:48Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	daniel.mohedano/failed-test-replay
git_commit_date	1756292489	1756297828
git_commit_sha	`8799a82`	`73bf81d`
release_version	1.53.0-SNAPSHOT~8799a82b0a	1.51.0-SNAPSHOT~73bf81d739

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1756299672	1756299672
ci_job_id	1101030786	1101030786
ci_pipeline_id	74844364	74844364
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-1cyke420 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-1cyke420 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module	Agent	Agent
parent	None	None

Summary

Found 0 performance improvements and 3 performance regressions! Performance is the same for 43 metrics, 13 unstable metrics.

scenario	Δ mean execution_time	candidate mean execution_time	baseline mean execution_time
scenario:startup:insecure-bank:iast:Debugger	worse [+4.528ms; +4.992ms] or [+78.445%; +86.473%]	10.532ms	5.772ms
scenario:startup:insecure-bank:tracing:Debugger	worse [+4.335ms; +4.780ms] or [+71.423%; +78.750%]	10.628ms	6.070ms
scenario:startup:petclinic:tracing:Debugger	worse [+4.178ms; +4.410ms] or [+68.073%; +71.857%]	10.431ms	6.137ms

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1050188
Total [baseline] (8.606 s) : 0, 8605640
Agent [candidate] (1.064 s) : 0, 1064171
Total [candidate] (8.662 s) : 0, 8662366
section iast
Agent [baseline] (1.178 s) : 0, 1177621
Total [baseline] (9.293 s) : 0, 9292592
Agent [candidate] (1.186 s) : 0, 1186098
Total [candidate] (9.362 s) : 0, 9362160

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.05 s	-
Agent	iast	1.178 s	127.433 ms (12.1%)
Total	tracing	8.606 s	-
Total	iast	9.293 s	686.951 ms (8.0%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.064 s	-
Agent	iast	1.186 s	121.927 ms (11.5%)
Total	tracing	8.662 s	-
Total	iast	9.362 s	699.794 ms (8.1%)

gantt
    title insecure-bank - break down per module: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.465 ms) : 0, 1465
crashtracking [candidate] (1.474 ms) : 0, 1474
BytebuddyAgent [baseline] (734.93 ms) : 0, 734930
BytebuddyAgent [candidate] (742.135 ms) : 0, 742135
GlobalTracer [baseline] (243.508 ms) : 0, 243508
GlobalTracer [candidate] (245.692 ms) : 0, 245692
AppSec [baseline] (30.252 ms) : 0, 30252
AppSec [candidate] (30.627 ms) : 0, 30627
Debugger [baseline] (6.07 ms) : 0, 6070
Debugger [candidate] (10.628 ms) : 0, 10628
Remote Config [baseline] (666.182 µs) : 0, 666
Remote Config [candidate] (665.419 µs) : 0, 665
Telemetry [baseline] (12.237 ms) : 0, 12237
Telemetry [candidate] (11.807 ms) : 0, 11807
section iast
crashtracking [baseline] (1.45 ms) : 0, 1450
crashtracking [candidate] (1.452 ms) : 0, 1452
BytebuddyAgent [baseline] (849.608 ms) : 0, 849608
BytebuddyAgent [candidate] (851.268 ms) : 0, 851268
GlobalTracer [baseline] (232.783 ms) : 0, 232783
GlobalTracer [candidate] (234.179 ms) : 0, 234179
IAST [baseline] (32.178 ms) : 0, 32178
IAST [candidate] (28.818 ms) : 0, 28818
AppSec [baseline] (26.021 ms) : 0, 26021
AppSec [candidate] (29.639 ms) : 0, 29639
Debugger [baseline] (5.772 ms) : 0, 5772
Debugger [candidate] (10.532 ms) : 0, 10532
Remote Config [baseline] (597.95 µs) : 0, 598
Remote Config [candidate] (617.479 µs) : 0, 617
Telemetry [baseline] (8.277 ms) : 0, 8277
Telemetry [candidate] (8.627 ms) : 0, 8627

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.058 s) : 0, 1057586
Total [baseline] (10.735 s) : 0, 10735405
Agent [candidate] (1.049 s) : 0, 1049315
Total [candidate] (10.702 s) : 0, 10701996
section appsec
Agent [baseline] (1.225 s) : 0, 1224843
Total [baseline] (10.757 s) : 0, 10757042
Agent [candidate] (1.223 s) : 0, 1222995
Total [candidate] (10.78 s) : 0, 10779911
section iast
Agent [baseline] (1.188 s) : 0, 1188404
Total [baseline] (10.929 s) : 0, 10929415
Agent [candidate] (1.186 s) : 0, 1185707
Total [candidate] (10.99 s) : 0, 10989619
section profiling
Agent [baseline] (1.211 s) : 0, 1210737
Total [baseline] (10.991 s) : 0, 10991045
Agent [candidate] (1.206 s) : 0, 1206037
Total [candidate] (10.823 s) : 0, 10822684

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.058 s	-
Agent	appsec	1.225 s	167.256 ms (15.8%)
Agent	iast	1.188 s	130.818 ms (12.4%)
Agent	profiling	1.211 s	153.15 ms (14.5%)
Total	tracing	10.735 s	-
Total	appsec	10.757 s	21.637 ms (0.2%)
Total	iast	10.929 s	194.01 ms (1.8%)
Total	profiling	10.991 s	255.639 ms (2.4%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.049 s	-
Agent	appsec	1.223 s	173.68 ms (16.6%)
Agent	iast	1.186 s	136.392 ms (13.0%)
Agent	profiling	1.206 s	156.722 ms (14.9%)
Total	tracing	10.702 s	-
Total	appsec	10.78 s	77.915 ms (0.7%)
Total	iast	10.99 s	287.623 ms (2.7%)
Total	profiling	10.823 s	120.688 ms (1.1%)

gantt
    title petclinic - break down per module: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.494 ms) : 0, 1494
crashtracking [candidate] (1.444 ms) : 0, 1444
BytebuddyAgent [baseline] (738.638 ms) : 0, 738638
BytebuddyAgent [candidate] (730.375 ms) : 0, 730375
GlobalTracer [baseline] (244.622 ms) : 0, 244622
GlobalTracer [candidate] (241.67 ms) : 0, 241670
AppSec [baseline] (30.377 ms) : 0, 30377
AppSec [candidate] (29.904 ms) : 0, 29904
Debugger [baseline] (6.137 ms) : 0, 6137
Debugger [candidate] (10.431 ms) : 0, 10431
Remote Config [baseline] (679.583 µs) : 0, 680
Remote Config [candidate] (661.986 µs) : 0, 662
Telemetry [baseline] (14.409 ms) : 0, 14409
Telemetry [candidate] (13.823 ms) : 0, 13823
section appsec
crashtracking [baseline] (1.471 ms) : 0, 1471
crashtracking [candidate] (1.453 ms) : 0, 1453
BytebuddyAgent [baseline] (756.709 ms) : 0, 756709
BytebuddyAgent [candidate] (755.406 ms) : 0, 755406
GlobalTracer [baseline] (235.346 ms) : 0, 235346
GlobalTracer [candidate] (234.943 ms) : 0, 234943
AppSec [baseline] (169.262 ms) : 0, 169262
AppSec [candidate] (171.001 ms) : 0, 171001
Debugger [baseline] (7.387 ms) : 0, 7387
Debugger [candidate] (5.675 ms) : 0, 5675
Remote Config [baseline] (649.44 µs) : 0, 649
Remote Config [candidate] (626.672 µs) : 0, 627
Telemetry [baseline] (9.295 ms) : 0, 9295
Telemetry [candidate] (9.235 ms) : 0, 9235
IAST [baseline] (23.603 ms) : 0, 23603
IAST [candidate] (23.522 ms) : 0, 23522
section iast
crashtracking [baseline] (1.467 ms) : 0, 1467
crashtracking [candidate] (1.456 ms) : 0, 1456
BytebuddyAgent [baseline] (858.102 ms) : 0, 858102
BytebuddyAgent [candidate] (852.624 ms) : 0, 852624
GlobalTracer [baseline] (235.257 ms) : 0, 235257
GlobalTracer [candidate] (234.035 ms) : 0, 234035
AppSec [baseline] (25.346 ms) : 0, 25346
AppSec [candidate] (26.85 ms) : 0, 26850
Debugger [baseline] (6.662 ms) : 0, 6662
Debugger [candidate] (10.346 ms) : 0, 10346
Remote Config [baseline] (624.801 µs) : 0, 625
Remote Config [candidate] (610.509 µs) : 0, 611
Telemetry [baseline] (8.389 ms) : 0, 8389
Telemetry [candidate] (8.338 ms) : 0, 8338
IAST [baseline] (31.431 ms) : 0, 31431
IAST [candidate] (30.396 ms) : 0, 30396
section profiling
ProfilingAgent [baseline] (107.855 ms) : 0, 107855
ProfilingAgent [candidate] (108.147 ms) : 0, 108147
crashtracking [baseline] (1.466 ms) : 0, 1466
crashtracking [candidate] (1.466 ms) : 0, 1466
BytebuddyAgent [baseline] (772.66 ms) : 0, 772660
BytebuddyAgent [candidate] (768.607 ms) : 0, 768607
GlobalTracer [baseline] (224.528 ms) : 0, 224528
GlobalTracer [candidate] (223.615 ms) : 0, 223615
AppSec [baseline] (30.356 ms) : 0, 30356
AppSec [candidate] (30.223 ms) : 0, 30223
Debugger [baseline] (7.092 ms) : 0, 7092
Debugger [candidate] (7.65 ms) : 0, 7650
Remote Config [baseline] (722.503 µs) : 0, 723
Remote Config [candidate] (712.032 µs) : 0, 712
Telemetry [baseline] (15.689 ms) : 0, 15689
Telemetry [candidate] (15.5 ms) : 0, 15500
Profiling [baseline] (108.51 ms) : 0, 108510
Profiling [candidate] (108.811 ms) : 0, 108811

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	daniel.mohedano/failed-test-replay
git_commit_date	1756292489	1756297828
git_commit_sha	`8799a82`	`73bf81d`
release_version	1.53.0-SNAPSHOT~8799a82b0a	1.51.0-SNAPSHOT~73bf81d739

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1756299332	1756299332
ci_job_id	1101030788	1101030788
ci_pipeline_id	74844364	74844364
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-1-hslf38kn 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-1-hslf38kn 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 2 performance regressions! Performance is the same for 10 metrics, 12 unstable metrics.

scenario	Δ mean http_req_duration	Δ mean throughput	candidate mean http_req_duration	candidate mean throughput	baseline mean http_req_duration	baseline mean throughput
scenario:load:insecure-bank:profiling:high_load	worse [+714.933µs; +1030.308µs] or [+8.313%; +11.980%]	unstable [-115.385op/s; +17.135op/s] or [-21.405%; +3.179%]	9.473ms	489.938op/s	8.600ms	539.062op/s
scenario:load:petclinic:profiling:high_load	worse [+1.352ms; +2.369ms] or [+2.776%; +4.866%]	unstable [-10.744op/s; +3.719op/s] or [-11.180%; +3.869%]	50.540ms	92.588op/s	48.680ms	96.100op/s

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.357 ms) : 4308, 4406
.   : milestone, 4357,
iast (9.395 ms) : 9239, 9551
.   : milestone, 9395,
iast_FULL (14.029 ms) : 13751, 14308
.   : milestone, 14029,
iast_GLOBAL (10.842 ms) : 10652, 11032
.   : milestone, 10842,
profiling (8.6 ms) : 8461, 8740
.   : milestone, 8600,
tracing (7.704 ms) : 7592, 7815
.   : milestone, 7704,
section candidate
no_agent (4.284 ms) : 4236, 4332
.   : milestone, 4284,
iast (9.408 ms) : 9241, 9576
.   : milestone, 9408,
iast_FULL (14.245 ms) : 13966, 14524
.   : milestone, 14245,
iast_GLOBAL (10.56 ms) : 10373, 10747
.   : milestone, 10560,
profiling (9.473 ms) : 9320, 9626
.   : milestone, 9473,
tracing (7.928 ms) : 7805, 8050
.   : milestone, 7928,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	4.357 ms [4.308 ms, 4.406 ms]	-
iast	9.395 ms [9.239 ms, 9.551 ms]	5.038 ms (115.6%)
iast_FULL	14.029 ms [13.751 ms, 14.308 ms]	9.672 ms (222.0%)
iast_GLOBAL	10.842 ms [10.652 ms, 11.032 ms]	6.485 ms (148.8%)
profiling	8.6 ms [8.461 ms, 8.74 ms]	4.243 ms (97.4%)
tracing	7.704 ms [7.592 ms, 7.815 ms]	3.347 ms (76.8%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	4.284 ms [4.236 ms, 4.332 ms]	-
iast	9.408 ms [9.241 ms, 9.576 ms]	5.125 ms (119.6%)
iast_FULL	14.245 ms [13.966 ms, 14.524 ms]	9.962 ms (232.5%)
iast_GLOBAL	10.56 ms [10.373 ms, 10.747 ms]	6.276 ms (146.5%)
profiling	9.473 ms [9.32 ms, 9.626 ms]	5.189 ms (121.1%)
tracing	7.928 ms [7.805 ms, 8.05 ms]	3.644 ms (85.1%)

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
    dateFormat X
    axisFormat %s
section baseline
no_agent (37.614 ms) : 37301, 37926
.   : milestone, 37614,
appsec (46.813 ms) : 46401, 47225
.   : milestone, 46813,
code_origins (45.922 ms) : 45511, 46333
.   : milestone, 45922,
iast (44.177 ms) : 43784, 44570
.   : milestone, 44177,
profiling (48.68 ms) : 48219, 49141
.   : milestone, 48680,
tracing (43.94 ms) : 43572, 44307
.   : milestone, 43940,
section candidate
no_agent (36.926 ms) : 36630, 37223
.   : milestone, 36926,
appsec (47.834 ms) : 47409, 48260
.   : milestone, 47834,
code_origins (45.743 ms) : 45362, 46124
.   : milestone, 45743,
iast (43.033 ms) : 42659, 43407
.   : milestone, 43033,
profiling (50.54 ms) : 50056, 51024
.   : milestone, 50540,
tracing (44.88 ms) : 44497, 45263
.   : milestone, 44880,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	37.614 ms [37.301 ms, 37.926 ms]	-
appsec	46.813 ms [46.401 ms, 47.225 ms]	9.2 ms (24.5%)
code_origins	45.922 ms [45.511 ms, 46.333 ms]	8.308 ms (22.1%)
iast	44.177 ms [43.784 ms, 44.57 ms]	6.563 ms (17.4%)
profiling	48.68 ms [48.219 ms, 49.141 ms]	11.066 ms (29.4%)
tracing	43.94 ms [43.572 ms, 44.307 ms]	6.326 ms (16.8%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	36.926 ms [36.63 ms, 37.223 ms]	-
appsec	47.834 ms [47.409 ms, 48.26 ms]	10.908 ms (29.5%)
code_origins	45.743 ms [45.362 ms, 46.124 ms]	8.817 ms (23.9%)
iast	43.033 ms [42.659 ms, 43.407 ms]	6.107 ms (16.5%)
profiling	50.54 ms [50.056 ms, 51.024 ms]	13.614 ms (36.9%)
tracing	44.88 ms [44.497 ms, 45.263 ms]	7.954 ms (21.5%)

Dacapo

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	daniel.mohedano/failed-test-replay
git_commit_date	1756292489	1756297828
git_commit_sha	`8799a82`	`73bf81d`
release_version	1.53.0-SNAPSHOT~8799a82b0a	1.51.0-SNAPSHOT~73bf81d739

See matching parameters

	Baseline	Candidate
application	biojava	biojava
ci_job_date	1756299886	1756299886
ci_job_id	1101030790	1101030790
ci_pipeline_id	74844364	74844364
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-qelu8213 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-qelu8213 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for biojava

gantt
    title biojava - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.574 s) : 15574000, 15574000
.   : milestone, 15574000,
appsec (14.75 s) : 14750000, 14750000
.   : milestone, 14750000,
iast (18.665 s) : 18665000, 18665000
.   : milestone, 18665000,
iast_GLOBAL (17.781 s) : 17781000, 17781000
.   : milestone, 17781000,
profiling (15.431 s) : 15431000, 15431000
.   : milestone, 15431000,
tracing (15.043 s) : 15043000, 15043000
.   : milestone, 15043000,
section candidate
no_agent (14.991 s) : 14991000, 14991000
.   : milestone, 14991000,
appsec (14.903 s) : 14903000, 14903000
.   : milestone, 14903000,
iast (18.503 s) : 18503000, 18503000
.   : milestone, 18503000,
iast_GLOBAL (18.19 s) : 18190000, 18190000
.   : milestone, 18190000,
profiling (15.769 s) : 15769000, 15769000
.   : milestone, 15769000,
tracing (14.926 s) : 14926000, 14926000
.   : milestone, 14926000,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.574 s [15.574 s, 15.574 s]	-
appsec	14.75 s [14.75 s, 14.75 s]	-824.0 ms (-5.3%)
iast	18.665 s [18.665 s, 18.665 s]	3.091 s (19.8%)
iast_GLOBAL	17.781 s [17.781 s, 17.781 s]	2.207 s (14.2%)
profiling	15.431 s [15.431 s, 15.431 s]	-143.0 ms (-0.9%)
tracing	15.043 s [15.043 s, 15.043 s]	-531.0 ms (-3.4%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	14.991 s [14.991 s, 14.991 s]	-
appsec	14.903 s [14.903 s, 14.903 s]	-88.0 ms (-0.6%)
iast	18.503 s [18.503 s, 18.503 s]	3.512 s (23.4%)
iast_GLOBAL	18.19 s [18.19 s, 18.19 s]	3.199 s (21.3%)
profiling	15.769 s [15.769 s, 15.769 s]	778.0 ms (5.2%)
tracing	14.926 s [14.926 s, 14.926 s]	-65.0 ms (-0.4%)

Execution time for tomcat

gantt
    title tomcat - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.468 ms) : 1456, 1479
.   : milestone, 1468,
appsec (3.595 ms) : 3382, 3809
.   : milestone, 3595,
iast (2.209 ms) : 2145, 2273
.   : milestone, 2209,
iast_GLOBAL (2.243 ms) : 2179, 2307
.   : milestone, 2243,
profiling (2.053 ms) : 2001, 2104
.   : milestone, 2053,
tracing (2.02 ms) : 1970, 2069
.   : milestone, 2020,
section candidate
no_agent (1.473 ms) : 1461, 1485
.   : milestone, 1473,
appsec (3.649 ms) : 3432, 3867
.   : milestone, 3649,
iast (2.206 ms) : 2142, 2270
.   : milestone, 2206,
iast_GLOBAL (2.242 ms) : 2178, 2305
.   : milestone, 2242,
profiling (2.048 ms) : 1997, 2100
.   : milestone, 2048,
tracing (2.009 ms) : 1960, 2058
.   : milestone, 2009,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.468 ms [1.456 ms, 1.479 ms]	-
appsec	3.595 ms [3.382 ms, 3.809 ms]	2.127 ms (144.9%)
iast	2.209 ms [2.145 ms, 2.273 ms]	741.725 µs (50.5%)
iast_GLOBAL	2.243 ms [2.179 ms, 2.307 ms]	775.032 µs (52.8%)
profiling	2.053 ms [2.001 ms, 2.104 ms]	584.922 µs (39.9%)
tracing	2.02 ms [1.97 ms, 2.069 ms]	551.817 µs (37.6%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.473 ms [1.461 ms, 1.485 ms]	-
appsec	3.649 ms [3.432 ms, 3.867 ms]	2.176 ms (147.7%)
iast	2.206 ms [2.142 ms, 2.27 ms]	732.709 µs (49.7%)
iast_GLOBAL	2.242 ms [2.178 ms, 2.305 ms]	768.563 µs (52.2%)
profiling	2.048 ms [1.997 ms, 2.1 ms]	575.477 µs (39.1%)
tracing	2.009 ms [1.96 ms, 2.058 ms]	536.029 µs (36.4%)

datadog-official · 2025-08-11T09:23:09Z

🎯 Code Coverage
• Patch Coverage: 45.58%
• Total Coverage: 57.59% (+0.01%)

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 73bf81d

nikita-tkachenko-datadog · 2025-08-21T15:35:47Z

internal-api/src/main/java/datadog/trace/api/Config.java

-    return getFinalDebuggerBaseUrl() + "/debugger/v1/input";
+    if (Strings.isNotBlank(dynamicInstrumentationSnapshotUrl)) {
+      return dynamicInstrumentationSnapshotUrl;
+    } else if (isCiVisibilityFailedTestReplayActive() && isCiVisibilityAgentlessEnabled()) {


I wonder if this condition should be isCiVisibilityAgentlessEnabled alone: if we're running in agentless mode, chances are there's no agent to connect to. I understand that the debugger wasn't working in agentless mode before these changes, but why not fix it while we're at it?
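
As a rough illustration of the resolution order being suggested here (explicit URL first, then agentless, then the agent), consider the sketch below. `SnapshotUrlSketch` and its parameters are hypothetical and do not mirror the real Config API; only the `/debugger/v1/input` path comes from the diff above.

```java
/** Hypothetical helper; parameter names are illustrative, not the agent's Config API. */
final class SnapshotUrlSketch {
  private SnapshotUrlSketch() {}

  static String resolve(
      String explicitUrl, boolean agentless, String agentlessIntakeUrl, String agentBaseUrl) {
    if (explicitUrl != null && !explicitUrl.trim().isEmpty()) {
      return explicitUrl; // a user-provided override always wins
    }
    if (agentless) {
      return agentlessIntakeUrl; // no agent to talk to: send straight to the intake
    }
    return agentBaseUrl + "/debugger/v1/input"; // default: go through the agent
  }
}
```

Note that the agentless branch here does not check whether Failed Test Replay is active, which is the difference from the condition in the diff.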

nikita-tkachenko-datadog · 2025-08-21T15:37:16Z

internal-api/src/main/java/datadog/trace/api/Config.java

@@ -1044,6 +1051,7 @@ public static String getHostName() {
  private final boolean DBMTracePreparedStatements;

  private final boolean dynamicInstrumentationEnabled;
+  private final String dynamicInstrumentationSnapshotUrl;


Is this needed for our testing or for customer setups where a custom URL (a proxy?) needs to be used to communicate with DD?

This change was taken from #9186. We don't need it for our own testing given that we use CIVISIBILITY_AGENTLESS_URL, but it will be useful to let customers use a custom URL for the Exception Replay feature in general.

nikita-tkachenko-datadog · 2025-08-21T15:47:12Z

internal-api/src/main/java/datadog/trace/api/Config.java

@@ -1029,6 +1034,8 @@ public static String getHostName() {
  private final String gitPullRequestBaseBranch;
  private final String gitPullRequestBaseBranchSha;
  private final String gitCommitHeadSha;
+  private final boolean ciVisibilityFailedTestReplayEnabled;
+  private boolean ciVisibilityFailedTestReplayActive = false; // propagates setting to DI


It needs to be volatile if we choose to keep it :)

Having a mutable field in config doesn't quite fit how it is currently used: all the other fields are immutable (with the exception of the two that are just lazily initialized).
Also, having "test replay enabled" and "test replay active" next to each other is quite confusing.

I wonder if instead of setting this field we could harness datadog.trace.bootstrap.debugger.DebuggerContext#updateConfig.

An even better way would be to not call the debugger API directly, but try to invoke the remote config mechanism. This should be more robust: we avoid coupling Test Optimization to Debugger, and we make use of the centralized config updates logic that should (hopefully) take care of all the pitfalls for us

I couldn't find a way to call datadog.remoteconfig.state.ProductState#apply programmatically (seems like it's only being called by the remote config poller when we receive the config from the backend), but I don't think adding it is impossible. We can discuss this with the core tracer team

From my conversations with Live Debugger team, they wanted to move away from having their products only available when remote config is enabled, which is why originally I didn't take it into account. But we could technically limit FTR to be used only when remote config is enabled (and actually it is only needed in headless mode). Let's discuss the approach 👍

As discussed offline, let's see if we can separate "dynamic config" from "remote config" and add some means of programmatically controlling the former

nikita-tkachenko-datadog · 2025-08-21T15:56:49Z

...nt/agent-debugger/src/main/java/com/datadog/debugger/exception/DefaultExceptionDebugger.java

-    if (t instanceof Error) {
-      if (LOGGER.isDebugEnabled()) {
-        LOGGER.debug("Skip handling error: {}", t.toString());
+    if (isFailedTestReplayActive) {


I wonder if we can get away without propagating this flag, and just check the test strategy

nikita-tkachenko-datadog · 2025-08-21T16:04:34Z

...ent-ci-visibility/src/main/java/datadog/trace/civisibility/events/TestEventsHandlerImpl.java

@@ -181,6 +183,10 @@ public void onTestStart(
    }

    if (testExecutionHistory != null) {
+      if (testExecutionHistory instanceof RetryUntilSuccessful) {
+        // Used by FailedTestReplay to limit the instrumentation to AutoTestRetries


I wonder how correct it is to be checking this on a per-test basis: if ATR is enabled in the backend, then every test is subject to auto-retry with the exception of attempt-to-fixes and new tests (as respective execution policies have higher priority than ATR). But do we really not want to enable exception replay for these two as well?

In my opinion it might be beneficial for all retry mechanisms. Attempt to fix could be a bit more dangerous with its 20 retries, given the overhead FTR could introduce, but since we currently only support the manual flow, it shouldn't be too big of an issue. I made the changes to limit it to ATR after the Guild meeting in order to align it with the JS implementation.

As discussed offline, let's add a dedicated method to TestExecutionPolicy that'll determine whether FTR is enabled for a given test
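
A minimal sketch of what such a dedicated method could look like, so that callers ask the policy instead of using instanceof. The interface below is a simplified, hypothetical stand-in for TestExecutionPolicy; the method names `retriesOnFailure` and `failedTestReplayApplicable` are assumptions made for illustration.

```java
/** Hypothetical, simplified stand-in for TestExecutionPolicy; not the real interface. */
interface ExecutionPolicySketch {
  /** True if another execution may follow a failed one. */
  boolean retriesOnFailure();

  /** Assumed hook: whether Failed Test Replay should instrument failures of this test. */
  default boolean failedTestReplayApplicable() {
    return retriesOnFailure();
  }
}

/** A test that runs exactly once: a probe would never get a chance to fire. */
final class RunOnceSketch implements ExecutionPolicySketch {
  @Override
  public boolean retriesOnFailure() {
    return false;
  }
}

/** Retries up to maxExecutions times, stopping at the first success (ATR-like). */
final class RetryUntilSuccessfulSketch implements ExecutionPolicySketch {
  private final int maxExecutions;

  RetryUntilSuccessfulSketch(int maxExecutions) {
    this.maxExecutions = maxExecutions;
  }

  @Override
  public boolean retriesOnFailure() {
    return maxExecutions > 1;
  }
}
```

Each policy can override the default if it needs different behaviour, for example to opt a policy with many retries out of FTR.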

nikita-tkachenko-datadog · 2025-08-21T16:08:10Z

...ent-ci-visibility/src/main/java/datadog/trace/civisibility/events/TestEventsHandlerImpl.java

@@ -181,6 +183,10 @@ public void onTestStart(
    }

    if (testExecutionHistory != null) {
+      if (testExecutionHistory instanceof RetryUntilSuccessful) {
+        // Used by FailedTestReplay to limit the instrumentation to AutoTestRetries
+        test.setTag(DDTags.TEST_STRATEGY, RetryReason.atr.toString());


You can use datadog.trace.civisibility.domain.TestImpl#context to store data that you don't want to send to the backend. It is more idiomatic than adding/removing tags. As a nice side effect, the context is propagated to children spans, so if a test makes an HTTP request and the exception happens inside the child HTTP span, the context will be there as well

Also, can we call testExecutionHistory.currentExecutionRetryReason() and just store the result of that? Doing instanceof is breaking encapsulation.

Retry reason will be null for the initial test run, but IIUC we don't apply exception replay to the first run anyway, right?

In this case we also need the information during the first execution in order to create the Exception Replay probe. The flow is:

- First run fails; the probe is created and the exception is instrumented.
- Test is retried and fails again; the context information from the probe is captured and sent.

Maybe a testExecutionHistory.retryReason() could have been a better approach to avoid breaking encapsulation. But if we're able to propagate this information either through the test context or by accessing the test strategy, I agree it is a much cleaner approach.
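
The two-step flow described in this reply can be sketched as follows. This is a toy model, not the debugger's implementation: `ReplayFlowSketch` is hypothetical, and the fingerprint (exception class plus top stack frame) is a deliberate simplification. It shows why the first execution must already be known to be retriable: the first failure only installs the probe, and snapshots come from later failures.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of "first failure installs the probe, retries capture snapshots". */
final class ReplayFlowSketch {
  private final Set<String> probes = new HashSet<>();
  private final List<String> snapshots = new ArrayList<>();

  /** Simplified fingerprint: exception class plus the top stack frame. */
  static String fingerprint(Throwable t) {
    StackTraceElement[] frames = t.getStackTrace();
    String top =
        frames.length > 0 ? frames[0].getClassName() + "#" + frames[0].getMethodName() : "?";
    return t.getClass().getName() + "@" + top;
  }

  void onTestFailure(Throwable t) {
    String fp = fingerprint(t);
    if (probes.add(fp)) {
      return; // first failure: probe installed, nothing captured yet
    }
    snapshots.add(fp); // later failures hit the existing probe and produce a snapshot
  }

  List<String> snapshots() {
    return snapshots;
  }
}
```

In this model, a test that fails on its initial run and on two retries yields two snapshots, one per retry.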

nikita-tkachenko-datadog · 2025-08-21T16:20:02Z

dd-java-agent/agent-bootstrap/src/main/java/datadog/trace/bootstrap/Agent.java

@@ -632,6 +632,7 @@ public void execute() {
      }

      maybeStartAppSec(scoClass, sco);
+      // start civisibility before debugger to enable Failed Test Replay correctly in headless mode


If we manage to plug into remote config as described in the other comment, there should be no ordering dependency 🤞

nikita-tkachenko-datadog · 2025-08-21T16:24:37Z

...ility/src/main/java/datadog/trace/civisibility/domain/buildsystem/BuildSystemModuleImpl.java

@@ -180,6 +181,18 @@ private Map<String, String> getPropertiesPropagatedToChildProcess(
        Strings.propertyNameToSystemPropertyName(CiVisibilityConfig.TEST_MANAGEMENT_ENABLED),
        Boolean.toString(executionSettings.getTestManagementSettings().isEnabled()));

+    propagatedSystemProperties.put(
+        Strings.propertyNameToSystemPropertyName(


Do we really need a dedicated setting for this? I wonder if it can be derived from DebuggerConfig.EXCEPTION_REPLAY_ENABLED && CI_VISIBILITY_ENABLED

I think this could cause problems because we use the ..._ENABLED settings as kill switches in datadog.trace.civisibility.config.ExecutionSettingsFactoryImpl#doCreate. So, although it would work when propagating the settings to the child process, in the parent process EXCEPTION_REPLAY_ENABLED==false and CI_VISIBILITY_ENABLED==true would mean that FTR is not enabled (even if the backend marked it as enabled).

As discussed offline, let's enable TO-specific debugger behaviour whenever exception replay is enabled and test optimization is enabled.
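The two rules in this thread can be written down as plain boolean logic. This is a sketch with hypothetical names (`FtrSettingsSketch`, `effective`, `testOptimizationDebuggerMode`); it only illustrates the reasoning and does not reproduce ExecutionSettingsFactoryImpl.

```java
// Hypothetical sketch of the two rules discussed in this thread.
final class FtrSettingsSketch {
  // Kill-switch semantics: a local *_ENABLED flag can veto a feature the backend
  // enabled, but it can never switch on a feature the backend did not enable.
  static boolean effective(boolean localEnabledFlag, boolean backendEnabled) {
    return localEnabledFlag && backendEnabled;
  }

  // Agreed rule: use test-optimization-specific debugger behaviour whenever
  // exception replay and test optimization are both enabled.
  static boolean testOptimizationDebuggerMode(
      boolean exceptionReplayEnabled, boolean ciVisibilityEnabled) {
    return exceptionReplayEnabled && ciVisibilityEnabled;
  }
}
```

Under the kill-switch rule, a local flag left at false in the parent process vetoes the feature even when the backend reports it as enabled, which is the concern raised above.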

nikita-tkachenko-datadog · 2025-08-21T16:27:21Z

...nt/agent-debugger/src/main/java/com/datadog/debugger/exception/DefaultExceptionDebugger.java

@@ -108,7 +139,11 @@ public void handleException(Throwable t, AgentSpan span) {
            exceptionProbeManager.createProbesForException(
                throwable.getStackTrace(), chainedExceptionIdx);
        if (creationResult.probesCreated > 0) {
-          AgentTaskScheduler.INSTANCE.execute(() -> applyExceptionConfiguration(fingerprint));
+          if (isFailedTestReplayActive || !applyConfigAsync) {


Can we get rid of applyConfigAsync and the corresponding config field/methods?

The applyConfigAsync config also comes from #9186; it lets Exception Replay apply the instrumentation synchronously even without Failed Test Replay.
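The branch in the diff above picks between applying the configuration inline and handing it to a scheduler. A minimal sketch of that decision, using a plain `java.util.concurrent.Executor` as a stand-in for the agent's task scheduler (`ApplySketch` and `apply` are hypothetical names):

```java
import java.util.concurrent.Executor;

// Hypothetical sketch of the sync/async decision from the diff above.
final class ApplySketch {
  static void apply(
      boolean failedTestReplayActive,
      boolean applyConfigAsync,
      Executor scheduler,
      Runnable applyConfiguration) {
    if (failedTestReplayActive || !applyConfigAsync) {
      // Synchronous: the instrumentation is in place before this call returns,
      // so a retry that starts immediately afterwards already hits the probe.
      applyConfiguration.run();
    } else {
      // Asynchronous: cheaper for the calling thread, but the instrumentation
      // may not be applied before the next failure.
      scheduler.execute(applyConfiguration);
    }
  }
}
```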

nikita-tkachenko-datadog · 2025-08-21T16:31:02Z

dd-java-agent/agent-debugger/src/main/java/com/datadog/debugger/sink/DebuggerSink.java

+    @Override
+    public void beforeSuiteEnd() {
+      LOGGER.debug("CiVisibility BeforeSuiteEnd fired, flushing sink");
+      sink.lowRateFlush(sink);


I think the debugger already does this asynchronously at a scheduled interval.
If I'm not mistaken, we just need to make sure whatever's left in the sink is flushed in com.datadog.debugger.sink.DebuggerSink#stop, which is called from the shutdown hook.
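The suggestion amounts to "flush periodically, and drain once more on stop". A minimal sketch of that shape, assuming a simple in-memory queue; `SinkSketch` and its methods are hypothetical and do not mirror DebuggerSink's real implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: a sink that is flushed on a schedule and drained on stop.
final class SinkSketch {
  private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
  private final List<String> sent = new ArrayList<>();

  void add(String snapshot) {
    pending.add(snapshot);
  }

  // Would normally be invoked by a scheduler at a fixed interval.
  synchronized void flush() {
    String snapshot;
    while ((snapshot = pending.poll()) != null) {
      sent.add(snapshot); // stand-in for uploading the snapshot
    }
  }

  // Final drain, so snapshots added after the last scheduled flush are not lost.
  void stop() {
    flush();
  }

  synchronized List<String> sent() {
    return new ArrayList<>(sent);
  }
}
```

In a real process the final drain could be registered with `Runtime.getRuntime().addShutdownHook(new Thread(sink::stop))`, which would make an explicit flush at suite end unnecessary as long as the hook runs.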

daniel-mohedano added 4 commits July 4, 2025 12:22

add di_enabled to settings response

2c219fd

add FTR related metrics

a911ae0

add FTR to execution settings

d20159e

add basic exception replay integration in agent mode

c4c84d6

daniel-mohedano added the type: enhancement, tag: do not merge, comp: ci visibility, and comp: debugger labels Jul 22, 2025

daniel-mohedano added 3 commits July 22, 2025 16:27

feat: headless and agentless changes

ae31670

Merge branch 'master' into daniel.mohedano/failed-test-replay

4f2d6a9

fix: tests

c7eeac8

daniel-mohedano added 8 commits August 4, 2025 10:21

fix: testng capabilities

14cf11c

feat: refactor agentless intakes

461fb1f

Merge branch 'master' into daniel.mohedano/failed-test-replay

14b1054

chore: update smoke test fixtures

a87eff0

test: add unit test for new Intake enum

19b4edf

test: remove ftr from instrumentation tests (not used)

9e81cc7

test: introduce FTR smoke tests for headfull and headless modes

4e3deb4

style: spotless and codenarc

f931654

daniel-mohedano added 3 commits August 11, 2025 12:31

feat: add test event finished FTR telemetry

eacde60

feat: add product field to snapshots

0228f71

feat: implement SuiteEnd listener for sink flushing

88fd665

daniel-mohedano changed the title ~~Failed Test Replay~~ Implement Failed Test Replay Aug 12, 2025

feat: introduce new config variables for debugger

ec9c54d

daniel-mohedano removed the tag: do not merge label Aug 13, 2025

daniel-mohedano added 2 commits August 19, 2025 10:29

chore: remove todo

d4d2432

Merge branch 'master' into daniel.mohedano/failed-test-replay

ffda9b3

daniel-mohedano marked this pull request as ready for review August 20, 2025 10:07

daniel-mohedano requested review from a team as code owners August 20, 2025 10:07

daniel-mohedano requested review from shatzi, Mariovido, bric3 and PerfectSlayer and removed request for a team August 20, 2025 10:07

feat: align FTR settings with JS' implementation

49eaebf

nikita-tkachenko-datadog reviewed Aug 21, 2025


daniel-mohedano added 6 commits August 22, 2025 12:25

feat: PR suggestions

b4cef2c

feat: implement debugger config bridge

d82a7fa

fix: move check from policy to history for FTR

ab8d8e9

fix: tests

bd81769

chore: update codeowners

c257390

Merge branch 'master' into daniel.mohedano/failed-test-replay

73bf81d

