-
Notifications
You must be signed in to change notification settings - Fork 312
Implement Failed Test Replay #9214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Debugger benchmarksParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics. See unchanged results
Request duration reports for reportsgantt
title reports - request duration [CI 0.99] : candidate=None, baseline=None
dateFormat X
axisFormat %s
section baseline
noprobe (318.291 µs) : 290, 346
. : milestone, 318,
basic (277.609 µs) : 272, 284
. : milestone, 278,
loop (8.962 ms) : 8956, 8967
. : milestone, 8962,
section candidate
noprobe (327.838 µs) : 285, 370
. : milestone, 328,
basic (279.083 µs) : 274, 285
. : milestone, 279,
loop (8.965 ms) : 8961, 8970
. : milestone, 8965,
|
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 3 performance regressions! Performance is the same for 43 metrics, 13 unstable metrics.
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1050188
Total [baseline] (8.606 s) : 0, 8605640
Agent [candidate] (1.064 s) : 0, 1064171
Total [candidate] (8.662 s) : 0, 8662366
section iast
Agent [baseline] (1.178 s) : 0, 1177621
Total [baseline] (9.293 s) : 0, 9292592
Agent [candidate] (1.186 s) : 0, 1186098
Total [candidate] (9.362 s) : 0, 9362160
gantt
title insecure-bank - break down per module: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.465 ms) : 0, 1465
crashtracking [candidate] (1.474 ms) : 0, 1474
BytebuddyAgent [baseline] (734.93 ms) : 0, 734930
BytebuddyAgent [candidate] (742.135 ms) : 0, 742135
GlobalTracer [baseline] (243.508 ms) : 0, 243508
GlobalTracer [candidate] (245.692 ms) : 0, 245692
AppSec [baseline] (30.252 ms) : 0, 30252
AppSec [candidate] (30.627 ms) : 0, 30627
Debugger [baseline] (6.07 ms) : 0, 6070
Debugger [candidate] (10.628 ms) : 0, 10628
Remote Config [baseline] (666.182 µs) : 0, 666
Remote Config [candidate] (665.419 µs) : 0, 665
Telemetry [baseline] (12.237 ms) : 0, 12237
Telemetry [candidate] (11.807 ms) : 0, 11807
section iast
crashtracking [baseline] (1.45 ms) : 0, 1450
crashtracking [candidate] (1.452 ms) : 0, 1452
BytebuddyAgent [baseline] (849.608 ms) : 0, 849608
BytebuddyAgent [candidate] (851.268 ms) : 0, 851268
GlobalTracer [baseline] (232.783 ms) : 0, 232783
GlobalTracer [candidate] (234.179 ms) : 0, 234179
IAST [baseline] (32.178 ms) : 0, 32178
IAST [candidate] (28.818 ms) : 0, 28818
AppSec [baseline] (26.021 ms) : 0, 26021
AppSec [candidate] (29.639 ms) : 0, 29639
Debugger [baseline] (5.772 ms) : 0, 5772
Debugger [candidate] (10.532 ms) : 0, 10532
Remote Config [baseline] (597.95 µs) : 0, 598
Remote Config [candidate] (617.479 µs) : 0, 617
Telemetry [baseline] (8.277 ms) : 0, 8277
Telemetry [candidate] (8.627 ms) : 0, 8627
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.058 s) : 0, 1057586
Total [baseline] (10.735 s) : 0, 10735405
Agent [candidate] (1.049 s) : 0, 1049315
Total [candidate] (10.702 s) : 0, 10701996
section appsec
Agent [baseline] (1.225 s) : 0, 1224843
Total [baseline] (10.757 s) : 0, 10757042
Agent [candidate] (1.223 s) : 0, 1222995
Total [candidate] (10.78 s) : 0, 10779911
section iast
Agent [baseline] (1.188 s) : 0, 1188404
Total [baseline] (10.929 s) : 0, 10929415
Agent [candidate] (1.186 s) : 0, 1185707
Total [candidate] (10.99 s) : 0, 10989619
section profiling
Agent [baseline] (1.211 s) : 0, 1210737
Total [baseline] (10.991 s) : 0, 10991045
Agent [candidate] (1.206 s) : 0, 1206037
Total [candidate] (10.823 s) : 0, 10822684
gantt
title petclinic - break down per module: candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.494 ms) : 0, 1494
crashtracking [candidate] (1.444 ms) : 0, 1444
BytebuddyAgent [baseline] (738.638 ms) : 0, 738638
BytebuddyAgent [candidate] (730.375 ms) : 0, 730375
GlobalTracer [baseline] (244.622 ms) : 0, 244622
GlobalTracer [candidate] (241.67 ms) : 0, 241670
AppSec [baseline] (30.377 ms) : 0, 30377
AppSec [candidate] (29.904 ms) : 0, 29904
Debugger [baseline] (6.137 ms) : 0, 6137
Debugger [candidate] (10.431 ms) : 0, 10431
Remote Config [baseline] (679.583 µs) : 0, 680
Remote Config [candidate] (661.986 µs) : 0, 662
Telemetry [baseline] (14.409 ms) : 0, 14409
Telemetry [candidate] (13.823 ms) : 0, 13823
section appsec
crashtracking [baseline] (1.471 ms) : 0, 1471
crashtracking [candidate] (1.453 ms) : 0, 1453
BytebuddyAgent [baseline] (756.709 ms) : 0, 756709
BytebuddyAgent [candidate] (755.406 ms) : 0, 755406
GlobalTracer [baseline] (235.346 ms) : 0, 235346
GlobalTracer [candidate] (234.943 ms) : 0, 234943
AppSec [baseline] (169.262 ms) : 0, 169262
AppSec [candidate] (171.001 ms) : 0, 171001
Debugger [baseline] (7.387 ms) : 0, 7387
Debugger [candidate] (5.675 ms) : 0, 5675
Remote Config [baseline] (649.44 µs) : 0, 649
Remote Config [candidate] (626.672 µs) : 0, 627
Telemetry [baseline] (9.295 ms) : 0, 9295
Telemetry [candidate] (9.235 ms) : 0, 9235
IAST [baseline] (23.603 ms) : 0, 23603
IAST [candidate] (23.522 ms) : 0, 23522
section iast
crashtracking [baseline] (1.467 ms) : 0, 1467
crashtracking [candidate] (1.456 ms) : 0, 1456
BytebuddyAgent [baseline] (858.102 ms) : 0, 858102
BytebuddyAgent [candidate] (852.624 ms) : 0, 852624
GlobalTracer [baseline] (235.257 ms) : 0, 235257
GlobalTracer [candidate] (234.035 ms) : 0, 234035
AppSec [baseline] (25.346 ms) : 0, 25346
AppSec [candidate] (26.85 ms) : 0, 26850
Debugger [baseline] (6.662 ms) : 0, 6662
Debugger [candidate] (10.346 ms) : 0, 10346
Remote Config [baseline] (624.801 µs) : 0, 625
Remote Config [candidate] (610.509 µs) : 0, 611
Telemetry [baseline] (8.389 ms) : 0, 8389
Telemetry [candidate] (8.338 ms) : 0, 8338
IAST [baseline] (31.431 ms) : 0, 31431
IAST [candidate] (30.396 ms) : 0, 30396
section profiling
ProfilingAgent [baseline] (107.855 ms) : 0, 107855
ProfilingAgent [candidate] (108.147 ms) : 0, 108147
crashtracking [baseline] (1.466 ms) : 0, 1466
crashtracking [candidate] (1.466 ms) : 0, 1466
BytebuddyAgent [baseline] (772.66 ms) : 0, 772660
BytebuddyAgent [candidate] (768.607 ms) : 0, 768607
GlobalTracer [baseline] (224.528 ms) : 0, 224528
GlobalTracer [candidate] (223.615 ms) : 0, 223615
AppSec [baseline] (30.356 ms) : 0, 30356
AppSec [candidate] (30.223 ms) : 0, 30223
Debugger [baseline] (7.092 ms) : 0, 7092
Debugger [candidate] (7.65 ms) : 0, 7650
Remote Config [baseline] (722.503 µs) : 0, 723
Remote Config [candidate] (712.032 µs) : 0, 712
Telemetry [baseline] (15.689 ms) : 0, 15689
Telemetry [candidate] (15.5 ms) : 0, 15500
Profiling [baseline] (108.51 ms) : 0, 108510
Profiling [candidate] (108.811 ms) : 0, 108811
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 2 performance regressions! Performance is the same for 10 metrics, 12 unstable metrics.
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
dateFormat X
axisFormat %s
section baseline
no_agent (4.357 ms) : 4308, 4406
. : milestone, 4357,
iast (9.395 ms) : 9239, 9551
. : milestone, 9395,
iast_FULL (14.029 ms) : 13751, 14308
. : milestone, 14029,
iast_GLOBAL (10.842 ms) : 10652, 11032
. : milestone, 10842,
profiling (8.6 ms) : 8461, 8740
. : milestone, 8600,
tracing (7.704 ms) : 7592, 7815
. : milestone, 7704,
section candidate
no_agent (4.284 ms) : 4236, 4332
. : milestone, 4284,
iast (9.408 ms) : 9241, 9576
. : milestone, 9408,
iast_FULL (14.245 ms) : 13966, 14524
. : milestone, 14245,
iast_GLOBAL (10.56 ms) : 10373, 10747
. : milestone, 10560,
profiling (9.473 ms) : 9320, 9626
. : milestone, 9473,
tracing (7.928 ms) : 7805, 8050
. : milestone, 7928,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
dateFormat X
axisFormat %s
section baseline
no_agent (37.614 ms) : 37301, 37926
. : milestone, 37614,
appsec (46.813 ms) : 46401, 47225
. : milestone, 46813,
code_origins (45.922 ms) : 45511, 46333
. : milestone, 45922,
iast (44.177 ms) : 43784, 44570
. : milestone, 44177,
profiling (48.68 ms) : 48219, 49141
. : milestone, 48680,
tracing (43.94 ms) : 43572, 44307
. : milestone, 43940,
section candidate
no_agent (36.926 ms) : 36630, 37223
. : milestone, 36926,
appsec (47.834 ms) : 47409, 48260
. : milestone, 47834,
code_origins (45.743 ms) : 45362, 46124
. : milestone, 45743,
iast (43.033 ms) : 42659, 43407
. : milestone, 43033,
profiling (50.54 ms) : 50056, 51024
. : milestone, 50540,
tracing (44.88 ms) : 44497, 45263
. : milestone, 44880,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
dateFormat X
axisFormat %s
section baseline
no_agent (15.574 s) : 15574000, 15574000
. : milestone, 15574000,
appsec (14.75 s) : 14750000, 14750000
. : milestone, 14750000,
iast (18.665 s) : 18665000, 18665000
. : milestone, 18665000,
iast_GLOBAL (17.781 s) : 17781000, 17781000
. : milestone, 17781000,
profiling (15.431 s) : 15431000, 15431000
. : milestone, 15431000,
tracing (15.043 s) : 15043000, 15043000
. : milestone, 15043000,
section candidate
no_agent (14.991 s) : 14991000, 14991000
. : milestone, 14991000,
appsec (14.903 s) : 14903000, 14903000
. : milestone, 14903000,
iast (18.503 s) : 18503000, 18503000
. : milestone, 18503000,
iast_GLOBAL (18.19 s) : 18190000, 18190000
. : milestone, 18190000,
profiling (15.769 s) : 15769000, 15769000
. : milestone, 15769000,
tracing (14.926 s) : 14926000, 14926000
. : milestone, 14926000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~73bf81d739, baseline=1.53.0-SNAPSHOT~8799a82b0a
dateFormat X
axisFormat %s
section baseline
no_agent (1.468 ms) : 1456, 1479
. : milestone, 1468,
appsec (3.595 ms) : 3382, 3809
. : milestone, 3595,
iast (2.209 ms) : 2145, 2273
. : milestone, 2209,
iast_GLOBAL (2.243 ms) : 2179, 2307
. : milestone, 2243,
profiling (2.053 ms) : 2001, 2104
. : milestone, 2053,
tracing (2.02 ms) : 1970, 2069
. : milestone, 2020,
section candidate
no_agent (1.473 ms) : 1461, 1485
. : milestone, 1473,
appsec (3.649 ms) : 3432, 3867
. : milestone, 3649,
iast (2.206 ms) : 2142, 2270
. : milestone, 2206,
iast_GLOBAL (2.242 ms) : 2178, 2305
. : milestone, 2242,
profiling (2.048 ms) : 1997, 2100
. : milestone, 2048,
tracing (2.009 ms) : 1960, 2058
. : milestone, 2009,
|
🎯 Code Coverage 🔗 Commit SHA: 73bf81d | Docs | Was this helpful? Give us feedback! |
return getFinalDebuggerBaseUrl() + "/debugger/v1/input"; | ||
if (Strings.isNotBlank(dynamicInstrumentationSnapshotUrl)) { | ||
return dynamicInstrumentationSnapshotUrl; | ||
} else if (isCiVisibilityFailedTestReplayActive() && isCiVisibilityAgentlessEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this condition should be isCiVisibilityAgentlessEnabled
alone: if we're running in agentless mode chances are there's no agent to connect to. I understand that debugger wasn't working in agentless before these changes, but why not fix it while we're at it
@@ -1044,6 +1051,7 @@ public static String getHostName() { | |||
private final boolean DBMTracePreparedStatements; | |||
|
|||
private final boolean dynamicInstrumentationEnabled; | |||
private final String dynamicInstrumentationSnapshotUrl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed for our testing or for customer set ups where a custom URL (a proxy?) needs to be used to communicate with DD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was taken from #9186. We don't need it for our own testing given that we use CIVISIBILITY_AGENTLESS_URL
, but it will be useful to let customers use a custom URL for the Exception Replay feature in general.
@@ -1029,6 +1034,8 @@ public static String getHostName() { | |||
private final String gitPullRequestBaseBranch; | |||
private final String gitPullRequestBaseBranchSha; | |||
private final String gitCommitHeadSha; | |||
private final boolean ciVisibilityFailedTestReplayEnabled; | |||
private boolean ciVisibilityFailedTestReplayActive = false; // propagates setting to DI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to be volatile
if we choose to keep it :)
Having a mutable field in config doesn't quite fit how it is currently used: all the other fields are immutable (with the exception of the two that are just lazily initialized).
Also, having "test replay enabled" and "test replay active" next to each other is quite confusing.
I wonder if instead of setting this field we could harness datadog.trace.bootstrap.debugger.DebuggerContext#updateConfig
.
An even better way would be to not call the debugger API directly, but try to invoke the remote config mechanism. This should be more robust: we avoid coupling Test Optimization to Debugger, and we make use of the centralized config updates logic that should (hopefully) take care of all the pitfalls for us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a way to call datadog.remoteconfig.state.ProductState#apply
programmatically (seems like it's only being called by the remote config poller when we receive the config from the backend), but I don't think adding it is impossible. We can discuss this with the core tracer team
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my conversations with Live Debugger team, they wanted to move away from having their products only available when remote config is enabled, which is why originally I didn't take it into account. But we could technically limit FTR to be used only when remote config is enabled (and actually it is only needed in headless mode). Let's discuss the approach 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, let's see if we can separate "dynamic config" from "remote config" and add some means of programmatically controlling the former
if (t instanceof Error) { | ||
if (LOGGER.isDebugEnabled()) { | ||
LOGGER.debug("Skip handling error: {}", t.toString()); | ||
if (isFailedTestReplayActive) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can get away without propagating this flag, and just check the test strategy
@@ -181,6 +183,10 @@ public void onTestStart( | |||
} | |||
|
|||
if (testExecutionHistory != null) { | |||
if (testExecutionHistory instanceof RetryUntilSuccessful) { | |||
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how correct it is to be checking this on a per-test basis: if ATR is enabled in the backend, then every test is subject to auto-retry with the exception of attempt-to-fixes and new tests (as respective execution policies have higher priority than ATR). But do we really not want to enable exception replay for these two as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion I think it might be beneficial for all retry mechanisms. Attempt to fix could be a bit more dangerous with its 20 retries regarding the overhead FTR could introduce, but given that right now we only support the manual flow, it shouldn't be too big of an issue. I made the changes to limit it to ATR after the Guild meeting in order to align it with JS' implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, let's add a dedicated method to TestExecutionPolicy
that'll determine whether FTR is enabled for a given test
@@ -181,6 +183,10 @@ public void onTestStart( | |||
} | |||
|
|||
if (testExecutionHistory != null) { | |||
if (testExecutionHistory instanceof RetryUntilSuccessful) { | |||
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries | |||
test.setTag(DDTags.TEST_STRATEGY, RetryReason.atr.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use datadog.trace.civisibility.domain.TestImpl#context
to store data that you don't want to send to the backend. It is more idiomatic than adding/removing tags. As a nice side effect, the context is propagated to children spans, so if a test makes an HTTP request and the exception happens inside the child HTTP span, the context will be there as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can we call testExecutionHistory.currentExecutionRetryReason()
and just store the result of that? Doing instanceof
is breaking encapsulation.
Retry reason will be null
for the initial test run, but IIUC we don't apply exception replay to the first run anyway, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we also need the information during the first execution in order to create the Exception Replay probe. The flow is:
- First run fails, probe is created and exception instrumented
- Test is retried, fails again, the context information from the probe is captured and sent.
Maybe a testExecutionHistory.retryReason()
could have been a better approach to avoid the breaking of encapsulation. But if we're able to propagate this information either through the test context or by accessing the test strategy I agree it is a much cleaner approach.
@@ -632,6 +632,7 @@ public void execute() { | |||
} | |||
|
|||
maybeStartAppSec(scoClass, sco); | |||
// start civisibility before debugger to enable Failed Test Replay correctly in headless mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we manage to plug into remote config as described in the other comment, there should be no ordering dependency 🤞
@@ -180,6 +181,18 @@ private Map<String, String> getPropertiesPropagatedToChildProcess( | |||
Strings.propertyNameToSystemPropertyName(CiVisibilityConfig.TEST_MANAGEMENT_ENABLED), | |||
Boolean.toString(executionSettings.getTestManagementSettings().isEnabled())); | |||
|
|||
propagatedSystemProperties.put( | |||
Strings.propertyNameToSystemPropertyName( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a dedicated setting for this? I wonder if it can be derived from DebuggerConfig.EXCEPTION_REPLAY_ENABLED && CI_VISIBILITY_ENABLED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could cause problems because we use the ..._ENABLED
settings as kill switches in datadog.trace.civisibility.config.ExecutionSettingsFactoryImpl#doCreate
. So, although it would work when propagating the settings to the child process, in the parent process EXCEPTION_REPLAY_ENABLED==false
and CI_VISIBILITY_ENABLED==true
would mean that FTR wouldn't be enabled (even if it was marked as enabled by the backend)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline let's enable TO-specific debugger behaviour whenever exception replay is enabled and test optimization is enabled
@@ -108,7 +139,11 @@ public void handleException(Throwable t, AgentSpan span) { | |||
exceptionProbeManager.createProbesForException( | |||
throwable.getStackTrace(), chainedExceptionIdx); | |||
if (creationResult.probesCreated > 0) { | |||
AgentTaskScheduler.INSTANCE.execute(() -> applyExceptionConfiguration(fingerprint)); | |||
if (isFailedTestReplayActive || !applyConfigAsync) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of applyConfigAsync
and the corresponding config field/methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The applyConfigAsync
config was also added from #9186, to let Exception Replay apply the instrumentation synchronously even without Failed Test Replay
@Override | ||
public void beforeSuiteEnd() { | ||
LOGGER.debug("CiVisibility BeforeSuiteEnd fired, flushing sink"); | ||
sink.lowRateFlush(sink); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think debugger is already doing this asynchronously at a scheduled interval.
If I'm not mistaken, we just need to make sure whatever's left in the sink is flushed in com.datadog.debugger.sink.DebuggerSink#stop
that is called from the shutdown hook.
What Does This Do
Implements Test Optimization's Failed Test Replay using Live Debugger's Exception Replay. When the feature is enabled and a test is retried due to Auto Test Retries, Exception Replay's logic will create a probe for the exception thrown (in the case of the test probably an assertion error, but not limited to it). When the test is retried, the probe captures debugging information if the exception is encountered again, creating a snapshot of the variables. If the snapshot is captured, it is send as a log to Datadog. The following modifications were made to Exception Replay's original implementation:
DebuggerConfigBridge
. It handles deferred updates, so no order dependency on startup is introduced between the products that want to use Live Debugger's features.DefaultExceptionDebugger
was modified to support Failed Test Replay. If working on Failed Test Replay mode (if CiVisibility is enabled) it will:Error
s, which were previously ignored.product
field to snapshots, populated withtest_optimization
if Failed Test Replay was marked as active. This allows us to have the option of not billing customers for logs generated by the product.DD_CIVISIBILITY_AGENTLESS_ENABLED
is set, Live Debugger's logic for Exception Replay will use the logs API instead of the agent's.DebuggerSink
now flushes on closing to avoid snapshots not being sent on test session finish.Additional changes:
BackendApiFactory.Intake
to a standaloneIntake
, given that it is useful in order to compute agentless mode URLs.failed_test_replay
in test frameworks that support Auto Test Retries.di_enabled
to the Settings response and telemetry.Validation:
MavenSmokeTest
now has an additional test for Failed Test Replay, validating the feature when build system instrumentation is present.JUnitConsoleSmokeTest
to validate the feature in headless mode. This test should ensure that the ordering dependency between CiVisibility's system and Live Debugger's is always accounted for.Motivation
Test Optimization wants to improve the support for Failed Test Replay, implementing it in additional languages apart from JS.
Contributor Checklist
type:
and (comp:
orinst:
) labels in addition to any usefull labelsclose
,fix
or any linking keywords when referencing an issue.Use
solves
instead, and assign the PR milestone to the issueJira ticket: SDTEST-2242