[Test Optimization] Add test.final_status tag #8091
Execution-Time Benchmarks Report ⏱️
Execution-time results for samples comparing This PR (8091) and master.
✅ No regressions detected - check the details below
Full Metrics Comparison
FakeDbCommand
HttpMessageHandler
Comparison explanation
Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).
Duration charts
Execution time (ms): mean, with the charted interval in parentheses.
| Sample | Runtime | Scenario | This PR (8091) | master |
|---|---|---|---|---|
| FakeDbCommand | .NET Framework 4.8 | Baseline | 69 (68-71) | 69 (67-71) |
| FakeDbCommand | .NET Framework 4.8 | Bailout | 73 (72-75) | 73 (72-75) |
| FakeDbCommand | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,044 (995-1094) | 1,043 (1000-1086) |
| FakeDbCommand | .NET Core 3.1 | Baseline | 116 (113-120) | 116 (113-120) |
| FakeDbCommand | .NET Core 3.1 | Bailout | 117 (115-119) | 117 (115-119) |
| FakeDbCommand | .NET Core 3.1 | CallTarget+Inlining+NGEN | 783 (724-842) | 777 (725-828) |
| FakeDbCommand | .NET 6 | Baseline | 102 (100-105) | 102 (99-105) |
| FakeDbCommand | .NET 6 | Bailout | 103 (102-105) | 103 (101-104) |
| FakeDbCommand | .NET 6 | CallTarget+Inlining+NGEN | 768 (747-788) | 763 (737-790) |
| FakeDbCommand | .NET 8 | Baseline | 95 (91-98) | 94 (91-97) |
| FakeDbCommand | .NET 8 | Bailout | 96 (94-98) | 95 (93-98) |
| FakeDbCommand | .NET 8 | CallTarget+Inlining+NGEN | 641 (622-659) | 635 (622-648) |
| HttpMessageHandler | .NET Framework 4.8 | Baseline | 215 (206-224) | 213 (205-221) |
| HttpMessageHandler | .NET Framework 4.8 | Bailout | 219 (214-225) | 219 (214-225) |
| HttpMessageHandler | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,233 (1183-1282) | 1,223 (1172-1274) |
| HttpMessageHandler | .NET Core 3.1 | Baseline | 328 (314-342) | 326 (315-337) |
| HttpMessageHandler | .NET Core 3.1 | Bailout | 328 (314-343) | 326 (316-336) |
| HttpMessageHandler | .NET Core 3.1 | CallTarget+Inlining+NGEN | 1,077 (1039-1114) | 1,069 (1016-1121) |
| HttpMessageHandler | .NET 6 | Baseline | 304 (294-314) | 301 (292-310) |
| HttpMessageHandler | .NET 6 | Bailout | 305 (294-316) | 304 (291-316) |
| HttpMessageHandler | .NET 6 | CallTarget+Inlining+NGEN | 1,008 (952-1064) | 1,001 (930-1071) |
| HttpMessageHandler | .NET 8 | Baseline | 313 (303-323) | 316 (303-328) |
| HttpMessageHandler | .NET 8 | Bailout | 316 (303-330) | 317 (305-329) |
| HttpMessageHandler | .NET 8 | CallTarget+Inlining+NGEN | 1,015 (907-1124) | 1,030 (939-1120) |
Benchmarks
Benchmark execution time: 2026-02-16 16:14:09
Comparing candidate commit 3a460cf in PR branch
Found 8 performance improvements and 10 performance regressions! Performance is the same for 159 metrics, 15 unstable metrics.
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net6.0
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0
scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net6.0
scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack netcoreapp3.1
scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net472
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net6.0
scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net472
scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark net6.0
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope netcoreapp3.1
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan net6.0
Snapshots difference summary
The following differences have been observed in committed snapshots. It is meant to help the reviewer.
198 occurrences of : + test.final_status: fail,
332 occurrences of : + test.final_status: pass,
205 occurrences of : + test.final_status: skip,
23 occurrences of : - test.test_management.attempt_to_fix_passed: true,
24 occurrences of : - test.test_management.attempt_to_fix_passed: false,
Pull request overview
This PR adds a new test.final_status tag to test execution spans across NUnit, XUnit, and MsTest frameworks. The tag represents the adjusted final outcome of a test for CI pipeline result determination, enabling customers to query tests by their effective CI outcome and build monitors for truly failing tests (not just flaky ones).
Changes:
- Adds test.final_status constant to TestTags.cs with values: pass, fail, or skip
- Implements shared CalculateFinalStatus() logic in Common.cs with priority-based determination
- Integrates final status tracking across NUnit, XUnit, and MsTest frameworks with framework-specific execution tracking
Reviewed changes
Copilot reviewed 62 out of 66 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| TestTags.cs / TestSpanTags.cs | Added constant and property for final_status tag |
| Common.cs | Implemented shared CalculateFinalStatus() with 5-priority logic |
| XUnitIntegration.cs | Added final status calculation with ATR/EFD/ATF handling |
| TestOptimizationTestCommand.cs | NUnit final status with retry state tracking |
| MsTest integration files | Final status for pre-execution skips and failures |
| Test snapshots | Updated verified snapshots with final_status values |
| Integration test files | Added final_status to tag removal for deterministic tests |
    /// <summary>
    /// Retry reason value for Early Flake Detection
    /// </summary>
    public const string TestRetryReasonEfd = "efd";
Wow, we're not using any namespacing on these tag names? I'm sure these are defined somewhere else, but still 😬
Also, you have efd and atr but not atf? 😅
Summary of changes
Add a new test.final_status tag to the final execution span of tests across NUnit, XUnit, and MsTest frameworks. This tag represents the adjusted final outcome of a test for CI pipeline result determination, with values pass, fail, or skip.
Jira: SDTEST-2985
Core changes:
Fix:
Reason for change
When retry mechanisms are enabled (ATR, EFD, Attempt to Fix), a single test can run multiple times with different outcomes. Some intermediate outcomes are suppressed to avoid failing CI pipelines. Currently, there is no way to query tests by their final adjusted status to build monitors and alerts for hard failures on default branches.
The test.final_status tag enables customers to:
Priority logic:
ATF (Attempt to Fix) semantics:
For ATF tests, the goal is to determine if a fix actually resolved a flaky test. Therefore:
- final_status = fail
- final_status = pass
- attempt_to_fix_passed tag is derived from the same logic for consistency
Implementation details
Shared logic (Common.cs):
- anyExecutionFailed parameter to support ATF-specific behavior
NUnit:
XUnit:
MsTest:
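As a rough illustration of the ATF semantics described above, the sketch below shows one way an anyExecutionFailed flag could drive both test.final_status and attempt_to_fix_passed. It is not the code from Common.cs: the class and method names (FinalStatusSketch, ForAttemptToFix, AttemptToFixPassed, AnyFailed) are hypothetical. The rule "any failed execution means fail, otherwise pass" is an assumption inferred from the parameter name. The skip value and the other priorities of CalculateFinalStatus() are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: these names are illustrative and are not taken from the PR's Common.cs.
public static class FinalStatusSketch
{
    public const string Pass = "pass";
    public const string Fail = "fail";

    // Assumed ATF rule: a single failed execution is enough to conclude the fix did not hold.
    public static string ForAttemptToFix(bool anyExecutionFailed)
        => anyExecutionFailed ? Fail : Pass;

    // Derive the boolean tag from the same decision so the two values cannot disagree.
    public static bool AttemptToFixPassed(bool anyExecutionFailed)
        => ForAttemptToFix(anyExecutionFailed) == Pass;

    // Fold the per-execution statuses of one test into the anyExecutionFailed flag.
    public static bool AnyFailed(IEnumerable<string> executionStatuses)
        => executionStatuses.Any(status => status == Fail);
}
```

Under these assumptions, a test whose retries produced pass, fail, pass would be reported as fail, and the same flag would set attempt_to_fix_passed to false. Routing both values through one function is the design point the description calls "derived from the same logic for consistency".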
Test coverage
Unit tests (TestFinalStatusTests.cs): 136 tests covering:
Integration tests:
Other details