Skip to content

Add live Azure DevOps test result publishing#8297

Open
Evangelink wants to merge 13 commits into
mainfrom
dev/amauryleve/azdo-live-publishing
Open

Add live Azure DevOps test result publishing#8297
Evangelink wants to merge 13 commits into
mainfrom
dev/amauryleve/azdo-live-publishing

Conversation

@Evangelink
Copy link
Copy Markdown
Member

Part 1 of the brainstorm in #5951 — adds opt-in live test-result publishing to the Azure DevOps Tests tab via the REST API. No Microsoft.TeamFoundationServer.Client dependency; just HttpClient + System.Text.Json.

One of three PRs derived from the #5951 brainstorm. The others:

See issue comment for the broader plan.

Why

Today, getting results into the AzDO Tests tab requires generating a TRX and adding a downstream PublishTestResults@2 task. Two real problems:

  1. Long pipelines can't see what passed/failed until the very end.
  2. Parallel jobs race on TRX uploads.

This PR publishes results to a single AzDO run as tests finish, with no TRX dependency.

What

Two new opt-in CLI options on Microsoft.Testing.Extensions.AzureDevOpsReport:

Option Type Purpose
--publish-azdo-test-results zero-arity Enable live publishing. Auto-no-op (with warning) if any of SYSTEM_ACCESSTOKEN / SYSTEM_COLLECTIONURI / SYSTEM_TEAMPROJECT is missing.
--publish-azdo-run-name <name> string Override the run name. Default: <assembly> (<tfm>) on <agent> (or <stage>/<job> when present, sanitized). Requires --publish-azdo-test-results.

Wired into the existing AddAzureDevOpsProvider() — no new public API.

How it works

  • AuthAuthorization: Basic base64(":<SYSTEM_ACCESSTOKEN>") (AzDO's standard PAT-as-basic-auth pattern). Token never logged.
  • Run lifecyclePOST {project}/_apis/test/runs?api-version=7.1 at session start, batched POST .../results?api-version=7.1 as TestNodeUpdateMessages flow in, PATCH .../runs/{id}?api-version=7.1 to Completed (or Aborted on cancellation) at session end.
  • Batching — flush at 100 results or every 5s, with a global semaphore guaranteeing one in-flight POST per run.
  • Multi-processAzureDevOpsRunIdCoordinator writes a azdo-runid.<buildId>.{owner,json,participant.<pid>.json} set under TestResultsDirectory. The first process creates the run; joiners read the id and publish into the same run. The owner waits for participants to drain (bounded timeout, defaults to 30s) before finalizing.
  • Resilience — REST calls retry on HttpRequestException, IOException, SocketException, and TaskCanceledException (3 attempts, exponential backoff, 429 honors Retry-After). All callbacks catch Exception ex when ex is not OperationCanceledException and log a warning — publishing failures never fail the test run.
  • HttpClient — single static instance with a 30s Timeout and per-request linked CancellationTokenSource (15s).

Highlights from the expert-reviewer round

Implementation went through one full round of expert-reviewer. Critical/major issues addressed:

  • C1: bounded FinalizeRunAsync wait (no infinite hang on recycled PIDs).
  • C2: stale azdo-runid.json from a prior build no longer poisons the next build (ExpiresAt validated; coordination filenames keyed by BUILD_BUILDID; owner failures clean up the owner file).
  • C3: participant marker race fixed — owner also writes its own marker; existing marker file is overwritten rather than silently dropped.
  • C4: broadened catches to Exception ex when ex is not OperationCanceledException so JsonException/IOException/SocketException no longer escape IDataConsumer callbacks.
  • C5: finite HttpClient.Timeout + per-request linked CTS — eliminates publisher-wide deadlock on a hung endpoint.
  • M1: _lastFlushTime is now long ticks + Interlocked.Read/Exchange (no torn DateTimeOffset reads).
  • M2: flush loop now drains the queue past the threshold while holding the semaphore.
  • M4: cancellation-aware ReadAsByteArrayAsync(ct) / ReadAsStringAsync(ct) under #if NET.
  • M6: retry predicate widened to HttpRequestException or IOException or SocketException or TaskCanceledException.
  • M7: explicit [JsonPropertyName] on every wire DTO (request and response).
  • M9: automatedTestStorage now strips the extension so it matches legacy TRX-uploaded history keys.
  • M10/M11: tests use Environment.ProcessId / int.MaxValue for alive/dead PIDs and unique temp dirs under Path.GetTempPath() with cleanup-on-failure.

Owner re-election on owner-process crash is deferred behind a clear TODO (AzureDevOpsRunIdCoordinator.cs) with periodic lease renewal so a clean run never trips the bounded wait.

Tests

127 unit tests pass. New coverage in test/UnitTests/Microsoft.Testing.Extensions.UnitTests/AzureDevOpsLivePublishingTests.cs:

  • Run creation and id persistence.
  • Result-batch flush at size and time thresholds, plus session-end forced flush.
  • State mapping (Passed / Failed / Skipped / Timeout / Cancelled / Error).
  • 429 retry honors Retry-After.
  • Network/JSON failure → warning logged, session continues normally.
  • Run-id coordinator: create, join (multi-process), expired-file rejection, BuildId mismatch rejection.
  • Owner finalize bounded wait.

HelpInfoAllExtensionsTests expectations updated for the new options (both --help and --info blocks, alphabetical order preserved).

Build status (local)

  • .\.dotnet\dotnet.exe build src\Platform\Microsoft.Testing.Extensions.AzureDevOpsReport\Microsoft.Testing.Extensions.AzureDevOpsReport.csproj -c Debug0 warnings, 0 errors.
  • .\.dotnet\dotnet.exe test test\UnitTests\Microsoft.Testing.Extensions.UnitTests\Microsoft.Testing.Extensions.UnitTests.csproj127/127 passed.
  • .\build.cmd -pack0 warnings, 0 errors.

Out of scope (deliberate)

Checklist

  • Critical & Major review findings addressed
  • Localized via resx + xlf (regenerated with /t:UpdateXlf, not hand-edited)
  • Help/info acceptance test expectations updated
  • No new public API
  • .\build.cmd green (0 warnings, 0 errors)
  • Owner re-election on crash (deferred — TODO with lease renewal in place)
  • PR feedback addressed

Refs #5951

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 16, 2026 16:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds opt-in live Azure DevOps test-result publishing to the existing Azure DevOps reporting extension, using REST APIs plus multi-process run coordination.

Changes:

  • Adds new --publish-azdo-* CLI options and wires the live publisher into AddAzureDevOpsProvider().
  • Implements REST client, result batching, run-id coordination, and localized resource strings.
  • Adds unit coverage and updates all-extensions help/info expectations.
Show a summary per file
File Description
Directory.Packages.props Adds System.Text.Json package version.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Microsoft.Testing.Extensions.AzureDevOpsReport.csproj References System.Text.Json.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/AzureDevOpsCommandLineOptions.cs Adds live publishing option names.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/AzureDevOpsCommandLineProvider.cs Registers and validates new CLI options.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/AzureDevOpsExtensions.cs Wires publisher as data consumer and session lifetime handler.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/AzureDevOpsLivePublishingModels.cs Adds live publishing constants and DTOs.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/AzureDevOpsRunIdCoordinator.cs Adds multi-process run-id coordination.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/AzureDevOpsTestResultsClient.cs Adds Azure DevOps REST client.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/AzureDevOpsTestResultsPublisher.cs Adds live result publishing lifecycle and batching.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/IAzureDevOpsTestResultsClient.cs Adds internal client abstraction.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/AzureDevOpsResources.resx Adds localized strings for options and warnings.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.cs.xlf Updates Czech localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.de.xlf Updates German localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.es.xlf Updates Spanish localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.fr.xlf Updates French localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.it.xlf Updates Italian localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.ja.xlf Updates Japanese localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.ko.xlf Updates Korean localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.pl.xlf Updates Polish localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.pt-BR.xlf Updates Portuguese localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.ru.xlf Updates Russian localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.tr.xlf Updates Turkish localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.zh-Hans.xlf Updates Simplified Chinese localization resources.
src/Platform/Microsoft.Testing.Extensions.AzureDevOpsReport/Resources/xlf/AzureDevOpsResources.zh-Hant.xlf Updates Traditional Chinese localization resources.
test/IntegrationTests/Microsoft.Testing.Platform.Acceptance.IntegrationTests/HelpInfoAllExtensionsTests.cs Updates help/info expectations for new options.
test/UnitTests/Microsoft.Testing.Extensions.UnitTests/AzureDevOpsLivePublishingTests.cs Adds live publishing unit tests.

Copilot's findings

  • Files reviewed: 26/26 changed files
  • Comments generated: 6

@Evangelink
Copy link
Copy Markdown
Member Author

@copilot address review comments

…n id validation, cleanup token, cross-platform path, resx typo

Co-authored-by: Evangelink <11340282+Evangelink@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented May 17, 2026

@copilot address review comments

All six review comments addressed in commit c39f66f:

  1. Coordinator cleanup (AzureDevOpsRunIdCoordinator.cs:93-94): The catch block now only deletes shared runIdFile and ownerFile when ownsOwnerFile == true; non-owners only clean up their own participant file.

  2. Internal-timeout TCE wrapping (AzureDevOpsTestResultsClient.cs:182-186): Added an outer try/catch (OperationCanceledException) when (!userCancellationToken.IsCancellationRequested) around the retry loop in SendCoreAsync. Any TaskCanceledException from a per-attempt or request-level internal timeout that escapes on the final attempt is now converted to InvalidOperationException, so it never surfaces as OperationCanceledException in the data consumer.

  3. Run id validation (AzureDevOpsTestResultsClient.cs:55): CreateTestRunAsync now validates response.Id > 0 before returning; a zero id (e.g. from {}) throws InvalidOperationException with the existing "invalid response" message.

  4. Bounded cleanup token (AzureDevOpsTestResultsPublisher.cs:196-199): FinalizeRunAsync is now called with a fresh CancellationTokenSource (timeout = coordination finalize timeout + 60 s) instead of the already-canceled session token, so the PATCH marking the run Aborted/Completed can still execute on cancellation.

  5. Cross-platform test path (AzureDevOpsLivePublishingTests.cs:439): Replaced the Windows-only Q:\…\MyTests.dll literal with Path.Combine("testfx-worktrees", "azdo-live", "artifacts", "MyTests.dll").

  6. Resx typo (AzureDevOpsResources.resx:125,134): "AzureDev Ops" → "Azure DevOps" in both Description and OptionDescription; UpdateXlf regenerated to update <source> strings and mark affected translations needs-review-translation.

@Evangelink Evangelink marked this pull request as ready for review May 17, 2026 19:13
Copilot AI review requested due to automatic review settings May 17, 2026 19:13
@Evangelink
Copy link
Copy Markdown
Member Author

@copilot address review comments

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

Comments suppressed due to low confidence (2)

test/IntegrationTests/Microsoft.Testing.Platform.Acceptance.IntegrationTests/HelpInfoAllExtensionsTests.cs:360

  • This expected provider description still uses AzureDev Ops, while the resource string now uses Azure DevOps; the --info acceptance expectation should be updated to the corrected spelling.
    Description: Azure DevOps report generator to write errors to the output in a way that AzureDev Ops understands.

test/IntegrationTests/Microsoft.Testing.Platform.Acceptance.IntegrationTests/HelpInfoAllExtensionsTests.cs:373

  • This expected option description still uses AzureDev Ops, but the option resource now says Azure DevOps, so this --info expectation will not match the generated output.
        Description: Enable Azure DevOps report generator to write errors to the output in a way that AzureDev Ops understands.
  • Files reviewed: 26/26 changed files
  • Comments generated: 3

@Evangelink
Copy link
Copy Markdown
Member Author

@copilot address review comments

Copy link
Copy Markdown
Contributor

Copilot AI commented May 17, 2026

@copilot address review comments

All six review comments were already addressed in commit 70d72f2b (see the previous reply for details). The CI is currently running on the merge commit 14cdb92.

@Evangelink
Copy link
Copy Markdown
Member Author

@copilot address review comments

… OCE in flush, add background flush loop

Co-authored-by: Evangelink <11340282+Evangelink@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 17, 2026 19:31
@Evangelink Evangelink review requested due to automatic review settings May 17, 2026 19:31
…ore OCE filter on second catch

Co-authored-by: Evangelink <11340282+Evangelink@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 17, 2026 19:33
@Evangelink Evangelink review requested due to automatic review settings May 17, 2026 19:33
Copy link
Copy Markdown
Contributor

Copilot AI commented May 17, 2026

@copilot address review comments

All three review comments addressed across two commits:

Issue A — Acceptance test string fix (HelpInfoAllExtensionsTests.cs:108, 360, 373)

  • All three "AzureDev Ops" occurrences updated to "Azure DevOps".

Issue B — OCE preventing finalization (AzureDevOpsTestResultsPublisher.cs:185-186)

  • Added an explicit catch (OperationCanceledException) before the general catch (Exception ex) when (ex is not OperationCanceledException) in OnTestSessionFinishingAsync. The session-end renew + flush are now best-effort: cancellation is swallowed so the FinalizeRunAsync block below (which uses a fresh bounded CancellationTokenSource) always runs.

Issue C — Background flush loop (AzureDevOpsTestResultsPublisher.cs:163-164)

  • Added BackgroundFlushLoopAsync — a Task.Run loop that calls Task.Delay(_options.FlushInterval, cancellationToken) then FlushPendingResultsAsync(force: false, …) on each tick. The loop is started in OnTestSessionStartingAsync right after the run is acquired, and stopped (cancelled + awaited) at the top of OnTestSessionFinishingAsync before the session-end forced flush, ensuring no concurrent in-flight POST races with the drain. Dispose cancels the CTS as a safety net.

Using real Task.Delay (rather than _task.Delay) means the loop's sleep is a genuine timer in production and won't spin inside the unit-test harness where FakeTask.Delay returns immediately.

All 34 AzureDevOps unit tests still pass (0 warnings, 0 errors).

Copy link
Copy Markdown
Contributor

Copilot AI commented May 17, 2026

@copilot address review comments

All three open review comments are addressed in the existing commits (95e4e06 / f71e657):

  • Client.cs:198 — TCE wrapping: The outer catch (OperationCanceledException) when (!userCancellationToken.IsCancellationRequested) (lines 180–186) converts every internal-timeout TaskCanceledException that escapes the final retry into InvalidOperationException, so it never propagates as OperationCanceledException in the data consumer.

  • Client.cs:55 — Run-id validation: response.Id > 0 guard is at lines 56–58; a zero/missing id throws InvalidOperationException with the "invalid response" message.

  • Publisher.cs:179–180 — Background flush loop: BackgroundFlushLoopAsync is started by Task.Run in OnTestSessionStartingAsync (line 151). It wakes on every real FlushInterval tick and calls FlushPendingResultsAsync(force: false, …), so results are pushed even when no further test messages arrive before session end.

Build: 0 warnings, 0 errors. Unit tests: 34/34 passed.

Copilot AI review requested due to automatic review settings May 18, 2026 07:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 26/26 changed files
  • Comments generated: 3

Use comparer.GetType().Name (simple name) instead of .ToString() (full name) so the rendered diagnostic message matches the AssertTests.AreAll.cs expectations and is consistent with Assert.Contains. This resolves the failing AreAllDistinct_*_WithComparer_HasDuplicate_ShouldFail tests in CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Evangelink
Copy link
Copy Markdown
Member Author

Status update on this PR:

Review comments
All open review threads were already addressed in earlier commits on this branch (95e4e06 / f71e657). The three not-yet-outdated threads correspond to fixes that are already in place:

  • AzureDevOpsTestResultsClient.cs:198 (TCE wrapping on final retry) — the outer catch (OperationCanceledException) when (!userCancellationToken.IsCancellationRequested) (lines 180–186) wraps the final-attempt timeout into InvalidOperationException.
  • AzureDevOpsTestResultsClient.cs:55 (empty/missing id in CreateTestRunResponse) — CreateTestRunAsync now throws when response.Id <= 0.
  • AzureDevOpsTestResultsPublisher.cs:180 (time-based flush only on subsequent messages) — BackgroundFlushLoopAsync (lines 277–303) runs a periodic flush from OnTestSessionStartingAsync and is awaited/cancelled in OnTestSessionFinishingAsync and Dispose.

Failing CI tests (AssertTests.AreAll)
The AreAllDistinct_*_WithComparer_HasDuplicate_ShouldFail failures are an unrelated pre-existing bug already present on main (introduced by #8237): Assert.AreAllDistinct.cs used comparer.GetType().ToString() (full name like Microsoft.VisualStudio.TestPlatform.TestFramework.UnitTests.AssertTests+CaseInsensitiveStringComparer) while the tests — and the equivalent code in Assert.Contains.cs — expect comparer.GetType().Name (e.g. CaseInsensitiveStringComparer). Pushed 363fc60 to switch to .Name, matching both the test expectations and the existing convention in Assert.Contains. The 23 local AreAllDistinct tests now pass.

Copilot AI review requested due to automatic review settings May 18, 2026 13:16
C10 (run-state semantics): stop marking the Azure DevOps run as 'Aborted' just because individual tests failed. ExitCode.Success and ExitCode.AtLeastOneTestFailed are both normal completions; anything else (TestSessionAborted, TestHostProcessExitedNonGracefully, TestAdapterTestSessionFailure, MinimumExpectedTestsPolicyViolation, ...) plus session cancellation or HasTestAdapterTestSessionFailure now mark the run as Aborted. Embeds ExitCodes.cs as a shared source file matching the TrxReport extension.

C11 (lease-read race): replace nullable TryReadLeaseFile with a tri-state LeaseReadResult (NotFound/Active/Expired/TransientReadError). TryAcquireOwnerAsync only deletes/clobbers the owner file on NotFound or Expired; an unreadable file (sharing-violation/partial write/garbage JSON) is treated as an active lease so two processes can't both create runs. CleanupStaleParticipants applies the same logic for participant files.

C12 (joiner give-up): WaitForRunIdFileAsync now also takes the owner file path. After the initial CoordinationReadRetryCount budget, it keeps polling while the owner lease still looks Active/TransientReadError (or Expired with the owner PID still alive), capped by the new CoordinationJoinerMaxWaitTime option (default 2 minutes). A healthy slow CreateTestRunAsync no longer makes joiners abandon and silently stop publishing.

New regression tests cover Completed-vs-Aborted state mapping for failing/aborted/canceled/adapter-failure exit paths, unreadable owner-file preservation, joiner waiting through an exhausted retry budget while owner lease is valid, and joiner take-over when the owner lease has expired with a dead PID.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Evangelink
Copy link
Copy Markdown
Member Author

Addressed all 12 review comments — pushed in e7f511a.

Already in place from earlier review pass (commit 70d72f2)

Comment Status
Cleanup deletes shared owner/run-id files even when non-owner ✅ catch handler now only removes shared files when ownsOwnerFile == true
TaskCanceledException from internal timeout escapes on final retry ✅ wrapped in InvalidOperationException when user token is not canceled
Final PATCH uses canceled token ✅ uses fresh cleanupCts derived from CoordinationFinalizeTimeout + 60s
Test mock path Windows-only ✅ now uses Path.Combine
Validate run id is positive ✅ throws on Id <= 0
"AzureDev Ops" typo ✅ resx & xlf regenerated as "Azure DevOps"
Help/info expectations ✅ updated in HelpInfoAllExtensionsTests
Cancellation in pre-finalize flush ✅ caught separately so finalization still runs
Time-based flush only on next message ✅ background flush loop runs every FlushInterval

Newly addressed in e7f511a

Comment Fix
Run state semantics — failing tests no longer map to Aborted Only ExitCode.Success and ExitCode.AtLeastOneTestFailed map to Completed; other non-zero codes (e.g. TestSessionAborted, TestHostProcessExitedNonGracefully, TestAdapterTestSessionFailure, MinimumExpectedTestsPolicyViolation), plus session cancellation and HasTestAdapterTestSessionFailure, map to Aborted.
Lease-read race — partial-write owner file could be deleted by a racing process Replaced nullable TryReadLeaseFile with tri-state ReadLease returning NotFound/Active/Expired/TransientReadError. TryAcquireOwnerAsync only takes over on NotFound/Expired; transient read errors are treated as an active lease. CleanupStaleParticipants follows the same rule.
Joiner gives up after fixed retry windows WaitForRunIdFileAsync now takes the owner file path. After the initial CoordinationReadRetryCount budget, it keeps polling while the owner lease still looks Active/TransientReadError (or Expired but with the owner PID still alive), capped by a new CoordinationJoinerMaxWaitTime option (default 2 min).

Validation

  • .\build.cmd -c Release0 warnings, 0 errors.
  • dotnet test test\UnitTests\Microsoft.Testing.Extensions.UnitTests (net9.0, Release) — 141 passed, 0 failed (7 new regression tests covering the three architectural fixes).

Reply posted on each conversation thread above with concrete file:line citations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants