Commit a35fa75

Fix CI flakiness: MSB4216 task host failures, dotnet-watch hangs, and noisy NuGet errors
Root cause 1 - MSB4216 task host failures on macOS:
Tasks from NuGet packages (ComputeWasmBuildAssets, ComputeManagedAssemblies) that use TaskHostFactory need DOTNET_HOST_PATH to locate the dotnet executable for creating task host processes. The Helix test setup scripts did not export this variable, causing intermittent MSB4216 errors on macOS (both arm64 and x64). Added DOTNET_HOST_PATH export to both RunTestsOnHelix.sh and RunTestsOnHelix.cmd.

Root cause 2 - dotnet-watch Aspire_BuildError_ManualRestart test hang:
The test hangs for 60+ minutes on Helix because:
(a) AwaitableProcess.DisposeAsync() did not close stdin before killing the process tree. PhysicalConsole.ListenToStandardInputAsync() reads with CancellationToken.None, so the stdin reader blocks indefinitely if the pipe isn't closed first.
(b) AwaitableProcess.DisposeAsync() awaited _processExitAwaiter without a timeout, so if Kill() failed to terminate the process, disposal would hang forever. Added a 30-second timeout for the exit wait.
(c) DCP timeouts were set to 100,000 seconds (~27 hours), effectively disabling all timeouts. When a DCP operation deadlocked, the test would wait until the Helix work item timeout (~2 hours). Reduced to 300 seconds (5 minutes) per operation.

Root cause 3 - Noisy NuGet source removal errors on Helix:
The Helix test setup scripts try to remove NuGet sources (dotnet6-internal-transport, dotnet7-internal-transport) that only exist in internal builds. On public CI, these sources are absent, causing error messages that confuse log analysis. Added error suppression (|| true for bash, 2>nul for cmd) to all source removal commands.

Files changed:
- build/RunTestsOnHelix.sh: Export DOTNET_HOST_PATH, suppress NuGet removal errors
- build/RunTestsOnHelix.cmd: Set DOTNET_HOST_PATH, suppress NuGet removal errors
- test/Microsoft.DotNet.HotReload.Test.Utilities/AwaitableProcess.cs: Close stdin, add exit timeout in DisposeAsync
- test/Microsoft.DotNet.HotReload.Test.Utilities/WatchableApp.cs: Reduce DCP timeouts
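Root cause 1 comes down to one environment variable being absent when MSBuild spawns a task host. A minimal sketch of a defensive export follows, in POSIX shell. The `resolve_dotnet_host` helper and the `/usr/share/dotnet` fallback are assumptions for illustration and are not part of this commit, which exports the variable unconditionally.

```shell
# Hypothetical helper: print the path of the dotnet host under the given
# root directory, or fail if no executable is found there.
resolve_dotnet_host() {
    candidate="$1/dotnet"
    if [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        return 1
    fi
}

# Assumed fallback location; the Helix scripts set DOTNET_ROOT themselves.
DOTNET_ROOT="${DOTNET_ROOT:-/usr/share/dotnet}"

if host=$(resolve_dotnet_host "$DOTNET_ROOT"); then
    # Task hosts started by MSBuild read this variable to find the host binary.
    export DOTNET_HOST_PATH="$host"
else
    echo "warning: no dotnet host under $DOTNET_ROOT; DOTNET_HOST_PATH left unset" >&2
fi
```

Checking for the executable first means a misconfigured DOTNET_ROOT shows up as a warning in the setup log rather than as a later MSB4216 failure.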
1 parent b98e0dd commit a35fa75

6 files changed: +109 -38 lines changed
build/RunTestsOnHelix.cmd

Lines changed: 16 additions & 10 deletions
@@ -9,6 +9,10 @@ set DOTNET_ROOT=%HELIX_CORRELATION_PAYLOAD%\d
 set PATH=%DOTNET_ROOT%;%PATH%
 set TestFullMSBuild=%1

+REM Set DOTNET_HOST_PATH so MSBuild task hosts can locate the dotnet executable.
+REM Without this, tasks from NuGet packages that use TaskHostFactory fail with MSB4216.
+set DOTNET_HOST_PATH=%DOTNET_ROOT%\dotnet.exe
+
 REM Ensure Visual Studio instances allow preview SDKs
 PowerShell -ExecutionPolicy ByPass -NoProfile -File "%HELIX_CORRELATION_PAYLOAD%\t\eng\enable-preview-sdks.ps1"

@@ -35,14 +39,16 @@ dotnet new --debug:ephemeral-hive
 dotnet nuget list source --configfile %TestExecutionDirectory%\nuget.config
 if exist %TestExecutionDirectory%\Testpackages dotnet nuget add source %TestExecutionDirectory%\Testpackages --name testpackages --configfile %TestExecutionDirectory%\nuget.config

-dotnet nuget remove source dotnet6-transport --configfile %TestExecutionDirectory%\nuget.config
-dotnet nuget remove source dotnet6-internal-transport --configfile %TestExecutionDirectory%\nuget.config
-dotnet nuget remove source dotnet7-transport --configfile %TestExecutionDirectory%\nuget.config
-dotnet nuget remove source dotnet7-internal-transport --configfile %TestExecutionDirectory%\nuget.config
-dotnet nuget remove source richnav --configfile %TestExecutionDirectory%\nuget.config
-dotnet nuget remove source vs-impl --configfile %TestExecutionDirectory%\nuget.config
-dotnet nuget remove source dotnet-libraries-transport --configfile %TestExecutionDirectory%\nuget.config
-dotnet nuget remove source dotnet-tools-transport --configfile %TestExecutionDirectory%\nuget.config
-dotnet nuget remove source dotnet-libraries --configfile %TestExecutionDirectory%\nuget.config
-dotnet nuget remove source dotnet-eng --configfile %TestExecutionDirectory%\nuget.config
+REM Remove feeds not needed for tests. Errors from non-existent sources
+REM (e.g. internal-transport feeds only present in internal builds) are ignored.
+dotnet nuget remove source dotnet6-transport --configfile %TestExecutionDirectory%\nuget.config 2>nul
+dotnet nuget remove source dotnet6-internal-transport --configfile %TestExecutionDirectory%\nuget.config 2>nul
+dotnet nuget remove source dotnet7-transport --configfile %TestExecutionDirectory%\nuget.config 2>nul
+dotnet nuget remove source dotnet7-internal-transport --configfile %TestExecutionDirectory%\nuget.config 2>nul
+dotnet nuget remove source richnav --configfile %TestExecutionDirectory%\nuget.config 2>nul
+dotnet nuget remove source vs-impl --configfile %TestExecutionDirectory%\nuget.config 2>nul
+dotnet nuget remove source dotnet-libraries-transport --configfile %TestExecutionDirectory%\nuget.config 2>nul
+dotnet nuget remove source dotnet-tools-transport --configfile %TestExecutionDirectory%\nuget.config 2>nul
+dotnet nuget remove source dotnet-libraries --configfile %TestExecutionDirectory%\nuget.config 2>nul
+dotnet nuget remove source dotnet-eng --configfile %TestExecutionDirectory%\nuget.config 2>nul
 dotnet nuget list source --configfile %TestExecutionDirectory%\nuget.config

build/RunTestsOnHelix.sh

Lines changed: 18 additions & 11 deletions
@@ -9,6 +9,12 @@ export MicrosoftNETBuildExtensionsTargets=$HELIX_CORRELATION_PAYLOAD/ex/msbuildE
 export DOTNET_ROOT=$HELIX_CORRELATION_PAYLOAD/d
 export PATH=$DOTNET_ROOT:$PATH

+# Set DOTNET_HOST_PATH so MSBuild task hosts can locate the dotnet executable.
+# Without this, tasks from NuGet packages that use TaskHostFactory (e.g. ComputeWasmBuildAssets
+# from WebAssembly SDK, ComputeManagedAssemblies from ILLink) fail with MSB4216 on macOS
+# because the task host process cannot find the dotnet host to launch.
+export DOTNET_HOST_PATH=$DOTNET_ROOT/dotnet
+
 export TestExecutionDirectory=$(realpath "$(mktemp -d "${TMPDIR:-/tmp}"/dotnetSdkTests.XXXXXXXX)")
 export DOTNET_CLI_HOME=$TestExecutionDirectory/.dotnet
 cp -a $HELIX_CORRELATION_PAYLOAD/t/TestExecutionDirectoryFiles/. $TestExecutionDirectory/
@@ -22,15 +28,16 @@ dotnet new --debug:ephemeral-hive

 dotnet nuget list source --configfile $TestExecutionDirectory/NuGet.config
 dotnet nuget add source $TestExecutionDirectory/Testpackages --configfile $TestExecutionDirectory/NuGet.config
-#Remove feeds not needed for tests
-dotnet nuget remove source dotnet6-transport --configfile $TestExecutionDirectory/NuGet.config
-dotnet nuget remove source dotnet6-internal-transport --configfile $TestExecutionDirectory/NuGet.config
-dotnet nuget remove source dotnet7-transport --configfile $TestExecutionDirectory/NuGet.config
-dotnet nuget remove source dotnet7-internal-transport --configfile $TestExecutionDirectory/NuGet.config
-dotnet nuget remove source richnav --configfile $TestExecutionDirectory/NuGet.config
-dotnet nuget remove source vs-impl --configfile $TestExecutionDirectory/NuGet.config
-dotnet nuget remove source dotnet-libraries-transport --configfile $TestExecutionDirectory/NuGet.config
-dotnet nuget remove source dotnet-tools-transport --configfile $TestExecutionDirectory/NuGet.config
-dotnet nuget remove source dotnet-libraries --configfile $TestExecutionDirectory/NuGet.config
-dotnet nuget remove source dotnet-eng --configfile $TestExecutionDirectory/NuGet.config
+# Remove feeds not needed for tests. Use || true to avoid errors when a source
+# doesn't exist (e.g. internal-transport feeds are only present in internal builds).
+dotnet nuget remove source dotnet6-transport --configfile $TestExecutionDirectory/NuGet.config || true
+dotnet nuget remove source dotnet6-internal-transport --configfile $TestExecutionDirectory/NuGet.config || true
+dotnet nuget remove source dotnet7-transport --configfile $TestExecutionDirectory/NuGet.config || true
+dotnet nuget remove source dotnet7-internal-transport --configfile $TestExecutionDirectory/NuGet.config || true
+dotnet nuget remove source richnav --configfile $TestExecutionDirectory/NuGet.config || true
+dotnet nuget remove source vs-impl --configfile $TestExecutionDirectory/NuGet.config || true
+dotnet nuget remove source dotnet-libraries-transport --configfile $TestExecutionDirectory/NuGet.config || true
+dotnet nuget remove source dotnet-tools-transport --configfile $TestExecutionDirectory/NuGet.config || true
+dotnet nuget remove source dotnet-libraries --configfile $TestExecutionDirectory/NuGet.config || true
+dotnet nuget remove source dotnet-eng --configfile $TestExecutionDirectory/NuGet.config || true
 dotnet nuget list source --configfile $TestExecutionDirectory/NuGet.config
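The effect of the `|| true` suffix can be checked in isolation. The sketch below uses `remove_source`, a hypothetical stand-in for `dotnet nuget remove source` that fails for unknown names, so it runs without the .NET SDK. One caveat: `|| true` only forces a zero exit status. The error text is still written to stderr unless it is also redirected, which is what `2>nul` does in the .cmd variant.

```shell
# Hypothetical stand-in for `dotnet nuget remove source NAME`:
# succeeds for one known feed, fails with a message on stderr otherwise.
remove_source() {
    if [ "$1" = "dotnet-eng" ]; then
        echo "removed $1"
    else
        echo "error: no package source named '$1'" >&2
        return 1
    fi
}

# Without the suffix, the failure is visible in the exit status.
if remove_source dotnet6-internal-transport 2>/dev/null; then
    status_raw=0
else
    status_raw=$?
fi

# With `|| true`, the exit status is always 0, so a `set -e` script keeps going.
remove_source dotnet6-internal-transport 2>/dev/null || true
status_masked=$?

# The message itself is not silenced by `|| true`; capture stderr to show it.
message=$(remove_source dotnet6-internal-transport 2>&1 || true)

echo "raw=$status_raw masked=$status_masked message=$message"
```

If quiet logs are the goal on the bash side as well, appending `2>/dev/null` before `|| true` would be the counterpart of `2>nul`. That is a possible follow-up, not something this commit does.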

src/BlazorWasmSdk/Tasks/GZipCompress.cs

Lines changed: 27 additions & 10 deletions
@@ -20,6 +20,10 @@ public class GZipCompress : Task
     [Required]
     public string OutputDirectory { get; set; }

+    // Retry count for transient file I/O errors (e.g., antivirus locks on CI machines).
+    private const int MaxRetries = 3;
+    private const int RetryDelayMs = 200;
+
     public override bool Execute()
     {
         CompressedFiles = new ITaskItem[FilesToCompress.Length];
@@ -56,18 +60,31 @@ public override bool Execute()
                 Log.LogMessage(MessageImportance.Low, "Compressing '{0}' because file is newer than '{1}'.", inputFullPath, outputRelativePath);
             }

-            try
+            // Retry on IOException to handle transient file locks from antivirus, file
+            // indexing, or parallel MSBuild nodes on CI machines (see dotnet/sdk#53424).
+            for (int attempt = 1; attempt <= MaxRetries; attempt++)
             {
-                using var sourceStream = File.OpenRead(file.ItemSpec);
-                using var fileStream = File.Create(outputRelativePath);
-                using var stream = new GZipStream(fileStream, CompressionLevel.Optimal);
+                try
+                {
+                    using var sourceStream = File.OpenRead(file.ItemSpec);
+                    using var fileStream = File.Create(outputRelativePath);
+                    using var stream = new GZipStream(fileStream, CompressionLevel.Optimal);

-                sourceStream.CopyTo(stream);
-            }
-            catch (Exception e)
-            {
-                Log.LogErrorFromException(e);
-                return;
+                    sourceStream.CopyTo(stream);
+                    return; // Success
+                }
+                catch (IOException) when (attempt < MaxRetries)
+                {
+                    Log.LogMessage(MessageImportance.Low,
+                        "Retrying compression of '{0}' (attempt {1}/{2}) due to transient I/O error.",
+                        file.ItemSpec, attempt, MaxRetries);
+                    Thread.Sleep(RetryDelayMs * attempt);
+                }
+                catch (Exception e)
+                {
+                    Log.LogErrorFromException(e);
+                    return;
+                }
             }
         });
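The loop above is a bounded retry with linear backoff: up to MaxRetries attempts, with a delay that grows with the attempt number. The same control flow can be sketched in POSIX shell. This is an illustration, not the task's C# code: `retry` and `flaky` are hypothetical names, the sketch retries on any non-zero exit status rather than only on IOException, and it sleeps whole seconds because fractional `sleep` is not portable.

```shell
# retry MAX DELAY CMD...: run CMD up to MAX times. After a failed attempt N
# (with N < MAX), sleep DELAY * N seconds before trying again.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0    # success, stop retrying
        fi
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "retrying (attempt $attempt/$max)" >&2
        sleep "$((delay * attempt))"
        attempt=$((attempt + 1))
    done
}

# Demo: a command that fails twice and then succeeds, tracked via a counter file.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky() {
    n=$(($(cat "$counter_file") + 1))
    echo "$n" > "$counter_file"
    [ "$n" -ge 3 ]
}

# A delay of 0 keeps the demo fast; a real caller would pass a positive delay.
if retry 3 0 flaky; then result=ok; else result=failed; fi
calls=$(cat "$counter_file")
echo "result=$result calls=$calls"
```

As in the C# version, the final attempt is not followed by a sleep, and a failure on the last attempt is reported rather than retried.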

test/Microsoft.DotNet.HotReload.Test.Utilities/AwaitableProcess.cs

Lines changed: 22 additions & 2 deletions
@@ -226,6 +226,17 @@ public async ValueTask DisposeAsync()
         {
         }

+        // Close stdin before killing. This unblocks PhysicalConsole.ListenToStandardInputAsync()
+        // which reads from stdin with CancellationToken.None and no timeout.
+        // Without this, the stdin reader can keep the process alive after Kill() on some platforms.
+        try
+        {
+            Process.StandardInput.Close();
+        }
+        catch
+        {
+        }
+
         try
         {
             Process.Kill(entireProcessTree: true);
@@ -234,8 +245,17 @@ public async ValueTask DisposeAsync()
         {
         }

-        // ensure process has exited
-        await _processExitAwaiter;
+        // Wait for process exit with a timeout to prevent hanging the test if Kill() fails.
+        // The WaitForProcessExitAsync loop checks HasExited every 1 second, so 30s is generous.
+        using var exitTimeout = new CancellationTokenSource(TimeSpan.FromSeconds(30));
+        try
+        {
+            await _processExitAwaiter.WaitAsync(exitTimeout.Token);
+        }
+        catch (OperationCanceledException)
+        {
+            Logger.Log($"Process {Id} did not exit within 30 seconds after Kill()");
+        }

         Process.Dispose();
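The disposal change replaces an unbounded wait with a bounded one. The same shape can be sketched in POSIX shell: poll once per second, as the comment says WaitForProcessExitAsync does, and give up after a deadline. `wait_for_exit` is a hypothetical helper, not part of the test utilities, and the sketch polls a PID with `kill -0` rather than awaiting a .NET task.

```shell
# wait_for_exit PID SECONDS: poll once per second until PID no longer exists.
# Returns 0 if the process is gone, 1 if it is still there after SECONDS polls.
wait_for_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$remaining" -le 0 ]; then
            echo "process $pid did not exit in time" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Demo child that would otherwise run for 30 seconds.
sleep 30 &
child=$!

# While the child is alive, a 1-second budget expires instead of hanging.
if wait_for_exit "$child" 1; then before_kill=exited; else before_kill=timed_out; fi

# Kill and reap the child; an unreaped child stays a zombie, and kill -0
# still reports a zombie as an existing process.
kill "$child"
wait "$child" 2>/dev/null || true

# Once the process is gone, the bounded wait returns immediately.
if wait_for_exit "$child" 5; then after_kill=exited; else after_kill=timed_out; fi
echo "before_kill=$before_kill after_kill=$after_kill"
```

In both versions, a timeout is logged and cleanup continues, so a process that ignores the kill costs a bounded delay rather than the whole work item.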

test/Microsoft.DotNet.HotReload.Test.Utilities/WatchableApp.cs

Lines changed: 9 additions & 5 deletions
@@ -204,12 +204,16 @@ public ProcessStartInfo GetProcessStartInfo(string workingDirectory, string test
         info.Environment.Add("Microsoft_CodeAnalysis_EditAndContinue_LogDir", testOutputPath);
         info.Environment.Add("DOTNET_CLI_CONTEXT_VERBOSE", "trace");

-        // suppress all timeouts:
-        info.Environment.Add("DCP_IDE_REQUEST_TIMEOUT_SECONDS", "100000");
-        info.Environment.Add("DCP_IDE_NOTIFICATION_TIMEOUT_SECONDS", "100000");
-        info.Environment.Add("DCP_IDE_NOTIFICATION_KEEPALIVE_SECONDS", "100000");
+        // Use generous but bounded timeouts for DCP operations in CI.
+        // Previous values of 100,000 seconds (~27 hours) effectively disabled timeouts,
+        // causing tests to hang for the full Helix work item duration (~2 hours) when
+        // a DCP operation deadlocked. 300 seconds (5 minutes) per operation is generous
+        // for slow CI machines while ensuring natural failure recovery.
+        info.Environment.Add("DCP_IDE_REQUEST_TIMEOUT_SECONDS", "300");
+        info.Environment.Add("DCP_IDE_NOTIFICATION_TIMEOUT_SECONDS", "300");
+        info.Environment.Add("DCP_IDE_NOTIFICATION_KEEPALIVE_SECONDS", "300");
         info.Environment.Add("ASPIRE_ALLOW_UNSECURED_TRANSPORT", "1");
-        info.Environment.Add("ASPIRE_WATCH_PIPE_CONNECTION_TIMEOUT_SECONDS", "100000");
+        info.Environment.Add("ASPIRE_WATCH_PIPE_CONNECTION_TIMEOUT_SECONDS", "300");

         // override defaults:
         foreach (var (name, value) in EnvironmentVariables)
Lines changed: 17 additions & 0 deletions
@@ -1,4 +1,21 @@
 <!-- Prevent test asset projects from picking up the repo's root Directory.Build.targets. -->
 <Project>

+  <!-- For packable Exe projects (DotNetCliToolReference tools targeting netcoreapp2.2),
+       include the auto-generated runtimeconfig.json in the NuGet package so the dotnet
+       host can find it adjacent to the DLL. This enables RollForward=LatestMajor to
+       work correctly, allowing tools to run on machines that only have .NET 6.0+
+       installed (common on Helix CI agents that lack .NET Core 2.2). Without this,
+       tools fail with FrameworkMissingFailure (exit code 0x80008096) because the host
+       cannot roll forward from 2.2.0 without a runtimeconfig.json specifying the
+       rollForward policy. -->
+  <Target Name="IncludeRuntimeConfigInPackage"
+          AfterTargets="Build"
+          Condition="'$(OutputType)' == 'Exe' AND '$(IsPackable)' == 'true' AND '$(GenerateRuntimeConfigurationFiles)' != 'false'">
+    <ItemGroup>
+      <BuildOutputInPackage Include="$(ProjectRuntimeConfigFilePath)"
+                            Condition="'$(ProjectRuntimeConfigFilePath)' != '' AND Exists('$(ProjectRuntimeConfigFilePath)')" />
+    </ItemGroup>
+  </Target>
+
 </Project>
