Pauseless Garbage Collector (Question) #115627
Replies: 97 comments 117 replies
-
Tagging subscribers to this area: @dotnet/gc
Issue Details
Is there any reason why something like the Pauseless Garbage Collector, which exists for Java from Azul, was never implemented for .NET? https://www.azul.com/products/components/pgc/
|
-
I'll add the generic response that the .NET GC supports functionality that Java's GCs do not, and this can complicate or invalidate optimizations that other GCs can take advantage of. For example, Java's GC doesn't support interior pointers, while the .NET GC does. |
-
I'd like to see the GC team write up some insight into the benefits and challenges/drawbacks of these options. What's theoretically possible but just complex or low priority to implement? Which features or goals fundamentally conflict? |
-
These types of GCs typically trade throughput for shorter pause times. For example, they often use GC read barriers that make accessing object reference fields significantly slower. If you would like to understand the problem space, read The Garbage Collection Handbook. It has a full chapter dedicated to real-time garbage collectors. There is nothing fundamental preventing building these types of garbage collectors for .NET. It is just a lot of work to build a production-quality garbage collector. We do not see significant demand for these types of garbage collectors in .NET. Building alternative garbage collectors with very different performance tradeoffs has not been at the top of the core .NET team's priority list. I would love to see the .NET community experimenting with alternative garbage collectors with very different performance tradeoffs. It is how Azul came to be: Azul's garbage collector that you have linked to is not built by the core Java team. |
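To make the read-barrier cost mentioned above concrete, here is a minimal conceptual sketch. It is not how any real runtime implements barriers (those are emitted by the JIT as machine code); `HeapObject`, `ForwardedTo`, and `ReadBarrierSketch` are hypothetical names invented purely for illustration. The point is only that a barriered load does extra checking on every reference read, where a plain load does none.

```csharp
using System;

// Hypothetical model of a heap object that a concurrent collector may relocate.
public sealed class HeapObject
{
    public int Payload;
    // Set by the (simulated) collector once the object has been moved; null otherwise.
    public HeapObject ForwardedTo;
}

public static class ReadBarrierSketch
{
    // Plain load: what reading a reference slot costs without a barrier.
    public static HeapObject LoadPlain(HeapObject[] slots, int i) => slots[i];

    // Barriered load: every read checks whether the target was relocated,
    // follows the forwarding chain, and repairs the slot.
    public static HeapObject LoadWithBarrier(HeapObject[] slots, int i)
    {
        HeapObject obj = slots[i];
        while (obj.ForwardedTo != null)
        {
            obj = obj.ForwardedTo;
        }
        slots[i] = obj; // store the current location back so later reads skip the chain
        return obj;
    }
}
```

Even in this toy form, the barriered path adds a branch and possibly a store to what would otherwise be a single memory read, which is the throughput cost the comment refers to.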
-
I guess that having more and more official, simultaneously supported GCs would also increase the amount of work needed to maintain and improve them all, putting even more burden on the GC team which would mean that the existing GCs would be improved slower. |
-
Yeah, but a pauseless collector seems to me to open a whole new area where .NET could be used. Even if it is slower, being deterministic, without pauses every now and then, could be a huge benefit for some applications. |
-
Right. If it were to follow the Azul model, it would not impact the core GC team much. I believe that the core Java GC team does not spend any cycles on the Azul GC. The Azul GC is maintained by Azul, which is a company with a closed source business model.
It comes down to numbers and opportunity costs. For example, how many new developers can pauseless GC bring to .NET? It is hard to make the numbers work. |
-
For what it's worth, beware that buying the book as an eBook from the official publisher only gives access to the book through the VitalSource service. There is no way to download the book except through the DRM-encumbered software, and they managed to block my account before I was even able to read a single page (no explanation given, the service just responds with a 401 error and logs me out). If you want to get the book, get it as a physical book or through Amazon Kindle and save yourself the trouble. |
-
I am rather surprised to hear that! I'd love to see an experimental GC, so I'm certainly quite biased. But I'd imagine predictability of latency is a very significant concern for a number of large user bases. Game development (Unity) comes to mind of course. As do many areas in finance & algorithmic trading. Sorry, that's it for my sales pitch; but in short, I'd definitely love to see experimentation in this area. |
-
Latency of a GC can be a concern in many of the same ways that latency of RAII can be a concern. Having a GC, including a GC that can "stop the world", is not itself strictly a blocker, and it may be of interest to note that many of the broader/well-known game engines do themselves use GCs (many, but not all, of which are incremental rather than pauseless). Most people's experience with .NET and a GC in environments like game dev, up until this point, has been with either the legacy Mono GC or the Unity GC, neither of which can really be compared with the performance, throughput, latency, or various other metrics of the precise GC that ships with RyuJIT. Having some form of incremental GC is likely still interesting, especially if it can be coordinated to run more so in places where the CPU isn't doing "important" work (such as when you're awaiting a dispatched GPU task to finish executing), but it's hardly a requirement with an advanced modern GC, especially if you're appropriately taking memory management into consideration by utilizing pools, spans/views, and other similar techniques (just as you'd have to use in C++ to limit RAII or free overhead). |
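As a small illustration of the "pools, spans/views" techniques mentioned above, the sketch below rents a scratch buffer from `ArrayPool<T>.Shared` and works on it through a `Span<T>` instead of allocating a new array on every call. The class and method names (`PooledBufferSketch`, `SumOfSquares`) are made up for this example; only `ArrayPool<T>` and `Span<T>` are real .NET APIs.

```csharp
using System;
using System.Buffers;

public static class PooledBufferSketch
{
    // Sums the squares 0^2 .. (count-1)^2 using a rented scratch buffer,
    // so the hot path does not allocate a new array per call.
    public static long SumOfSquares(int count)
    {
        int[] rented = ArrayPool<int>.Shared.Rent(count); // may be longer than requested
        try
        {
            Span<int> scratch = rented.AsSpan(0, count); // view over exactly `count` items
            for (int i = 0; i < scratch.Length; i++)
            {
                scratch[i] = i * i;
            }

            long sum = 0;
            foreach (int value in scratch)
            {
                sum += value;
            }
            return sum;
        }
        finally
        {
            // Always hand the buffer back, even if the work above throws.
            ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```

The rented array can be larger than requested, which is why the work is done on a span sliced to `count` rather than on the array directly.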
-
In essence, using a pool is no different from manually allocating memory. It does not reduce the mental burden of the manual management required to allocate and reclaim memory. Of course, it is also necessary to explore safe programming methods similar to Rust's. This is a popular article about GC. |
-
It would be nice to see how this changes in newer versions of .NET and also how Java compares (with both the default collector and the pauseless collector mentioned here). |
-
Maybe an option like the incremental GC being adopted by Unity is feasible here: it breaks up a "full GC" into a sequence of several "partial GCs" (i.e. doing GC incrementally), so that although the total pause time doesn't change, each individual pause can be minimized to a nearly pauseless one. cc: @Maoni0 |
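The idea of slicing one long collection into several short steps can be sketched with a toy marker that does a bounded amount of work per call. This is only a conceptual model, not Unity's or .NET's implementation: `GraphNode` and `IncrementalMarker` are hypothetical names, and a real incremental collector also needs write barriers to stay correct when the application mutates the object graph between steps, which this sketch leaves out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical object-graph node used only for this illustration.
public sealed class GraphNode
{
    public bool Marked;
    public List<GraphNode> Children = new List<GraphNode>();
}

public sealed class IncrementalMarker
{
    private readonly Stack<GraphNode> _pending = new Stack<GraphNode>();

    public IncrementalMarker(IEnumerable<GraphNode> roots)
    {
        foreach (GraphNode root in roots)
        {
            Push(root);
        }
    }

    // True once every reachable node has been visited.
    public bool IsDone => _pending.Count == 0;

    // Visits at most `budget` nodes, then returns control to the application.
    // Returns the number of nodes actually visited in this step.
    public int Step(int budget)
    {
        int visited = 0;
        while (visited < budget && _pending.Count > 0)
        {
            GraphNode node = _pending.Pop();
            foreach (GraphNode child in node.Children)
            {
                Push(child);
            }
            visited++;
        }
        return visited;
    }

    // Marks a node on first sight and queues it for a later visit.
    private void Push(GraphNode node)
    {
        if (node.Marked) return;
        node.Marked = true;
        _pending.Push(node);
    }
}
```

A caller would invoke `Step` with a small budget once per frame until `IsDone` is true, so the total marking work is unchanged but no single call runs for long.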
-
I'm a game/engine dev on osu!. We've used C# throughout all of .NET 3.5 to .NET 8, and have fully rewritten the game over the years which has brought new challenges in terms of balancing features that wouldn't have been possible prior and what works best with the .NET GC. By far our greatest fight has been with the GC - it is definitely a felt presence and at the forefront of everything we do. I've personally gone pretty deep in minimising pauses with issues such as #48937, #12717, and #76290, but as a team we've always been very conscious about allocations because our main loop is running at potentially 1000Hz, or historically even more than that. What we've found works best for us is turning on
Where it breaks down, however, is areas that require allocs such as menus. This GC mode will cause terrible stutters when doing anything remotely intensive, meaning that we have to very carefully switch GC modes at opportune moments to get the best of both worlds, and sometimes those worlds are intertwined.
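Switching GC modes around a latency-sensitive section can be sketched with the real `GCSettings.LatencyMode` property. Which mode the team actually enables is not stated in this comment, so `GCLatencyMode.SustainedLowLatency` in the usage note below is purely an illustrative assumption, and `LatencyModeScope` is a hypothetical helper name.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: applies a GC latency mode for the lifetime of the scope
// and restores the previous mode when disposed.
public sealed class LatencyModeScope : IDisposable
{
    private readonly GCLatencyMode _previous;

    public LatencyModeScope(GCLatencyMode mode)
    {
        _previous = GCSettings.LatencyMode;
        // The runtime may not honor every mode in every configuration,
        // so callers should not assume the requested mode is now active.
        GCSettings.LatencyMode = mode;
    }

    public void Dispose()
    {
        GCSettings.LatencyMode = _previous;
    }
}
```

Usage would look like `using (new LatencyModeScope(GCLatencyMode.SustainedLowLatency)) { RunGameplay(); }`, where `RunGameplay` stands in for the latency-sensitive work. Allocation-heavy paths such as menus would run outside the scope, under the default mode.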
|
-
@smoogipoo it seems to me that you should be working directly with MS folks on this. Your expertise on gamedev would help so many people out. Stuttering in Unity for example has been a blemish on C# for a very long time. It gives people the impression C# is just a bad language which is absolutely disastrous to the community as a whole as more people move away from these tools and end up using other languages. |
-
Benchmark against garnet on .NET 9:
Server GC (DATAS): Peak WorkingSet: 262.40234375 MB
Satori GC: Peak WorkingSet: 235.31640625 MB
Satori GC (No Gen0): Peak WorkingSet: 167.15234375 MB |
-
@VSadov I found a case where Satori GC doesn't perform well if the class has a finalizer. Test code:
using System;
using System.Diagnostics.Tracing;
using System.Threading;

namespace gctest
{
    class Test
    {
        static void Main()
        {
            GCEventListener.Start();
            gcTest();
        }

        // Allocates ten million short-lived objects, each with a finalizer.
        public static void gcTest()
        {
            for (int i = 0; i < 10000000; i++)
            {
                var thing = new FinalizingThing();
            }
        }
    }

    public class FinalizingThing
    {
        public FinalizingThing()
        {
        }

        ~FinalizingThing()
        {
            // Thread.Sleep(1); // Uncomment to get lower GC pauses
        }
    }

    // Prints the duration of every GC by listening to the runtime's GC events.
    public sealed class GCEventListener : EventListener
    {
        private static GCEventListener instance = null;
        private long timeGCStart = 0;
        private bool verbose = false;

        private GCEventListener()
        {
            Console.WriteLine("GCEventListener Created");
        }

        public static void Start(bool verbose = false)
        {
            if (instance == null)
                instance = new GCEventListener();
        }

        // Called whenever an EventSource is created.
        protected override void OnEventSourceCreated(EventSource eventSource)
        {
            // Watch for the .NET runtime EventSource and enable its GC events (keyword 0x1).
            if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime"))
            {
                EnableEvents(eventSource, EventLevel.Informational, (EventKeywords)0x1);
            }
        }

        // Called whenever an event is written.
        protected override void OnEventWritten(EventWrittenEventArgs eventData)
        {
            if (eventData.EventName.Contains("GCStart"))
            {
                timeGCStart = eventData.TimeStamp.Ticks;
            }
            else if (eventData.EventName.Contains("GCEnd"))
            {
                long timeGCEnd = eventData.TimeStamp.Ticks;
                long gcIndex = long.Parse(eventData.Payload[0].ToString());
                // One tick is 100 ns, so ticks / 10 / 1000 gives milliseconds.
                Console.WriteLine("GC#{0} took {1:f3}ms for generation {2}", gcIndex, (double)(timeGCEnd - timeGCStart) / 10.0 / 1000.0, eventData.Payload[1]);
            }
        }
    }
}
Result (with
And surprisingly, if I uncomment the line
btw I can get really promising result if I suppress the finalizer manually with
|
-
It's probably been asked already but is there any higher level design doc for Satori explaining how it differs/what are the main ideas behind it, etc? |
-
I reran the BinaryTree benchmark with .NET 9.0.7 against the latest Satori GC (VSadov/Satori@f14b740), with all default settings.
Turning it into ratio for comparison:
This time Satori GC performs really well and shows impressive numbers! It managed to balance throughput performance, latency and memory footprint. And you can get even better throughput performance numbers in Satori GC with
which gives you result:
|
-
"We do not see significant demand for these types of garbage collectors in .NET" - you know why? This is a snippet of a brief discussion from the DirectX channel...
Not even suitable for audio guys. The Commodore64 was capable of respectable audio. In 2025 you have what is nearly a system language that can't be relied upon for audio. People don't even take C# seriously, that's why there isn't any demand, because it's a non-starter. The people that need reliable execution aren't even showing up to comment or give feedback. C# is just dismissed as being useless. |
-
@VSadov Hello, excuse me. I noticed in the chart posted by some of the people above, the item "Max Pause Time (ms)" has a rather unfavorable value in my field of expertise. Is it possible to reduce this indicator (for example, to a maximum of 2ms)? |
-
These are the GC stats I collected for osu!, a game that has been heavily optimized for years. With the default .NET GC (WKS GC) you can still observe stutters while navigating pages and scrolling down the list. Gameplay has been highly optimized to avoid unnecessary allocations, make use of object pools, and use the LowLatency GCSettings. However, even after such optimizations, with the default .NET GC you can still observe noticeable pauses that cause stutters:
Things become even worse when using Server GC:
Now, simply by switching to Satori GC, all the stutters are gone: the max pause time is less than 3ms, and the number of GCs is also dramatically reduced from 752 to 81, without regressing the FPS or the memory footprint (it's even a great improvement!):
When people have put countless hours into optimization and are still facing STW issues, continuing to tell them "the GC is not the problem and they should spend more hours on optimization" is a non-starter and a kind of irresponsibility. It ignores the core needs of the affected developers and overlooks the effort those developers have put in for years. That's why people doing work that requires soft real-time don't seriously consider .NET as an option: the benefit/cost ratio is just too low when you have to spend years optimizing the code to minimize STW pauses, while in other languages you don't need to care much about it. Now we have Satori GC as an option. While still officially unsupported, it's already a game changer that finally makes .NET suitable for such latency-sensitive scenarios. Also, people don't need a truly pauseless GC; what they need is a GC with a consistently and reasonably low maximum pause time (lower than one or two milliseconds) for latency-sensitive apps. |
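The "maximum pause below one or two milliseconds" criterion can be checked from inside an app with the real `GC.GetGCMemoryInfo()` API, whose `PauseDurations` property reports the pauses of the most recent collection. The sketch below is an assumption-laden illustration, not the method used to produce the stats in this thread: `PauseBudgetSketch` and its members are hypothetical names, and whether an alternative GC such as Satori populates these values the same way is not confirmed here.

```csharp
using System;

public static class PauseBudgetSketch
{
    // Returns the longest pause the runtime recorded for the most recent GC.
    public static TimeSpan LastGcMaxPause()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        TimeSpan max = TimeSpan.Zero;
        foreach (TimeSpan pause in info.PauseDurations)
        {
            if (pause > max)
            {
                max = pause;
            }
        }
        return max;
    }

    // True when an observed pause fits inside the latency budget.
    public static bool WithinBudget(TimeSpan pause, TimeSpan budget)
    {
        return pause <= budget;
    }
}
```

An app could call `LastGcMaxPause()` periodically and log whenever `WithinBudget(pause, TimeSpan.FromMilliseconds(2))` is false, which gives a rough in-process view of how often the budget is exceeded.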
-
@VSadov are there any further plans for this? Perhaps even to include it officially in .NET? |
-
Per suggestion in the comment It appears to work with both NativeAOT and with JIT-based apps. It is a bit more indirect than patching the binaries, but probably the right way to do it. As an example, the following works for me for an app targeting net8.0:
Then the .csproj for the app looks like the following:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <PropertyGroup>
    <PublishAot>true</PublishAot>
  </PropertyGroup>
  <PropertyGroup>
    <RuntimeFrameworkVersion>8.0.16</RuntimeFrameworkVersion>
  </PropertyGroup>
  <ItemGroup>
    <FrameworkReference Update="Microsoft.NETCore.App" RuntimeFrameworkVersion="8.0.16" />
  </ItemGroup>
</Project>
And
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <config>
    <!-- Set the "value" here to the folder you will be using for your local Nuget cache. -->
    <!-- Do not forget to delete the cache if you want to try again after rebuilding the runtime. -->
    <add key="globalPackagesFolder" value="E:/Satori8NugetCache" />
  </config>
  <packageSources>
    <!-- Prevent inheriting the global NuGet package sources -->
    <clear />
    <!-- Set this path to where your Shipping Artifacts are located in your build. -->
    <add key="local" value="E:/Satori8/Satori/artifacts/packages/Release/Shipping" />
    <!-- Any packages that might be required, but not present in your build, will have to be taken from the latest NuGet feed. -->
    <!-- More info on: https://github.com/dotnet/sdk#installing-the-sdk -->
    <add key="dotnet8" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet8/nuget/v3/index.json" />
    <!-- Also add the nuget.org source, to use if packages are not found in sources above -->
    <add key="NuGet" value="https://api.nuget.org/v3/index.json" />
  </packageSources>
</configuration>
A FEW NOTES:
|
-
Has anyone tried Satori with F#? I think it would be pretty great there too, since immutability produces a lot of one-off, short-lived objects. I may try it on a couple of apps we have in prod; they mostly suffer from keeping a lot of memory allocated when idling and waiting for messages from a queue. I will have to figure out how to do it first, but I think it's going to be a fun experiment. Is Satori aiming to release everything as soon as possible? I've seen a couple of reports above with a working set after benchmarks of 50 MB instead of 1.3 GB with Server GC, and that's what we are seeing as well |
-
I've noticed the .NET team using GCPerfSim, and found one example contextualised as a performance improvement in #111636. Attempting to reproduce the results here and, bearing in mind I'm blindly running code - it seems it may be a scenario which Satori doesn't like quite so much? Thread count = 2
Thread count = 8
The results vary wildly from run to run, so don't stare at the actual numbers too much. However, I've seen Satori range from 30s to 100s on the 8T test, though I haven't seen any such variance for WKS/SRV. Something that might be conducive is that I'm seeing Satori sit at about 60% system time whereas all other GCs are at ~10%. Here's a sampling: WKS:
SRV:
Satori:
|
-
I just put Satori into a WinUI 3 app (Files). Satori didn't perform well, and we observed pauses longer than 100ms. In this case Satori GC is only slightly better than WKS GC.
GC stats for Satori GC (Interactive):
GC stats for WKS GC:
Things seem better under Satori GC LowLatency:
But there are still massive pauses that are longer than 100ms! |
-
We deployed Satori GC to our production web service, and this is how Satori GC compares to DATAS. GC pauses were estimated by
Before (DATAS GC): After (Satori GC): |
-
For easier integration, I authored an MSBuild target to automatically use Satori GC for self-contained builds (modify the
<Project>
  <Target Name="FetchSatori" AfterTargets="ResolveRuntimePackAssets">
    <DownloadFile
      SourceUrl="<the URL of your Satori build artifacts>"
      DestinationFolder="$(IntermediateOutputPath)$([System.Guid]::NewGuid().ToString('N'))"
      Retries="3">
      <Output TaskParameter="DownloadedFile" ItemName="SatoriArchive" />
    </DownloadFile>
  </Target>
  <Target Name="ExtractSatori" AfterTargets="FetchSatori" DependsOnTargets="FetchSatori">
    <Unzip
      SourceFiles="@(SatoriArchive)"
      DestinationFolder="$(IntermediateOutputPath)Satori"
      OverwriteReadOnlyFiles="true"
    />
    <Delete Files="@(SatoriArchive)" />
  </Target>
  <Target Name="IncludeSatoriInRuntimePackAssets" AfterTargets="ExtractSatori" DependsOnTargets="ExtractSatori">
    <ItemGroup>
      <RuntimePackAsset Remove="@(RuntimePackAsset)"
                        Condition=" '%(RuntimePackAsset.Filename)' == 'coreclr'
                                    Or '%(RuntimePackAsset.Filename)' == 'clrjit'
                                    Or '%(RuntimePackAsset.Filename)' == 'System.Private.CoreLib' " />
      <SatoriRuntimePackAsset Include="$(IntermediateOutputPath)Satori\System.Private.CoreLib.dll">
        <AssetType>runtime</AssetType>
      </SatoriRuntimePackAsset>
      <SatoriRuntimePackAsset Include="$(IntermediateOutputPath)Satori\clrjit.dll;$(IntermediateOutputPath)Satori\coreclr.dll">
        <AssetType>native</AssetType>
        <DropFromSingleFile>true</DropFromSingleFile>
      </SatoriRuntimePackAsset>
      <RuntimePackAsset Include="@(SatoriRuntimePackAsset)">
        <DestinationSubPath>%(Filename)%(Extension)</DestinationSubPath>
        <RuntimeIdentifier>$(RuntimeIdentifier)</RuntimeIdentifier>
        <CopyLocal>true</CopyLocal>
        <NuGetPackageId>Microsoft.NETCore.App.Runtime.win-x64</NuGetPackageId>
        <FileVersion>42.42.42.42424</FileVersion>
        <NuGetPackageVersion>42.42.42.42424</NuGetPackageVersion>
      </RuntimePackAsset>
    </ItemGroup>
  </Target>
</Project>
|
-
Is there any reason why something like the Pauseless Garbage Collector, which exists for Java from Azul, was never implemented for .NET?
https://www.azul.com/products/components/pgc/
https://www.artima.com/articles/azuls-pauseless-garbage-collector