Fine-tuning GC settings for a service allocating 2GB/s #94411

ogxd · 2023-11-06T14:19:30Z

ogxd
Nov 6, 2023

Dear .NET Community and Maintainers,

I am reaching out to seek your expertise and insights regarding a performance challenge we are facing with our ASP.NET Core 6.0 application, which is currently under significant load. Our backend service is processing billions of requests per hour, and despite our server farm's capacity, we are striving to optimize our backend code for maximum performance and minimal latency.

We have observed that a single instance of our application generates up to 2GB of allocations per second. Despite our efforts to minimize allocations, a substantial portion remains unavoidable. This leads to our primary concern: the Garbage Collector (GC) heuristics appear to struggle under such intense load, particularly with the volume of items to manage and the resulting heap fragmentation.

Our monitoring indicates that the GC does not trigger for longer than one second, suggesting a possible timeout mechanism that we have not found documented. When memory consumption reaches its maximum, we experience a "threshold effect," where the GC can no longer keep up, causing severe performance degradation.

As a temporary measure, we have resorted to forcing full-compacting garbage collections at regular intervals (hence the spikes on the memory on the graph). While this approach mitigates the issue, it is not a viable long-term solution and is not enough in some situations (for example on the screenshot above)

Given this context, we have several questions:

Is there an intrinsic one-second limitation for GC activity, and if so, what is the rationale behind it? Is there a way to configure the GC to extend this duration? We would prefer the GC to finish its job, even at the cost of occasional latency spikes, rather than falling behind and risking a complete application overload.
Are there additional settings or modes available that would allow for more aggressive garbage collection? We are currently operating in server GC mode.
In terms of reducing the time spent in GC, we have approximately 4GB of Gen2 data that is persistent and immutable (e.g., long-term caches). Is there a mechanism to instruct the GC to bypass these segments during collection? We have come across the FrozenObjects project but found it lacks documentation.

We are open to any suggestions or guidance you can provide. As you can see, outside of the overload situation, the CPU usage is only at about 40%, and we also have quite a margin on the network stack. It seems the GC is the bottleneck currently, so our goal is to find a way to fine-tune the garbage collection process to ensure stability and performance so that we can further increase service throughput at low latency.

Thank you for your time and assistance!

Answered by Maoni0

Nov 14, 2023

first of all, I don't see any kind of "1s threshold effect". each GC's start and end are recorded in the trace (if you open it in PerfView's GCStats view you'll see that the PauseStart for each GC (along with Pause MSec column that tells you how long this GC pauses the managed threads for). there's no GCs with more than 1s inbetween them. most of them are 200ms or less.

the big problem I see is kind of the opposite of what you described - is that GCs happened too often which causes a very high % time in GC. the reason for this is you are operating near 85% memory load and at that point GC starts to tighten up the gen0 allocation budget. it looks like you have 16GB memory on the machine? t…

View full answer

ogxd · 2023-11-06T16:18:13Z

ogxd
Nov 6, 2023
Author

I've read most of @Maoni0 memdoc but the issue here is not that there are "Too many pauses, ie, too many GCs" or "Long individual pauses" but instead that the GC is not aggressive enough and can no longer catch up when the heap size becomes too large and too fragmented (btw swap is disabled on the Linux server used for the screenshots. I am also wondering why it didn't go OOM at some point)

3 replies

Suchiman Nov 6, 2023

Collecting is based on budgets, not on timing. The allocator has an allocation budget (which determines how many bytes you can allocate before triggering a GC). That budget is adjusted based on the survival rate (if many objects survive, permit more allocations and collect less frequently to waste less time being unproductive)
There's this option https://learn.microsoft.com/en-us/dotnet/core/runtime-config/garbage-collector#conserve-memory . In addition there's https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/latency which controls whether to use background garbage collection and whether blocking / foreground gen 2 GC's are allowed. By default this should be Interactive but if someone put it into SustainedLowLatency, that could explain high heap sizes.
I'm not aware of any other mechanism than FrozenObjects, which are guaranteed not to get scanned. The typical generational guidance is to make sure that temporary objects don't reach gen 2 where the immutable forever objects should live.

I am also wondering why it didn't go OOM at some point

OOM only happens when doing a full blocking compacting GC was unable to free sufficient space for the desired allocation and the OS was unable to provide more.

ogxd Nov 6, 2023
Author

Collecting is based on budgets, not on timing. The allocator has an allocation budget (which determines how many bytes you can allocate before triggering a GC). That budget is adjusted based on the survival rate (if many objects survive, permit more allocations and collect less frequently to waste less time being unproductive)

This explains when the GC is triggered, but not why it never lasts more than exactly one second. It seems like there is some timeout mechanism to stop a GC after 1 second, leaving a part of the heap uncollected / uncompacted.
I use a custom tool to get that measurement, which uses ETW data.

There's this option https://learn.microsoft.com/en-us/dotnet/core/runtime-config/garbage-collector#conserve-memory . In addition there's https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/latency which controls whether to use background garbage collection and whether blocking / foreground gen 2 GC's are allowed. By default this should be Interactive but if someone put it into SustainedLowLatency, that could explain high heap sizes.

We'll try this one! Thanks a lot.

I'm not aware of any other mechanism than FrozenObjects, which are guaranteed not to get scanned. The typical generational guidance is to make sure that temporary objects don't reach gen 2 where the immutable forever objects should live.

But if an ASP NET Core request is taking a little more time than usual, its contextual objects (scoped and transient services and all request associated data) are inevitably going to survive gen0 or even gen1 ➡️ which implies more GC work ➡️ more latency ➡️ more slow requests ➡️ ♾️ 💥

Suchiman Nov 6, 2023

This explains when the GC is triggered, but not why it never lasts more than exactly one second. It seems like there is some timeout mechanism to stop a GC after 1 second, leaving a part of the heap uncollected / uncompacted.

Oh, sorry, I've misread the question. That is odd, AFAIK the GC is not capable of "aborting" a GC nor planning work ahead given time constraints. GC's start for a given generation and may expand to further generations if a lot of objects are promoted. But once it has decided to collect a generation, it's all or nothing.

Maoni0 · 2023-11-06T21:31:05Z

Maoni0
Nov 6, 2023
Collaborator

some timeout mechanism to stop a GC after 1 second

there's no such thing.

leaving a part of the heap uncollected / uncompacted

if a GC is not a full GC, by design it's not going to collect the full heap. and a GC is not necessarily compacting but it does compact, it's not going to only do part the job it set out to do. in other words, if it's a gen1 GC it will collect the whole gen0/gen1.

the best way to illustrate what you are seeing is to share a top level GC trace https://github.com/Maoni0/mem-doc/blob/master/doc/.NETMemoryPerformanceAnalysis.md#how-to-collect-top-level-gc-metrics

3 replies

Bluezen Nov 9, 2023

the best way to illustrate what you are seeing is to share a top level GC trace https://github.com/Maoni0/mem-doc/blob/master/doc/.NETMemoryPerformanceAnalysis.md#how-to-collect-top-level-gc-metrics

Please find a link to download a 5 minutes trace captured during a similar scenario (@ogxd and I work together): 20231107_200807_833_1.nettrace.7z.zip
(I compressed the nettrace file using 7zip to keep it under GitHub's 25MB limit, and then I zipped it to comply with GitHub's file type rules 😅).

The trace was taken with dotnet-monitor and I attempted to reproduce the --profile gc-collect argument of the dotnet trace command line with the following Providers:

"Providers": [{
        "Name": "Microsoft-Windows-DotNETRuntime",
        "EventLevel": "Informational",
        "Keywords": "0x1"
    },{
        "Name": "Microsoft-Windows-DotNETRuntimePrivate",
        "EventLevel": "Informational",
        "Keywords": "0x1"
    }],

Maoni0 Nov 14, 2023
Collaborator

first of all, I don't see any kind of "1s threshold effect". each GC's start and end are recorded in the trace (if you open it in PerfView's GCStats view you'll see that the PauseStart for each GC (along with Pause MSec column that tells you how long this GC pauses the managed threads for). there's no GCs with more than 1s inbetween them. most of them are 200ms or less.

the big problem I see is kind of the opposite of what you described - is that GCs happened too often which causes a very high % time in GC. the reason for this is you are operating near 85% memory load and at that point GC starts to tighten up the gen0 allocation budget. it looks like you have 16GB memory on the machine? that seems like a very small amount of memory for 48 heaps.

I see GC#104232 is an induced full compacting GC - did you happen to induce this yourself? after that since the memory load dropped (to 75%) it was able to have a larger gen0 allocation budget and the % time in GC dropped to 10% to 25% (still quite high). but since the heap size kept increasing due to high survival it quickly brough the memory load back to > 80%. so it'll be in the very high % time in GC very quickly.

Answer selected by ogxd

Bluezen Nov 15, 2023

Thank you for your analysis of our GC trace!
There are indeed no GCs with more than 1s in between them, except maybe GC #104233, but that one occurs after GC #104232, which, as you rightly guessed, is a manually induced full compacting GC that suspends the managed threads for more than 1 second. These induced GCs are a way for us to keep the heap under control, one of the reasons being to try to stay under the memory load threshold you mentioned. On memory-limited machines we schedule them every 5 minutes. Luckily, we finished upgrading the remaining 16GB machines to 32GB, and we no longer experience these problematic scenarios triggered by memory pressure. We are still using a scheduled full compacting GC, but we experimented with the DOTNET_GCConserveMemory parameter, and we are pretty confident it will replace our scheduled GC.

Xyncgas · 2023-11-15T04:31:19Z

Xyncgas
Nov 15, 2023

Simply install 1TB SSD set on your dev machine set all of it as virtual memory, and disable GC during code execution until they are done solves the problem

0 replies

Fine-tuning GC settings for a service allocating 2GB/s #94411

Uh oh!

Uh oh!

ogxd Nov 6, 2023

Replies: 3 comments · 6 replies

Uh oh!

ogxd Nov 6, 2023 Author

Uh oh!

Suchiman Nov 6, 2023

Uh oh!

Uh oh!

ogxd Nov 6, 2023 Author

Uh oh!

Suchiman Nov 6, 2023

Uh oh!

Maoni0 Nov 6, 2023 Collaborator

Uh oh!

Uh oh!

Bluezen Nov 9, 2023

Uh oh!

Maoni0 Nov 14, 2023 Collaborator

Uh oh!

Bluezen Nov 15, 2023

Uh oh!

Xyncgas Nov 15, 2023

ogxd
Nov 6, 2023

Replies: 3 comments 6 replies

ogxd
Nov 6, 2023
Author

ogxd Nov 6, 2023
Author

Maoni0
Nov 6, 2023
Collaborator

Maoni0 Nov 14, 2023
Collaborator

Xyncgas
Nov 15, 2023