C# performance: pool and reuse vertex objects to reduce allocations when using Clear() #1010
This is a suggested performance improvement for C# when reusing Clipper objects.
I profiled Clipper2 performance in my application, which reuses ThreadLocal Clipper2 instances by repeatedly calling Clear() to reduce allocations and thus garbage-collection pressure. Profiling showed that even when reusing Clipper2 instances, a large number of Vertex objects are still allocated.
The proposed improvement significantly reduces these allocations by reusing previously allocated Vertex objects instead of discarding them and allocating new ones for every polygon boolean operation.
Here are the benchmarks. For benchmarking I added a switch to turn the behaviour on and off; that switch is not part of this PR:
BenchmarkDotNet v0.15.1, Windows 11 (10.0.26100.6584/24H2/2024Update/HudsonValley)
AMD Ryzen 9 7950X3D 4.20GHz, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.304
[Host] : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job-IJRPZB : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
InvocationCount=1 UnrollFactor=1
As seen, for my workloads (about 50% of CPU time spent in the Clipper2 library) this reduces allocation volume by 18% to 25%. In multi-threaded benchmarks it also significantly reduces the number of Gen0 and Gen1 GCs and the lock contention they induce, resulting in 6% to 16% faster execution.
To minimize the maintainability impact of this change, I encapsulated the behaviour in a new internal class, "VertexPoolList", whose implementation largely mirrors the standard C# List&lt;T&gt; but supports only a reduced set of operations.
A List that pools objects added to it for reuse.
Indexing, growing and enumeration implementation is identical to System.Collections.Generic.List{T}.
The pooled list reuses allocated reference objects by not clearing the internal list array when Clear() is called, only resetting the count to 0. Operations are limited to read, add and clear.
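To make the idea concrete, here is a minimal sketch of such a pooled list. This is my own illustration, not the exact code from the PR: the method name AddOrReuse is hypothetical, and the sketch assumes the Vertex fields (pt, flags, prev, next) can be reassigned when an object is recycled:

```csharp
// Sketch of a pooled vertex list: Clear() resets the count but keeps the
// backing array, so the Vertex objects stored there can be reinitialized
// and reused on the next round of Add calls.
internal class VertexPoolList
{
    private Vertex?[] _items = Array.Empty<Vertex?>();
    private int _count;

    public int Count => _count;
    public Vertex this[int index] => _items[index]!;

    // Reuses a pooled Vertex if one is available at the current slot,
    // otherwise allocates a new one (identical to the old behaviour).
    public Vertex AddOrReuse(Point64 pt, VertexFlags flags)
    {
        if (_count == _items.Length) Grow();
        Vertex? v = _items[_count];
        if (v == null)
        {
            v = new Vertex(pt, flags, null);
            _items[_count] = v;
        }
        else
        {
            // Reinitialize the recycled object in place
            // (assumes these fields are mutable).
            v.pt = pt; v.flags = flags; v.prev = null; v.next = null;
        }
        _count++;
        return v;
    }

    // Reset the count without clearing the array, keeping objects pooled.
    public void Clear() => _count = 0;

    private void Grow()
    {
        // Same doubling strategy as List<T>.
        int newSize = _items.Length == 0 ? 4 : _items.Length * 2;
        Array.Resize(ref _items, newSize);
    }
}
```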
In Clipper.Engine.cs we then only need to change the type of the vertex collection from List to VertexPoolList and change the 4 lines of code where new vertices are allocated so that they instead pass the properties to the VertexPoolList, which either reuses a Vertex or allocates a new one. When the VertexPoolList is empty (i.e. the clipper instance is not being reused), the behaviour is almost identical to before, except for the inlined null check before allocating; since that check always takes the same branch in this case, it should be an easy case for the CPU branch predictor.
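For illustration, the change at one of those allocation sites might look roughly like this (AddOrReuse is a hypothetical method name, and the surrounding variable names are assumed; the actual code in Clipper.Engine.cs will differ in detail):

```csharp
// before: every call allocates a fresh Vertex
// Vertex v = new Vertex(pt, VertexFlags.None, prevV);
// vertexList.Add(v);

// after: the pool recycles a previously allocated Vertex when available,
// otherwise it allocates a new one; behaviour is otherwise unchanged
Vertex v = vertexList.AddOrReuse(pt, VertexFlags.None);
v.prev = prevV;
```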
I have been testing this code in my own application for a few months (CI pipeline with ~200 unit tests) and have detected no errors with it so far. Any feedback is welcome; let me know what you think.