Pool the dictionary buffer when training a Zstandard dictionary by alinpahontu2912 · Pull Request #129125 · dotnet/runtime

alinpahontu2912 · 2026-06-08T12:30:01Z

ZstandardDictionary.Train allocates new byte[maxDictionarySize] on every call to hold the trained dictionary returned by the native ZDICT_trainFromBuffer. Upstream zstd guidance puts a reasonable dictionary size at ~100 KB and the CLI defaults to 112,640 bytes, both above the .NET 85,000-byte LOH threshold. So at the recommended size and above, every training call costs a fresh Large Object Heap allocation that is not reclaimed until the next gen2 collection. This change rents the buffer from ArrayPool<byte>.Shared and returns it (with clearArray: true) once the trained dictionary has been copied into its own array.

Why dictionary sizes are LOH-heavy

Upstream zstd (zdict.h):

"A reasonable dictionary size, the dictBufferCapacity, is about 100KB. The zstd CLI defaults to a 110KB dictionary."

CLI manpage (zstd.1.md): --maxdict=# default is 112640 bytes.

The managed wrapper repeats this guidance (ZstandardDictionary.cs:85-87) and only guards the lower bound (ThrowIfLessThan(maxDictionarySize, 256, ...) at line 128) — there is no upper-bound check, so MB-sized buffers are also legal.

.NET LOH threshold is 85,000 bytes, so the recommended ~100 KB dictionary and the 112,640-byte CLI default both land on the LOH on every call.
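One way to observe the threshold from managed code is `GC.GetGeneration`, which reports the oldest generation for an object that was allocated directly on the LOH. The following is a minimal sketch, assuming a 64-bit runtime with the default 85,000-byte threshold (the threshold can be raised through the `GCLOHThreshold` setting, in which case the results may differ). The helper name `IsOnLargeObjectHeap` is made up for this illustration.

```csharp
using System;

// Sizes taken from the discussion above: a small buffer, the ~100 KB recommendation
// (modelled here as 102,400 bytes), and the 112,640-byte CLI default.
int[] sizes = { 4_096, 102_400, 112_640 };

foreach (int size in sizes)
{
    byte[] buffer = new byte[size];
    Console.WriteLine($"{size,9:N0} bytes -> on LOH: {IsOnLargeObjectHeap(buffer)}");
}

// Hypothetical helper. LOH objects are logically part of the oldest generation,
// so a freshly allocated array that already reports GC.MaxGeneration was placed
// on the LOH rather than in gen0.
static bool IsOnLargeObjectHeap(byte[] array) => GC.GetGeneration(array) == GC.MaxGeneration;
```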

Empirical evidence

I modelled the exact allocation pattern (new byte[N] vs Rent/Return) and measured per-call allocations with GC.GetAllocatedBytesForCurrentThread (Release build, 1,000 iterations, workstation GC, .NET 11 preview):

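| Buffer size | `new byte[]` per call | `Rent` per call (steady state) | Reduction | Hits LOH? |
|---|---|---|---|---|
| 4 KB (4,096 B) | 4,120 B | 0 B | 100 % | no |
| 50 KB (50,000 B) | 50,024 B | 0 B | 100 % | no |
| 100 KB (102,400 B, zstd recommended) | 102,424 B | 0 B | 100 % | YES |
| 110 KB (112,640 B, zstd CLI default) | 112,664 B | 0 B | 100 % | YES |
| 1 MB (1,048,576 B) | 1,048,600 B | 0 B | 100 % | YES |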

For the recommended/default dictionary sizes, every Train call previously produced a ~100 KB LOH allocation that only a gen2 collection could reclaim. After this change, steady-state Train allocations for the dictionary buffer drop to zero.

Reproducer

Program.cs (run from a scratch net11.0 console project, dotnet run -c Release):

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
using System;
using System.Buffers;
using System.Runtime.CompilerServices;

int[] sizes = { 4_096, 50_000, 102_400, 112_640, 1_048_576 };
const int Iterations = 1_000;

foreach (int size in sizes)
{
    // Warm-up: JIT both paths and prime the pool before measuring.
    BeforePath(size);
    AfterPath(size);
    GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect();

    long b0 = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < Iterations; i++) BeforePath(size);
    long b1 = GC.GetAllocatedBytesForCurrentThread();

    GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect();

    long a0 = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < Iterations; i++) AfterPath(size);
    long a1 = GC.GetAllocatedBytesForCurrentThread();

    Console.WriteLine($"size={size,9:N0}  new/iter={(b1 - b0) / Iterations,9:N0} B  rent/iter={(a1 - a0) / Iterations,9:N0} B");
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void BeforePath(int n)
{
    byte[] b = new byte[n];
    b[0] = 1;
    GC.KeepAlive(b);
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void AfterPath(int n)
{
    byte[] b = ArrayPool<byte>.Shared.Rent(n);
    try
    {
        b[0] = 1;
        GC.KeepAlive(b);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(b, clearArray: true);
    }
}

Output (workstation GC, single thread):

size=    4,096  new/iter=    4,120 B  rent/iter=        0 B
size=   50,000  new/iter=   50,024 B  rent/iter=        0 B
size=  102,400  new/iter=  102,424 B  rent/iter=        0 B
size=  112,640  new/iter=  112,664 B  rent/iter=        0 B
size=1,048,576  new/iter=1,048,600 B  rent/iter=        0 B

The 24-byte overhead per allocation is the per-object overhead of a byte[] on 64-bit: an 8-byte object header (sync block), an 8-byte method table pointer, and a 4-byte length padded to 8 bytes. Rent reports 0 bytes/iter because, after warm-up, the pool hands the same array back on each iteration. On the pre-PR path the first two rows stay on the SOH and the last three land on the LOH.
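The steady-state figure rests on two behaviours of `ArrayPool<byte>.Shared` that are easy to check: `Rent(n)` may hand back an array longer than `n`, and a buffer returned on a thread is normally the one that thread gets back on its next `Rent` of the same size. The reuse is an observed property of the current shared pool, not a documented guarantee, so this sketch prints it rather than depending on it.

```csharp
using System;
using System.Buffers;

const int Requested = 112_640; // the zstd CLI default discussed above

byte[] first = ArrayPool<byte>.Shared.Rent(Requested);
int firstLength = first.Length; // at least Requested, possibly larger
ArrayPool<byte>.Shared.Return(first);

byte[] second = ArrayPool<byte>.Shared.Rent(Requested);
bool reused = ReferenceEquals(first, second); // expected on the same thread, not guaranteed
ArrayPool<byte>.Shared.Return(second);

Console.WriteLine($"requested={Requested:N0} rentedLength={firstLength:N0} sameArrayReused={reused}");
```

Because the rented array can be longer than requested, code that passes it to a native API should pass the requested capacity rather than the array's `Length`.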

ZstandardDictionary.Train allocated 'new byte[maxDictionarySize]' on every call. Dictionary sizes are typically tens to hundreds of KB (zstd recommends up to ~100 KB, but the API allows more), so each training call paid for a fresh GC allocation that often landed on the LOH.

Rent the buffer from ArrayPool<byte>.Shared instead. Create copies the trained slice into an exact-sized array before returning, so the rented buffer can be returned immediately. Use clearArray: true on Return because the trained dictionary is derived from caller-supplied samples and must not linger in the shared pool.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
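The rent, copy, and return sequence described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: `TrainPooled` and `FakeTrain` are hypothetical names, and `FakeTrain` stands in for the native ZDICT_trainFromBuffer call by filling part of the buffer and reporting how many bytes it wrote.

```csharp
using System;
using System.Buffers;

byte[] dictionary = TrainPooled(maxDictionarySize: 112_640);
Console.WriteLine($"dictionary length = {dictionary.Length:N0}");

// Hypothetical stand-in for the managed Train method.
static byte[] TrainPooled(int maxDictionarySize)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(maxDictionarySize);
    try
    {
        // Rent may return a longer array, so expose only the requested capacity.
        Span<byte> destination = rented.AsSpan(0, maxDictionarySize);
        int written = FakeTrain(destination);

        // Copy the used prefix into an exact-sized array owned by the caller.
        return destination.Slice(0, written).ToArray();
    }
    finally
    {
        // Safe to return here: the result no longer aliases the rented array.
        // clearArray: true mirrors the PR description at this point.
        ArrayPool<byte>.Shared.Return(rented, clearArray: true);
    }
}

// Hypothetical placeholder for the native trainer: fills half the capacity
// and reports how many bytes it produced.
static int FakeTrain(Span<byte> destination)
{
    int written = destination.Length / 2;
    destination.Slice(0, written).Fill(0x5A);
    return written;
}
```

The copy into an exact-sized array is what makes the immediate return safe: once `ToArray` has run, nothing the caller holds points into the pooled buffer.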

dotnet-policy-service · 2026-06-08T12:31:20Z

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR reduces repeated large allocations in ZstandardDictionary.Train by renting the native training output buffer from ArrayPool<byte>.Shared instead of allocating a new byte[] every call, then returning the rented buffer after copying the trained dictionary into its own array.

Changes:

Replace per-call new byte[maxDictionarySize] with ArrayPool<byte>.Shared.Rent(maxDictionarySize) for the native training output buffer.
Ensure the rented dictionary buffer is returned in a finally block, currently using clearArray: true to avoid retaining caller-derived data in the pool.

+                    // Clear before returning: the trained dictionary is derived from caller-supplied samples.
+                    ArrayPool<byte>.Shared.Return(dictionaryBuffer, clearArray: true);


MihaZupan · 2026-06-08T12:38:27Z

+                    // Clear before returning: the trained dictionary is derived from caller-supplied samples.
+                    ArrayPool<byte>.Shared.Return(dictionaryBuffer, clearArray: true);


Suggested change

-                    // Clear before returning: the trained dictionary is derived from caller-supplied samples.
-                    ArrayPool<byte>.Shared.Return(dictionaryBuffer, clearArray: true);
+                    ArrayPool<byte>.Shared.Return(dictionaryBuffer);

We practically never clear buffers outside of crypto; I don't see a reason to do it here.
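For context on what is being traded off: `clearArray: true` makes `Return` zero the array before it goes back to the pool, which costs a pass over the whole rented buffer, while the default leaves the contents in place for the next renter. A small sketch of the observable difference follows. The helper `FirstByteAfterReturn` is hypothetical, and reading a buffer after returning it is done here only for demonstration.

```csharp
using System;
using System.Buffers;

Console.WriteLine($"default Return keeps data: {FirstByteAfterReturn(clearArray: false) != 0}");
Console.WriteLine($"clearArray: true zeroes it: {FirstByteAfterReturn(clearArray: true) == 0}");

// Rents a buffer, writes a marker byte, returns the buffer, and then peeks at the
// marker through the stale reference. Demonstration only: in real code a returned
// buffer may be handed to another renter at any time and must not be touched.
static byte FirstByteAfterReturn(bool clearArray)
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(112_640);
    buffer[0] = 0xAB;
    ArrayPool<byte>.Shared.Return(buffer, clearArray);
    return buffer[0];
}
```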

MihaZupan · 2026-06-08T12:45:29Z

-                    ZstandardUtils.ThrowIfError(dictSize);
-                    return Create(dictionaryBuffer.AsSpan(0, (int)dictSize));
+                }
+                finally


We already have a try/finally for the lengthsArray buffer. Can we avoid even more nesting by reusing the existing blocks?

Another option is to not return the array to the pool on the exceptional path. I can't find a reference for this on learn.microsoft.com or in this repo's docs, but I've seen Stephen Toub and others mention in PRs that returning arrays to the pool on exception paths is generally more trouble than it's worth. For example:

#71249 (comment)
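The second option would look roughly like the following: the buffer is returned only after the call succeeds, and if the call throws, the array is simply left for the GC. This is a sketch with hypothetical names (`TrainNoFinally`, `FakeTrain`), not the code under review.

```csharp
using System;
using System.Buffers;

byte[] ok = TrainNoFinally(112_640, fail: false);
Console.WriteLine($"success path: {ok.Length:N0} bytes");

try
{
    TrainNoFinally(112_640, fail: true);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"failure path: {ex.Message} (buffer dropped, not returned)");
}

// Hypothetical helper: no try/finally, so an exception skips the Return call.
// Failing to return a rented array is not an error; the pool just has one
// fewer cached buffer and the array is collected like any other object.
static byte[] TrainNoFinally(int maxDictionarySize, bool fail)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(maxDictionarySize);
    int written = FakeTrain(rented.AsSpan(0, maxDictionarySize), fail);
    byte[] result = rented.AsSpan(0, written).ToArray();
    ArrayPool<byte>.Shared.Return(rented);
    return result;
}

// Hypothetical placeholder for the native trainer.
static int FakeTrain(Span<byte> destination, bool fail)
{
    if (fail) throw new InvalidOperationException("training failed");
    destination.Slice(0, 16).Fill(1);
    return 16;
}
```

The trade-off is one level less nesting and no pool interaction while unwinding, at the price of losing the pooled buffer whenever training fails.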

jkotas · 2026-06-08T18:20:19Z

+                }
+                finally
+                {
+                    // Clear before returning: the trained dictionary is derived from caller-supplied samples.


This clearing is expensive and unnecessary. We are not doing it anywhere else in similar situations.

alinpahontu2912 requested review from a team and Copilot June 8, 2026 12:30

Copilot started reviewing on behalf of alinpahontu2912 June 8, 2026 12:30

github-actions Bot added the area-System.Collections label Jun 8, 2026

dotnet-policy-service Bot assigned alinpahontu2912 Jun 8, 2026

Copilot AI reviewed Jun 8, 2026

Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/Zstandard/ZstandardDictionary.cs

Comment on lines +152 to +153

rzikm approved these changes Jun 8, 2026

MihaZupan reviewed Jun 8, 2026

This was referenced Jun 8, 2026

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008 (Open)

browser-wasm linux Release LibraryTests queues timing out #117974 (Open)

jkotas reviewed Jun 8, 2026

alinpahontu2912 wants to merge 1 commit into dotnet:main from alinpahontu2912:alin/zstd-dictionary-pool
