Intermittent AccessViolation (unmanaged-storage lifetime race) — shift/bitwise kernel is the victim, not the cause #615

Description

@Nucs

Overview

An intermittent AccessViolationException (access to unmapped memory) crashes the entire in-process benchmark suite. It surfaced during a full benchmark/run_benchmark.py run, in the call chain BitwiseBenchmarks.LeftShift() → DefaultEngine.SimdScalarShiftDispatch<int> → the emitted IL_ShiftLeft_Scalar_Int32 kernel, but extensive testing (below) indicates the shift/bitwise kernels are the victim, not the cause. Because the official config uses the InProcessEmitToolchain (one process per suite), this single fault wiped the entire Bitwise suite (81 op×dtype×N cells) from the results. This is the long-documented "known intermittent AccessViolation, an unmanaged-storage lifetime bug."

Actual (the crash)

Fatal error.
System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at DynamicClass.IL_ShiftLeft_Scalar_Int32(Int32*, Int32*, Int32, Int64)
   at NumSharp.Backends.DefaultEngine.SimdScalarShiftDispatch[[System.Int32, ...]](NumSharp.NDArray, NumSharp.NDArray, Int32, Boolean)
   at NumSharp.Benchmark.CSharp.Benchmarks.Bitwise.BitwiseBenchmarks.LeftShift()

It crashed on np.left_shift(boolArray[100000], 2) after ~6,000 successful identical calls in the same benchmark case (BenchmarkDotNet logged WorkloadActual 11..15 then Fatal error).

Expected

No crash. np.left_shift(bool[100000], 2) must reliably return int32 [0|4].
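
For reference, the expected element-wise result can be written out in plain C#. This is a minimal sketch of the semantics only (widen bool to int32, then shift), not the NumSharp kernel; ExpectedShift and LeftShiftBools are hypothetical names used for illustration.

```csharp
using System;

public static class ExpectedShift
{
    // Reference semantics only: widen each bool to int32 (false -> 0, true -> 1),
    // then shift left by `shift` bits. Hypothetical helper, not a NumSharp API.
    public static int[] LeftShiftBools(bool[] source, int shift)
    {
        var result = new int[source.Length];
        for (int i = 0; i < source.Length; i++)
            result[i] = (source[i] ? 1 : 0) << shift;
        return result;
    }
}
```

With shift = 2, every false maps to 0 and every true maps to 4, which is what "[0|4]" above stands for.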

Reproduction & investigation

The bug does not reproduce in isolation. I ran ~1.3M faithful operations and could not trigger it:

| Test | Ops | Result |
|---|---|---|
| Isolated left_shift(bool[100k], 2) | 80,000 | clean |
| Full bitwise matrix (10 dtypes × 6 ops) | 1,180,000 | clean |
| Bitwise matrix + NUMSHARP_GUARD_PAGES=1, N=1k & 100k | ~24,000 | no OOB |
| Bitwise matrix + guard pages, N=10M | ~700 | no OOB |
| Faithful randint().astype() setup + ops + guard pages, all 3 sizes | ~9,700 | no OOB |

The guard-page diagnostic (SizeBucketedBufferPool.GuardPagesEnabled, which right-aligns every buffer against a PAGE_NOACCESS page so any 1-past-the-end access faults at the kernel) found zero out-of-bounds accesses across the entire bitwise surface (setup casts + all six ops × all integer dtypes × all three sizes).
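
To make the right-alignment concrete, here is a minimal sketch of the layout arithmetic only. GuardPageLayout and RightAlign are hypothetical names, and the 4096-byte page size is an assumption; the actual SizeBucketedBufferPool implementation may compute this differently (for example, it may round the offset for SIMD alignment).

```csharp
using System;

public static class GuardPageLayout
{
    // Assumption: 4 KiB pages, the usual size on x64 Windows.
    public const long PageSize = 4096;

    // For a request of `requestedBytes`, returns the total region size
    // (data pages plus one trailing guard page) and the offset at which the
    // user buffer starts so that its last byte sits directly before the guard page.
    public static (long TotalBytes, long BufferOffset) RightAlign(long requestedBytes)
    {
        if (requestedBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(requestedBytes));

        long dataPages = (requestedBytes + PageSize - 1) / PageSize; // round up
        long dataBytes = dataPages * PageSize;
        long totalBytes = dataBytes + PageSize;          // + one no-access guard page
        long bufferOffset = dataBytes - requestedBytes;  // push the buffer to the right
        return (totalBytes, bufferOffset);
    }
}
```

For the 100,000-element int32 case (400,000 bytes) this gives 98 data pages (401,408 bytes), a buffer offset of 1,408 and a total region of 405,504 bytes, so the byte immediately after the buffer is the first byte of the guard page. A buffer that is freed and unmapped as a whole is outside what this layout can detect.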

Root-cause hypothesis

The crash is an AccessViolation (unmapped memory), not wrong data, and the guard-page overrun detector came up empty. Together that points away from a buffer overrun and toward an unmanaged-storage lifetime bug — a NativeMemory/pooled buffer freed while a kernel is still using it (use-after-free under GC pressure). Most likely shapes:

  1. A raw .Address pointer extracted from a managed NDArray and passed to an emitted kernel without a keep-alive — the temp (e.g. the astype(bool→int32) widening result in SimdScalarShiftDispatch) gets collected mid-kernel and its finalizer frees the unmanaged buffer. (Backends/Default/Math/Default.Shift.cs, Backends/Kernels/Direct/DirectILKernelGenerator.Shift.cs.)
  2. A lifetime/free-list race in the custom unmanaged pool (Backends/Unmanaged/Pooling/SizeBucketedBufferPool.cs, StackedMemoryPool.cs) + finalizer-driven NDArray reclamation.

It is rare because it needs the GC + finalizer to fire inside a narrow kernel window. Note: the shift path is just where it happened to land — any raw-pointer kernel dispatch is a candidate corruptor/victim.
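
Shape 1 can be sketched in isolation as follows. NativeBuffer, ShiftDispatchSketch and ShiftLeftKernel are hypothetical stand-ins (not NumSharp types), and Marshal.AllocHGlobal is used in place of NativeMemory so the sketch compiles without unsafe code. It shows the keep-alive pattern, under the assumption that the owning object frees its storage in a finalizer.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical owner of unmanaged storage; its finalizer frees the buffer.
public sealed class NativeBuffer
{
    public IntPtr Address { get; }
    public int Length { get; }

    public NativeBuffer(int length)
    {
        Length = length;
        Address = Marshal.AllocHGlobal(length * sizeof(int));
    }

    ~NativeBuffer() => Marshal.FreeHGlobal(Address);
}

public static class ShiftDispatchSketch
{
    // Stand-in for an emitted kernel: it receives only raw addresses, so the
    // GC has no way to know the owning objects are still needed.
    private static void ShiftLeftKernel(IntPtr src, IntPtr dst, int shift, int count)
    {
        for (int i = 0; i < count; i++)
        {
            int value = Marshal.ReadInt32(src, i * sizeof(int));
            Marshal.WriteInt32(dst, i * sizeof(int), value << shift);
        }
    }

    public static int[] LeftShift(int[] values, int shift)
    {
        var src = new NativeBuffer(values.Length);
        var dst = new NativeBuffer(values.Length);
        Marshal.Copy(values, 0, src.Address, values.Length);

        // Mirror of the suspected pattern: only raw addresses leave this point.
        IntPtr srcPtr = src.Address;
        IntPtr dstPtr = dst.Address;

        ShiftLeftKernel(srcPtr, dstPtr, shift, values.Length);

        var result = new int[values.Length];
        Marshal.Copy(dstPtr, result, 0, values.Length);

        // Keep-alive: without these calls, optimized code may treat src and dst
        // as unreachable once their addresses have been read, which would let a
        // finalizer free the buffers while the kernel is still running.
        GC.KeepAlive(src);
        GC.KeepAlive(dst);
        return result;
    }
}
```

Removing the two GC.KeepAlive calls turns this into the hypothesised bug: the code still passes almost every run, and fails only when a collection plus finalizer pass lands inside the kernel window.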

Impact

  • The InProcessEmitToolchain makes one AccessViolation fatal to the whole suite — a single rare fault silently drops ~81 result cells. The benchmark harness (NDIter sheet) already tolerates this as NA/IGNORED; the op-matrix suites do not.

Workaround

  • Re-running the affected suite almost always succeeds (the fault is rare). The 06-29 benchmark snapshot's bitwise data was recovered this way.
  • NUMSHARP_GUARD_PAGES=1 (Windows) is the built-in localiser, but only for out-of-bounds accesses past the end of a buffer; it will not catch this lifetime race.

Suggested next steps

  • Audit raw-pointer kernel dispatch sites that do byte* p = (byte*)nd.Address; kernel(p, …) for missing GC.KeepAlive(nd) after the kernel call (start with SimdScalarShiftDispatch and the astype-temp paths).
  • Build a GC-stress reproduction harness: deliberately unroot the kernel's source NDArray and hammer GC.Collect() + GC.WaitForPendingFinalizers() on a background thread to force the use-after-free.
  • Consider poisoning freed pool buffers (fill-on-free + assert-on-reuse) under a diagnostic flag to catch use-after-free distinct from OOB.
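
The poisoning idea from the last bullet could look roughly like this. It is a minimal managed sketch using byte[] rather than unmanaged memory; PoisoningPool, its members and the 0xDD marker are all hypothetical and are not part of SizeBucketedBufferPool or StackedMemoryPool.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical diagnostic pool: fill-on-free plus assert-on-reuse.
public sealed class PoisoningPool
{
    public const byte Poison = 0xDD; // arbitrary marker value (assumption)

    private readonly Stack<byte[]> _free = new Stack<byte[]>();
    private readonly int _size;

    public PoisoningPool(int size) => _size = size;

    public byte[] Rent()
    {
        if (_free.Count == 0)
            return new byte[_size];

        byte[] buffer = _free.Pop();
        // Assert-on-reuse: any non-poison byte means someone wrote to the
        // buffer after it had been returned to the pool.
        foreach (byte b in buffer)
        {
            if (b != Poison)
                throw new InvalidOperationException(
                    "Pooled buffer was modified after being returned (use-after-free).");
        }
        return buffer; // still poison-filled; callers must treat it as uninitialized
    }

    public void Return(byte[] buffer)
    {
        if (buffer.Length != _size)
            throw new ArgumentException("Buffer does not belong to this pool.", nameof(buffer));

        // Fill-on-free: stale readers now see an obvious pattern instead of
        // plausible-looking data.
        Array.Fill(buffer, Poison);
        _free.Push(buffer);
    }
}
```

This catches stale writes at the next Rent and makes stale reads visible as 0xDD-patterned values. It does not detect a read from memory that has already been unmapped, which is the fault reported here, so it complements rather than replaces the GC-stress harness.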

Diagnosed on branch nditer @ 2d16f477, .NET 10.0.101 / Release, i9-13900K (AVX2).
