Overview
An intermittent AccessViolationException (access to unmapped memory) crashes the entire in-process benchmark suite. It surfaced during a full benchmark/run_benchmark.py run inside BitwiseBenchmarks.LeftShift() → DefaultEngine.SimdScalarShiftDispatch<int> → the emitted IL_ShiftLeft_Scalar_Int32 kernel, but extensive testing indicates that the shift/bitwise kernels are the victim, not the cause. Because the official config uses the InProcessEmitToolchain (one process per suite), this single fault wiped the entire Bitwise suite (81 op×dtype×N cells) from the results. This is the long-documented "known intermittent AccessViolation, an unmanaged-storage lifetime bug."
Actual (the crash)
```
Fatal error.
System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at DynamicClass.IL_ShiftLeft_Scalar_Int32(Int32*, Int32*, Int32, Int64)
at NumSharp.Backends.DefaultEngine.SimdScalarShiftDispatch[[System.Int32, ...]](NumSharp.NDArray, NumSharp.NDArray, Int32, Boolean)
at NumSharp.Benchmark.CSharp.Benchmarks.Bitwise.BitwiseBenchmarks.LeftShift()
```
It crashed on np.left_shift(boolArray[100000], 2) after ~6,000 successful identical calls in the same benchmark case (BenchmarkDotNet logged WorkloadActual 11..15 then Fatal error).
Expected
No crash. np.left_shift(bool[100000], 2) must reliably return an int32 array whose elements are all 0 or 4 (False << 2 = 0, True << 2 = 4).
Reproduction & investigation
The bug does not reproduce in isolation. I ran ~13M faithful operations and could not trigger it:
| Test | Ops | Result |
|---|---|---|
| Isolated left_shift(bool[100k], 2) | 80,000 | clean |
| Full bitwise matrix (10 dtypes × 6 ops) | 1,180,000 | clean |
| Bitwise matrix + NUMSHARP_GUARD_PAGES=1, N=1k & 100k | ~24,000 | no OOB |
| Bitwise matrix + guard pages, N=10M | ~700 | no OOB |
| Faithful randint().astype() setup + ops + guard pages, all 3 sizes | ~9,700 | no OOB |
The guard-page diagnostic (SizeBucketedBufferPool.GuardPagesEnabled) right-aligns every buffer against a PAGE_NOACCESS page, so any access even one byte past the end faults immediately inside the offending kernel. It found zero out-of-bounds accesses across the entire bitwise surface (setup casts + all six ops × all integer dtypes × all three sizes).
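The right-alignment used by the guard-page diagnostic comes down to simple layout arithmetic. The sketch below is a hypothetical helper (GuardLayout is not NumSharp's code). It assumes the pool reserves whole pages for the payload plus one trailing no-access page, and it ignores any element or SIMD alignment a real pool may also need to preserve.

```csharp
using System;

// Hypothetical sketch, not NumSharp's SizeBucketedBufferPool: compute where a guarded
// buffer's data must start so that its last byte sits directly in front of a
// PAGE_NOACCESS guard page.
static (long TotalBytes, long DataOffset, long GuardOffset) GuardLayout(long dataBytes, long pageSize)
{
    if (dataBytes <= 0) throw new ArgumentOutOfRangeException(nameof(dataBytes));
    if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

    long dataPages = (dataBytes + pageSize - 1) / pageSize; // round payload up to whole pages
    long guardOffset = dataPages * pageSize;                // first byte of the guard page
    long dataOffset = guardOffset - dataBytes;              // right-align: data ends at the guard
    long totalBytes = guardOffset + pageSize;               // payload pages + one guard page
    return (totalBytes, dataOffset, guardOffset);
}

// The N=100k int32 case with 4 KiB pages: element [N] would start exactly at guardOffset.
var layout = GuardLayout(100_000 * sizeof(int), 4096);
Console.WriteLine(layout); // (405504, 1408, 401408)
```

Because the data ends exactly at the guard page, an overrun faults on its first stray byte instead of silently reading pool slack. An underrun before the data start is not caught by this layout.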
Root-cause hypothesis
The crash is an AccessViolation (unmapped memory), not wrong data, and the guard-page overrun detector came up empty. Together, these point away from a buffer overrun and toward an unmanaged-storage lifetime bug: a NativeMemory/pooled buffer freed while a kernel is still using it (use-after-free under GC pressure). The most likely shapes are:
- A raw .Address pointer extracted from a managed NDArray and passed to an emitted kernel without a keep-alive. After the pointer is read, the JIT may treat the temp (e.g. the astype(bool→int32) widening result in SimdScalarShiftDispatch) as dead, so the GC can collect it mid-kernel and its finalizer frees the unmanaged buffer. (Backends/Default/Math/Default.Shift.cs, Backends/Kernels/Direct/DirectILKernelGenerator.Shift.cs.)
- A lifetime/free-list race in the custom unmanaged pool (Backends/Unmanaged/Pooling/SizeBucketedBufferPool.cs, StackedMemoryPool.cs) combined with finalizer-driven NDArray reclamation.
It is rare because the GC and the finalizer must both run inside the narrow window while the kernel is executing. Note that the shift path is just where it happened to land: any raw-pointer kernel dispatch is a candidate corruptor or victim.
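The following minimal sketch shows the first shape and its usual fix. It assumes that optimized .NET code may treat a local as dead after its last use. To keep the fixed case deterministic, a WeakReference probe stands in for the finalizer that would free the unmanaged buffer. KernelWindow, UnguardedDispatch and GuardedDispatch are hypothetical names, not NumSharp code.

```csharp
using System;
using System.Runtime.CompilerServices;

// Stand-in for an emitted kernel: while it "runs", force a full collection and drain the
// finalizer queue (the narrow window the hypothesis depends on), then report whether the
// object that owns the data survived.
[MethodImpl(MethodImplOptions.NoInlining)]
static bool KernelWindow(WeakReference ownerProbe)
{
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    return ownerProbe.IsAlive;
}

// Buggy shape: the owner's last use precedes the kernel call (in NumSharp, the .Address
// read), so optimized code may let the GC reclaim it while the "kernel" is still running.
[MethodImpl(MethodImplOptions.NoInlining)]
static bool UnguardedDispatch()
{
    var owner = new byte[4096];            // hypothetical stand-in for the astype() temp
    var probe = new WeakReference(owner);
    return KernelWindow(probe);            // true or false depending on JIT mode
}

// Fixed shape: GC.KeepAlive after the call keeps the owner reachable until the kernel returns.
[MethodImpl(MethodImplOptions.NoInlining)]
static bool GuardedDispatch()
{
    var owner = new byte[4096];
    var probe = new WeakReference(owner);
    bool alive = KernelWindow(probe);
    GC.KeepAlive(owner);
    return alive;                          // always true
}

Console.WriteLine($"unguarded: {UnguardedDispatch()}, guarded: {GuardedDispatch()}");
```

Only the guarded result is guaranteed. Whether the unguarded probe reports false depends on JIT tiering and build configuration, which is consistent with the real crash being intermittent.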
Impact
- The InProcessEmitToolchain makes a single AccessViolation fatal to the whole suite. .NET does not let managed code catch an AccessViolationException, so the runtime terminates the process, and one rare fault silently drops ~81 result cells. The benchmark harness (NDIter sheet) already tolerates this as NA/IGNORED; the op-matrix suites do not.
Workaround
- Re-running the affected suite almost always succeeds (the fault is rare). The 06-29 benchmark snapshot's bitwise data was recovered this way.
- NUMSHARP_GUARD_PAGES=1 (Windows) is the built-in localiser, but only for out-of-bounds (past-the-end) accesses; it will not catch this lifetime race.
Suggested next steps
- Audit every `byte* p = (byte*)nd.Address; kernel(p, …)` site for a missing GC.KeepAlive(nd) after the kernel call (start with SimdScalarShiftDispatch and the astype-temp paths).
- … NDArray and hammer GC.Collect() + WaitForPendingFinalizers() on a background thread to force the use-after-free.
Diagnosed on branch nditer @ 2d16f477, .NET 10.0.101 / Release, i9-13900K (AVX2).
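The GC-pressure repro suggested in the next steps (hammering GC.Collect() + WaitForPendingFinalizers() on a background thread) could take roughly the shape below. RunUnderGcPressure is a hypothetical helper. The delegate passed to it is where the suspect call, such as the left_shift benchmark body, would go.

```csharp
using System;
using System.Threading;

// Hypothetical stress harness: a background thread forces collections and drains the
// finalizer queue in a tight loop while the caller's operation runs repeatedly on the
// current thread. Returns how many GC cycles the hammer thread completed.
static int RunUnderGcPressure(Action op, int iterations)
{
    int gcCycles = 0;
    using var stop = new CancellationTokenSource();
    var hammer = new Thread(() =>
    {
        do
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            Interlocked.Increment(ref gcCycles);
        } while (!stop.IsCancellationRequested); // do/while: at least one forced cycle
    }) { IsBackground = true };

    hammer.Start();
    try
    {
        for (int i = 0; i < iterations; i++) op(); // e.g. the suspect left_shift call
    }
    finally
    {
        stop.Cancel();
        hammer.Join();
    }
    return Volatile.Read(ref gcCycles);
}

int calls = 0;
int cycles = RunUnderGcPressure(() => calls++, 10_000);
Console.WriteLine($"{calls} calls ran alongside {cycles} forced GC cycles");
```

Running the hammer on its own thread keeps collections landing at arbitrary points inside the foreground call, which is the window a missing keep-alive would expose.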