# Intermittent AccessViolation (unmanaged-storage lifetime race): shift/bitwise kernel is the victim, not the cause (#615)

## Overview

An **intermittent `AccessViolationException`** (access to *unmapped* memory) crashes the entire in-process benchmark suite. It surfaced during a full `benchmark/run_benchmark.py` run inside `BitwiseBenchmarks.LeftShift()` → `DefaultEngine.SimdScalarShiftDispatch<int>` → the emitted `IL_ShiftLeft_Scalar_Int32` kernel, but **the testing described under "Reproduction & investigation" indicates the shift/bitwise kernels are the *victim*, not the cause**. Because the official config uses the `InProcessEmitToolchain` (one process per suite), this single fault wiped the **entire Bitwise suite (81 op×dtype×N cells)** from the results. This is the long-documented *"known intermittent AccessViolation, an unmanaged-storage lifetime bug."*

## Actual (the crash)

```
Fatal error.
System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at DynamicClass.IL_ShiftLeft_Scalar_Int32(Int32*, Int32*, Int32, Int64)
   at NumSharp.Backends.DefaultEngine.SimdScalarShiftDispatch[[System.Int32, ...]](NumSharp.NDArray, NumSharp.NDArray, Int32, Boolean)
   at NumSharp.Benchmark.CSharp.Benchmarks.Bitwise.BitwiseBenchmarks.LeftShift()
```

It crashed on `np.left_shift(boolArray[100000], 2)` **after ~6,000 successful identical calls** in the same benchmark case (BenchmarkDotNet logged `WorkloadActual 11..15` then `Fatal error`).

## Expected

No crash. `np.left_shift(bool[100000], 2)` must reliably return an `int32` array in which every element is `0` or `4` (`false << 2` or `true << 2`).

## Reproduction & investigation

The bug does **not** reproduce in isolation. I ran ~1.3M faithful operations (the sum of the rows below) and could not trigger it:

| Test | Ops | Result |
|---|---|---|
| Isolated `left_shift(bool[100k], 2)` | 80,000 | clean |
| Full bitwise matrix (10 dtypes × 6 ops) | 1,180,000 | clean |
| Bitwise matrix **+ `NUMSHARP_GUARD_PAGES=1`**, N=1k & 100k | ~24,000 | **no OOB** |
| Bitwise matrix **+ guard pages**, N=10M | ~700 | **no OOB** |
| Faithful `randint().astype()` setup + ops **+ guard pages**, all 3 sizes | ~9,700 | **no OOB** |

The guard-page diagnostic (`SizeBucketedBufferPool.GuardPagesEnabled`, which right-aligns every buffer against a `PAGE_NOACCESS` page so any 1-past-the-end access faults *at the kernel*) found **zero out-of-bounds accesses** across the entire bitwise surface (setup casts + all six ops × all integer dtypes × all three sizes).
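To make the right-alignment concrete, here is a minimal sketch of the layout arithmetic only. It is not the code in `SizeBucketedBufferPool`: the 4 KiB page size, the class name `GuardPageLayout`, and the method `Compute` are assumptions made for illustration.

```csharp
using System;

// Hypothetical sketch of a right-aligned guard-page layout.
// Not NumSharp's implementation; names and the 4096-byte page size are assumed.
public static class GuardPageLayout
{
    // Returns the total bytes to reserve (payload pages plus one trailing guard page)
    // and the offset at which the user buffer starts inside that reservation.
    public static (long ReserveBytes, long UserOffset) Compute(long userBytes, long pageSize = 4096)
    {
        if (userBytes <= 0) throw new ArgumentOutOfRangeException(nameof(userBytes));

        // Round the payload up to whole pages.
        long dataPages = (userBytes + pageSize - 1) / pageSize;
        long dataBytes = dataPages * pageSize;

        // One extra page after the payload is the no-access guard page
        // (PAGE_NOACCESS on Windows).
        long reserveBytes = dataBytes + pageSize;

        // Right-align: the buffer's last byte is the last byte before the guard page,
        // so the first byte past the end is the first byte of the guard page.
        long userOffset = dataBytes - userBytes;

        return (reserveBytes, userOffset);
    }
}
```

For a 100,000-element `int32` buffer (400,000 bytes) this yields a 405,504-byte reservation with the buffer starting at offset 1,408, so an access at element index 100,000 lands on the first byte of the guard page. A stale pointer into an already-freed buffer never touches that page, which is why a layout like this detects overruns but not lifetime bugs.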

## Root-cause hypothesis

The crash is an **AccessViolation (unmapped memory), not wrong data**, and the guard-page **overrun detector came up empty**. Together that points *away from a buffer overrun* and *toward an unmanaged-storage lifetime bug* — a `NativeMemory`/pooled buffer freed while a kernel is still using it (use-after-free under GC pressure). Most likely shapes:

1. A raw `.Address` pointer extracted from a managed `NDArray` and passed to an emitted kernel **without a keep-alive** — the temp (e.g. the `astype(bool→int32)` widening result in `SimdScalarShiftDispatch`) gets collected mid-kernel and its finalizer frees the unmanaged buffer. (`Backends/Default/Math/Default.Shift.cs`, `Backends/Kernels/Direct/DirectILKernelGenerator.Shift.cs`.)
2. A lifetime/free-list race in the custom unmanaged pool (`Backends/Unmanaged/Pooling/SizeBucketedBufferPool.cs`, `StackedMemoryPool.cs`) + finalizer-driven `NDArray` reclamation.

It is rare because it needs the GC + finalizer to fire inside a narrow kernel window. Note: the shift path is just where it happened to land — **any** raw-pointer kernel dispatch is a candidate corruptor/victim.
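The first shape can be sketched as follows. This is an illustration under stated assumptions, not NumSharp code: `UnmanagedInt32Buffer` is a hypothetical stand-in for an `NDArray` whose finalizer frees its storage, and `ShiftLeftKernel` stands in for an emitted kernel that receives only raw addresses. The .NET JIT may treat a local as unreachable after its last use, so once `src.Address` has been read the owner can be collected and finalized while the kernel is still running, unless a later `GC.KeepAlive` extends its lifetime.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for an NDArray's unmanaged storage (not NumSharp's type).
public sealed class UnmanagedInt32Buffer
{
    public IntPtr Address { get; }
    public int Length { get; }

    public UnmanagedInt32Buffer(int length)
    {
        Length = length;
        Address = Marshal.AllocHGlobal(checked(length * sizeof(int)));
    }

    // Frees the native block once the GC considers this object unreachable.
    ~UnmanagedInt32Buffer() => Marshal.FreeHGlobal(Address);
}

public static class KeepAliveSketch
{
    // Stand-in for an emitted kernel: it sees raw addresses, never the owning objects.
    static void ShiftLeftKernel(IntPtr src, IntPtr dst, int shift, int count)
    {
        for (int i = 0; i < count; i++)
        {
            int v = Marshal.ReadInt32(src, i * sizeof(int));
            Marshal.WriteInt32(dst, i * sizeof(int), v << shift);
        }
    }

    public static int[] ShiftLeft(int[] input, int shift)
    {
        var src = new UnmanagedInt32Buffer(input.Length);
        var dst = new UnmanagedInt32Buffer(input.Length);
        Marshal.Copy(input, 0, src.Address, input.Length);

        ShiftLeftKernel(src.Address, dst.Address, shift, input.Length);
        // Without this call, src has no use after src.Address is read above,
        // so it could be finalized (and its memory freed) while the kernel runs.
        GC.KeepAlive(src);

        var result = new int[input.Length];
        Marshal.Copy(dst.Address, result, 0, result.Length);
        // Same reasoning for dst: keep the owner reachable until the last raw access.
        GC.KeepAlive(dst);
        return result;
    }
}
```

Calling `KeepAliveSketch.ShiftLeft(new[] { 0, 1, 1, 0 }, 2)` returns `{ 0, 4, 4, 0 }`. The point of the sketch is the placement of `GC.KeepAlive`: it goes after the last use of the raw pointer, not before the kernel call.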

## Impact

- The `InProcessEmitToolchain` makes one AccessViolation fatal to the **whole suite**: a single rare fault silently drops all 81 result cells of the Bitwise suite. The benchmark harness (NDIter sheet) already tolerates this as `NA/IGNORED`; the op-matrix suites do not.

## Workaround

- Re-running the affected suite almost always succeeds (the fault is rare). The 06-29 benchmark snapshot's bitwise data was recovered this way.
- `NUMSHARP_GUARD_PAGES=1` (Windows) is the built-in localiser, but only for out-of-bounds accesses; it will **not** catch this lifetime race.

## Suggested next steps

- [ ] Audit raw-pointer kernel dispatch sites that do `byte* p = (byte*)nd.Address; kernel(p, …)` for missing `GC.KeepAlive(nd)` after the kernel call (start with `SimdScalarShiftDispatch` and the `astype`-temp paths).
- [ ] Build a GC-stress reproduction harness: deliberately unroot the kernel's source `NDArray` and hammer `GC.Collect()` + `WaitForPendingFinalizers()` on a background thread to force the use-after-free.
- [ ] Consider poisoning freed pool buffers (fill-on-free + assert-on-reuse) under a diagnostic flag to catch use-after-free distinct from OOB.
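As a rough illustration of the last item, the sketch below shows fill-on-free plus assert-on-reuse in a toy fixed-size pool. Everything in it is hypothetical: `PoisoningPool`, the `0xDD` fill byte, and the `Rent`/`Return` names are assumptions and do not reflect the `SizeBucketedBufferPool` API. It is single-threaded and never releases memory back to the OS, which is acceptable only for a diagnostic sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical diagnostic pool (not NumSharp's pool API). Not thread-safe.
public sealed class PoisoningPool
{
    const byte Poison = 0xDD;
    readonly int _bufferBytes;
    readonly Stack<IntPtr> _free = new Stack<IntPtr>();

    public PoisoningPool(int bufferBytes) => _bufferBytes = bufferBytes;

    public IntPtr Rent()
    {
        if (_free.Count == 0) return Marshal.AllocHGlobal(_bufferBytes);

        IntPtr p = _free.Pop();
        // Assert-on-reuse: any byte that is no longer the poison value
        // was written through a stale pointer after Return().
        for (int i = 0; i < _bufferBytes; i++)
        {
            if (Marshal.ReadByte(p, i) != Poison)
                throw new InvalidOperationException($"write-after-free detected at offset {i}");
        }
        return p;
    }

    public void Return(IntPtr p)
    {
        // Fill-on-free: overwrite the whole buffer with the poison byte.
        for (int i = 0; i < _bufferBytes; i++) Marshal.WriteByte(p, i, Poison);
        _free.Push(p);
    }
}
```

A stale write shows up as an exception on the next `Rent()`. A stale read does not throw; it would instead surface as poison-valued data (for example `0xDDDDDDDD` in an `int32` result), which is still easier to recognise than silently reused contents. Note that this only helps when the freed buffer stays mapped inside the pool; it does not explain a fault on unmapped memory by itself.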

_Diagnosed on branch `nditer` @ `2d16f477`, .NET 10.0.101 / Release, i9-13900K (AVX2)._

