|
| 1 | +# Plan: Rewrite Offset Model to Match NumPy's `base_offset + strides` Architecture |
| 2 | + |
| 3 | +## Status: Investigation Required |
| 4 | + |
| 5 | +This document is the entry point for planning the rewrite of NumSharp's view/offset resolution system to match NumPy's architecture exactly. It identifies what needs to change, what needs to be investigated first, and the risks involved. |
| 6 | + |
| 7 | +## Problem Statement |
| 8 | + |
| 9 | +NumSharp uses a chain-based offset model (ViewInfo + BroadcastInfo) to resolve memory offsets for sliced and broadcast arrays. NumPy uses a flat model: `base_offset + sum(stride[i] * coord[i])`. NumSharp's model is the root cause of most broadcast/slice bugs and creates O(chain_depth) overhead per element access. |
| 10 | + |
| 11 | +### NumPy's Model (target) |
| 12 | + |
| 13 | +Every ndarray view stores exactly: |
| 14 | +``` |
| 15 | +data → pointer to first element of THIS view (not the allocation base) |
| 16 | +shape → int[ndim] dimensions |
| 17 | +strides → int[ndim] bytes per step (0 for broadcast, negative for reversed) |
| 18 | +base → reference to parent array (for memory management / refcounting only) |
| 19 | +``` |
| 20 | + |
| 21 | +Offset computation: `byte_offset = sum(strides[i] * coords[i])` — one loop, no branching, no chain resolution. |
| 22 | + |
| 23 | +When creating a slice `a[2:8:2, ::-1]`: |
| 24 | +```python |
| 25 | +new.data = a.data + 2*a.strides[0] + (a.shape[1]-1)*a.strides[1] |
| 26 | +new.shape = [3, original_cols] |
| 27 | +new.strides = [a.strides[0]*2, -a.strides[1]] |
| 28 | +new.base = a.base or a |
| 29 | +``` |
| 30 | + |
| 31 | +When broadcasting `broadcast_to(a, target_shape)`: |
| 32 | +```python |
| 33 | +new.data = a.data |
| 34 | +new.shape = target_shape |
| 35 | +new.strides = [0 if dim was stretched, else a.strides[i]] |
| 36 | +new.base = a |
| 37 | +``` |
| 38 | + |
| 39 | +Both operations compose naturally: slicing a broadcast produces correct strides via arithmetic on the existing strides. No special cases. |
| 40 | + |
| 41 | +### NumSharp's Current Model (to be replaced) |
| 42 | + |
| 43 | +``` |
| 44 | +Shape |
| 45 | +├── dimensions[] |
| 46 | +├── strides[] ← may be broadcast strides (0) OR original strides |
| 47 | +├── ViewInfo ← chain of slice history |
| 48 | +│ ├── OriginalShape ← the unsliced root shape |
| 49 | +│ ├── ParentShape ← intermediate sliced shape (for recursive slicing) |
| 50 | +│ ├── Slices[] ← SliceDef per dimension (start, stop, step) |
| 51 | +│ └── UnreducedShape ← shape before dimension reduction |
| 52 | +└── BroadcastInfo ← broadcast history |
| 53 | + ├── OriginalShape ← shape before broadcasting |
| 54 | + └── UnreducedBroadcastedShape ← lazy-loaded resolved shape |
| 55 | +``` |
| 56 | + |
| 57 | +Offset computation: 6+ code paths in `GetOffset`, `GetOffset_1D`, `GetOffset_broadcasted`, `GetOffset_broadcasted_1D`, `GetOffset_IgnoreViewInfo`, `resolveUnreducedBroadcastedShape`, plus recursive calls through `ParentShape.GetOffset`. |
| 58 | + |
| 59 | +## Target Architecture |
| 60 | + |
| 61 | +### Shape (after rewrite) |
| 62 | + |
| 63 | +```csharp |
| 64 | +public struct Shape |
| 65 | +{ |
| 66 | + internal int[] dimensions; |
| 67 | + internal int[] strides; // fully resolved — absorbs slicing, broadcasting |
| 68 | + internal int base_offset; // offset to first element within InternalArray |
| 69 | + internal int size; // product of dimensions |
| 70 | + // ViewInfo and BroadcastInfo: REMOVED |
| 71 | + // IsSliced, IsBroadcasted: derived from strides (stride=0 → broadcast) |
| 72 | +} |
| 73 | +``` |
| 74 | + |
| 75 | +### GetOffset (after rewrite) |
| 76 | + |
| 77 | +```csharp |
| 78 | +public int GetOffset(params int[] coords) |
| 79 | +{ |
| 80 | + int offset = base_offset; |
| 81 | + for (int i = 0; i < coords.Length; i++) |
| 82 | + offset += strides[i] * coords[i]; |
| 83 | + return offset; |
| 84 | +} |
| 85 | +``` |
| 86 | + |
| 87 | +### View Creation (after rewrite) |
| 88 | + |
| 89 | +```csharp |
| 90 | +// Slicing: a[start:stop:step] along axis |
| 91 | +new_base_offset = old_base_offset + start * old_strides[axis]; |
| 92 | +new_strides[axis] = old_strides[axis] * step; |
| 93 | +new_dimensions[axis] = (stop - start + step - 1) / step; // ceiling division |
| 94 | +
|
| 95 | +// Negative step (reversal): a[::-1] along axis |
| 96 | +new_base_offset = old_base_offset + (old_dimensions[axis] - 1) * old_strides[axis]; |
| 97 | +new_strides[axis] = -old_strides[axis]; |
| 98 | + |
| 99 | +// Broadcasting: stretch dim from 1 to N |
| 100 | +new_strides[axis] = 0; // that's it |
| 101 | +new_dimensions[axis] = N; |
| 102 | + |
| 103 | +// Index reduction: a[3] along axis — removes dimension, adjusts base_offset |
| 104 | +new_base_offset = old_base_offset + 3 * old_strides[axis]; |
| 105 | +// remove axis from dimensions[] and strides[] |
| 106 | +``` |
| 107 | + |
| 108 | +## Investigation Checklist |
| 109 | + |
| 110 | +Before writing a full implementation plan, the following must be investigated: |
| 111 | + |
| 112 | +### 1. Catalog all Shape consumers |
| 113 | + |
| 114 | +Every place that reads `ViewInfo`, `BroadcastInfo`, `IsSliced`, `IsBroadcasted`, `IsRecursive` needs to be identified and mapped to the new model. |
| 115 | + |
| 116 | +- [ ] `Shape.cs` — GetOffset variants, Slice(), TransformOffset, GetCoordinates, Clean, Clone |
| 117 | +- [ ] `Shape.Unmanaged.cs` — unmanaged pointer access paths |
| 118 | +- [ ] `Shape.Reshaping.cs` — reshape on sliced/broadcast views |
| 119 | +- [ ] `NDIterator.cs` — iteration with AutoReset for broadcasting |
| 120 | +- [ ] `MultiIterator.cs` — paired/broadcast iteration |
| 121 | +- [ ] `UnmanagedStorage.Slicing.cs` — GetViewInternal, the Bug 17 materializing fix |
| 122 | +- [ ] `UnmanagedStorage.Getters.cs` — element access |
| 123 | +- [ ] `Default.Broadcasting.cs` — Broadcast() creates ViewInfo/BroadcastInfo |
| 124 | +- [ ] Generated math templates (`Default.Add.*.cs`, `Default.Subtract.*.cs`, etc.) — ~24 files, ~200K lines that reference ViewInfo/BroadcastInfo for fast-path decisions |
| 125 | +- [ ] `NDArray.Indexing.cs` — indexing dispatch |
| 126 | +- [ ] Selection/masking code |
| 127 | +- [ ] `np.reshape.cs` — reshape interacts with views |
| 128 | + |
| 129 | +### 2. Understand IArraySlice bounds checking |
| 130 | + |
| 131 | +NumPy adjusts the `data` pointer to point at the first view element. NumSharp uses `IArraySlice` with bounds-checked access via `Count`. Questions: |
| 132 | + |
| 133 | +- [ ] Can `base_offset` be negative? (yes, for reversed views the first logical element may precede the parent's first element in memory — NumPy handles this with pointer arithmetic, NumSharp would need signed offset) |
| 134 | +- [ ] Does `IArraySlice` support negative indexing or must we use raw pointers? |
| 135 | +- [ ] Should we keep `IArraySlice` bounds checking (safety) or move to unchecked pointer access (performance)? |
| 136 | +- [ ] How does `InternalArray.Address` interact with offset-based access? |
| 137 | + |
| 138 | +### 3. Understand NDIterator's relationship to strides |
| 139 | + |
| 140 | +NDIterator has fast paths for contiguous arrays and slow paths for sliced/broadcast. Questions: |
| 141 | + |
| 142 | +- [ ] With the new model, is NDIterator still needed or can it be simplified to `pointer + stride` walking? |
| 143 | +- [ ] Does AutoReset (for broadcast smaller-into-larger iteration) still work with flat strides? |
| 144 | +- [ ] What is the performance impact of removing the contiguous fast path (since all arrays now use the same stride-based access)? |
| 145 | + |
| 146 | +### 4. Understand reshape-after-slice interactions |
| 147 | + |
| 148 | +`np.reshape(sliced_view)` in NumPy either returns a view (if the slice is contiguous) or copies. NumSharp's `IsRecursive` flag and `ParentShape` chain handle this. Questions: |
| 149 | + |
| 150 | +- [ ] Can reshape on a non-contiguous view be handled with just `base_offset + strides`? |
| 151 | +- [ ] When must reshape force a copy? (NumPy: when the data is not contiguous in the requested order) |
| 152 | +- [ ] How to detect contiguity from strides alone? (check: `strides[i] == strides[i+1] * dimensions[i+1]` for all i) |
| 153 | + |
| 154 | +### 5. Understand generated template code |
| 155 | + |
| 156 | +The `Regen` templates generate type-specific math code that checks `IsBroadcasted` / `IsSliced` for fast-path branching. |
| 157 | + |
| 158 | +- [ ] What does the template source look like? (`Default.Op.General.template.cs`) |
| 159 | +- [ ] Can the fast-path decisions be made from strides alone? (e.g., contiguous = strides match row-major; broadcast = any stride is 0) |
| 160 | +- [ ] How to regenerate after changes? |
| 161 | + |
| 162 | +### 6. Map `IsSliced` / `IsBroadcasted` to stride-based checks |
| 163 | + |
| 164 | +Current flags: |
| 165 | +- `IsSliced` → `ViewInfo != null` |
| 166 | +- `IsBroadcasted` → `BroadcastInfo != null` |
| 167 | +- `IsContiguous` → complex check involving ViewInfo |
| 168 | + |
| 169 | +New derivations: |
| 170 | +- `IsBroadcasted` → `any(strides[i] == 0 && dimensions[i] > 1)` |
| 171 | +- `IsContiguous` → `strides == row_major_strides(dimensions)` (can precompute) |
| 172 | +- `IsSliced` → no longer a meaningful concept; all views are just `base_offset + strides` |
| 173 | + |
| 174 | +### 7. Memory management / aliasing |
| 175 | + |
| 176 | +- [ ] How does NumSharp track that a view shares memory with a parent? (`InternalArray` reference) |
| 177 | +- [ ] Does `base_offset` change anything about GC pinning or unmanaged memory lifetime? |
| 178 | +- [ ] The `Alias()` method creates views — confirm it can work with `base_offset` |
| 179 | + |
| 180 | +## Risk Assessment |
| 181 | + |
| 182 | +| Risk | Severity | Mitigation | |
| 183 | +|------|----------|------------| |
| 184 | +| Breaking the 1601 passing tests | High | Incremental approach — keep old model behind a flag initially | |
| 185 | +| Generated template code (~200K lines) | High | Must understand Regen before touching templates | |
| 186 | +| NDIterator redesign cascade | Medium | May need to rewrite iteration entirely | |
| 187 | +| Performance regression | Medium | Benchmark before/after; the new model should be faster (fewer branches) | |
| 188 | +| Reshape-after-slice edge cases | Medium | Port NumPy's contiguity check exactly | |
| 189 | +| Negative base_offset for reversed views | Medium | Verify IArraySlice can handle it or use pointer arithmetic | |
| 190 | + |
| 191 | +## Suggested Investigation Approach |
| 192 | + |
| 193 | +1. Start with investigation items 1-3 above — catalog every consumer and understand the constraints |
| 194 | +2. Build a prototype `Shape2` struct with `base_offset + strides` alongside the existing `Shape` |
| 195 | +3. Add a `ToShape2()` conversion and verify offset parity on all test cases |
| 196 | +4. Once parity is confirmed, plan the incremental migration file by file |
| 197 | + |
| 198 | +## NumPy Reference Files |
| 199 | + |
| 200 | +These files in `src/numpy/` contain the authoritative implementation: |
| 201 | + |
| 202 | +| File | What to study | |
| 203 | +|------|---------------| |
| 204 | +| `numpy/_core/include/numpy/ndarraytypes.h` | `PyArrayObject` struct — `data`, `dimensions`, `strides`, `base` | |
| 205 | +| `numpy/_core/src/multiarray/getitem.c` | Element access via strides | |
| 206 | +| `numpy/_core/src/multiarray/mapping.c` | Slice/index view creation, stride computation | |
| 207 | +| `numpy/_core/src/multiarray/ctors.c` | Array construction, contiguity checks | |
| 208 | +| `numpy/_core/src/multiarray/shape.c` | Reshape — when to copy vs view | |
| 209 | +| `numpy/_core/src/multiarray/nditer_constr.c` | Iterator setup, stride=0 for broadcast | |
| 210 | +| `numpy/lib/_stride_tricks_impl.py` | `as_strided`, `broadcast_to`, `broadcast_arrays` | |
0 commit comments