Commit 93c365d

docs: add offset-model-rewrite investigation plan

Entry point for planning the rewrite of NumSharp's view/offset resolution to match NumPy's base_offset + strides architecture.

Current model: ViewInfo + BroadcastInfo chains with 6+ GetOffset code paths, recursive ParentShape resolution, and complex lazy-loaded UnreducedBroadcastedShape computation.

Target model: base_offset (int) + strides[] + dimensions[]. Offset computation becomes a single loop: sum(stride[i] * coord[i]). All slice/broadcast/transpose operations just adjust the base_offset and strides — no chains, no special cases.

The plan covers:
- Investigation checklist (12 items): catalog all consumers, understand IArraySlice bounds checking, NDIterator relationship, reshape-after-slice interactions, generated template code, IsSliced/IsBroadcasted derivation from strides, memory management
- Risk assessment with mitigations
- Suggested incremental approach (prototype Shape2, verify parity, migrate)
- NumPy reference files in src/numpy/ for each subsystem

1 parent 7c2f085 commit 93c365d

File tree

1 file changed: +210 −0

docs/plans/offset-model-rewrite.md

# Plan: Rewrite Offset Model to Match NumPy's `base_offset + strides` Architecture

## Status: Investigation Required

This document is the entry point for planning the rewrite of NumSharp's view/offset resolution system to match NumPy's architecture exactly. It identifies what needs to change, what needs to be investigated first, and the risks involved.

## Problem Statement

NumSharp uses a chain-based offset model (ViewInfo + BroadcastInfo) to resolve memory offsets for sliced and broadcast arrays. NumPy uses a flat model: `base_offset + sum(stride[i] * coord[i])`. NumSharp's model is the root cause of most broadcast/slice bugs and creates O(chain_depth) overhead per element access.
10+
11+
### NumPy's Model (target)
12+
13+
Every ndarray view stores exactly:
14+
```
15+
data → pointer to first element of THIS view (not the allocation base)
16+
shape → int[ndim] dimensions
17+
strides → int[ndim] bytes per step (0 for broadcast, negative for reversed)
18+
base → reference to parent array (for memory management / refcounting only)
19+
```
20+
21+
Offset computation: `byte_offset = sum(strides[i] * coords[i])` — one loop, no branching, no chain resolution.
22+
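The single-loop rule is easy to verify against a real NumPy array. In this sketch, `byte_offset` is an illustrative helper (not a NumSharp or NumPy API) that applies the formula above:

```python
import numpy as np

a = np.arange(24, dtype=np.int64).reshape(2, 3, 4)  # strides: (96, 32, 8)

def byte_offset(arr, coords):
    # NumPy's offset rule: one loop over strides, no chain resolution
    return sum(s * c for s, c in zip(arr.strides, coords))

# Element (1, 2, 3) is the 23rd element; at 8 bytes each that is 184 bytes in
assert byte_offset(a, (1, 2, 3)) == 184 == 23 * a.itemsize
assert a[1, 2, 3] == 23
```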
When creating a slice `a[2:8:2, ::-1]`:

```python
new.data    = a.data + 2*a.strides[0] + (a.shape[1]-1)*a.strides[1]
new.shape   = [3, original_cols]
new.strides = [a.strides[0]*2, -a.strides[1]]
new.base    = a.base or a
```

When broadcasting `broadcast_to(a, target_shape)`:

```python
new.data    = a.data
new.shape   = target_shape
new.strides = [0 if dim was stretched, else a.strides[i]]
new.base    = a
```

Both operations compose naturally: slicing a broadcast produces correct strides via arithmetic on the existing strides. No special cases.
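The composition claim can be checked directly in NumPy, where broadcasting and then slicing yields strides by plain arithmetic on the existing ones:

```python
import numpy as np

a = np.arange(4.0)                  # shape (4,), strides (8,)
b = np.broadcast_to(a, (3, 4))      # stride 0 on the stretched axis
assert b.strides == (0, 8)

# Slicing the broadcast view: start=1, step=2 along axis 1.
# The stride-0 axis stays 0; the sliced axis stride is doubled.
c = b[:, 1:4:2]
assert c.strides == (0, 16)
assert c.tolist() == [[1.0, 3.0]] * 3
```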
### NumSharp's Current Model (to be replaced)

```
Shape
├── dimensions[]
├── strides[]        ← may be broadcast strides (0) OR original strides
├── ViewInfo         ← chain of slice history
│   ├── OriginalShape   ← the unsliced root shape
│   ├── ParentShape     ← intermediate sliced shape (for recursive slicing)
│   ├── Slices[]        ← SliceDef per dimension (start, stop, step)
│   └── UnreducedShape  ← shape before dimension reduction
└── BroadcastInfo    ← broadcast history
    ├── OriginalShape              ← shape before broadcasting
    └── UnreducedBroadcastedShape  ← lazy-loaded resolved shape
```

Offset computation: 6+ code paths in `GetOffset`, `GetOffset_1D`, `GetOffset_broadcasted`, `GetOffset_broadcasted_1D`, `GetOffset_IgnoreViewInfo`, `resolveUnreducedBroadcastedShape`, plus recursive calls through `ParentShape.GetOffset`.
## Target Architecture

### Shape (after rewrite)

```csharp
public struct Shape
{
    internal int[] dimensions;
    internal int[] strides;     // fully resolved — absorbs slicing, broadcasting
    internal int base_offset;   // offset to first element within InternalArray
    internal int size;          // product of dimensions

    // ViewInfo and BroadcastInfo: REMOVED
    // IsSliced, IsBroadcasted: derived from strides (stride=0 → broadcast)
}
```

### GetOffset (after rewrite)

```csharp
public int GetOffset(params int[] coords)
{
    int offset = base_offset;
    for (int i = 0; i < coords.Length; i++)
        offset += strides[i] * coords[i];
    return offset;
}
```
### View Creation (after rewrite)

```csharp
// Slicing: a[start:stop:step] along axis
new_base_offset      = old_base_offset + start * old_strides[axis];
new_strides[axis]    = old_strides[axis] * step;
new_dimensions[axis] = (stop - start + step - 1) / step; // ceiling division

// Negative step (reversal): a[::-1] along axis
new_base_offset   = old_base_offset + (old_dimensions[axis] - 1) * old_strides[axis];
new_strides[axis] = -old_strides[axis];

// Broadcasting: stretch dim from 1 to N
new_strides[axis]    = 0; // that's it
new_dimensions[axis] = N;

// Index reduction: a[3] along axis — removes dimension, adjusts base_offset
new_base_offset = old_base_offset + 3 * old_strides[axis];
// remove axis from dimensions[] and strides[]
```
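These formulas can be cross-checked against NumPy's own view creation. The sketch below verifies the ceiling-division length, the stride scaling, and the data-pointer offset (read via `__array_interface__`, so offsets here are in bytes):

```python
import numpy as np

a = np.arange(10, dtype=np.int64)   # itemsize 8, strides (8,)
start, stop, step = 2, 9, 3
v = a[start:stop:step]              # elements 2, 5, 8

# Length: ceiling division, exactly as in the slicing rule above
assert len(v) == (stop - start + step - 1) // step == 3
# Stride: old stride * step
assert v.strides == (a.strides[0] * step,)
# base offset in bytes: start * old stride
off = v.__array_interface__['data'][0] - a.__array_interface__['data'][0]
assert off == start * a.strides[0]
```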
## Investigation Checklist

Before writing a full implementation plan, the following must be investigated:

### 1. Catalog all Shape consumers

Every place that reads `ViewInfo`, `BroadcastInfo`, `IsSliced`, `IsBroadcasted`, `IsRecursive` needs to be identified and mapped to the new model.

- [ ] `Shape.cs` — GetOffset variants, Slice(), TransformOffset, GetCoordinates, Clean, Clone
- [ ] `Shape.Unmanaged.cs` — unmanaged pointer access paths
- [ ] `Shape.Reshaping.cs` — reshape on sliced/broadcast views
- [ ] `NDIterator.cs` — iteration with AutoReset for broadcasting
- [ ] `MultiIterator.cs` — paired/broadcast iteration
- [ ] `UnmanagedStorage.Slicing.cs` — GetViewInternal, the Bug 17 materializing fix
- [ ] `UnmanagedStorage.Getters.cs` — element access
- [ ] `Default.Broadcasting.cs` — Broadcast() creates ViewInfo/BroadcastInfo
- [ ] Generated math templates (`Default.Add.*.cs`, `Default.Subtract.*.cs`, etc.) — ~24 files, ~200K lines that reference ViewInfo/BroadcastInfo for fast-path decisions
- [ ] `NDArray.Indexing.cs` — indexing dispatch
- [ ] Selection/masking code
- [ ] `np.reshape.cs` — reshape interacts with views
### 2. Understand IArraySlice bounds checking

NumPy adjusts the `data` pointer to point at the first view element. NumSharp uses `IArraySlice` with bounds-checked access via `Count`. Questions:

- [ ] Can `base_offset` be negative? (yes, for reversed views the first logical element may precede the parent's first element in memory — NumPy handles this with pointer arithmetic, NumSharp would need a signed offset)
- [ ] Does `IArraySlice` support negative indexing or must we use raw pointers?
- [ ] Should we keep `IArraySlice` bounds checking (safety) or move to unchecked pointer access (performance)?
- [ ] How does `InternalArray.Address` interact with offset-based access?
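For context on the signed-offset question, a quick NumPy check of a reversed view: the data pointer moves forward to the parent's last element, the stride turns negative, and every subsequent `stride * coord` term steps backwards from that pointer:

```python
import numpy as np

a = np.arange(5, dtype=np.int64)
r = a[::-1]

assert r.strides == (-8,)   # negative stride, as NumPy stores it
# The view's data pointer sits at the parent's LAST element...
off = r.__array_interface__['data'][0] - a.__array_interface__['data'][0]
assert off == (len(a) - 1) * a.itemsize
# ...so per-coordinate terms after coord 0 are negative
assert sum(s * c for s, c in zip(r.strides, (4,))) == -32
assert r[4] == a[0] == 0
```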
### 3. Understand NDIterator's relationship to strides

NDIterator has fast paths for contiguous arrays and slow paths for sliced/broadcast arrays. Questions:

- [ ] With the new model, is NDIterator still needed, or can it be simplified to `pointer + stride` walking?
- [ ] Does AutoReset (for broadcast smaller-into-larger iteration) still work with flat strides?
- [ ] What is the performance impact of removing the contiguous fast path (since all arrays now use the same stride-based access)?
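The stride-0 walking that AutoReset currently emulates with chain bookkeeping can be reproduced directly in NumPy via `as_strided`; this is a sketch of the mechanism, not NumSharp code:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

row = np.array([10, 20, 30], dtype=np.int64)
# Stride 0 on axis 0: every "row" of the view aliases the same memory
view = as_strided(row, shape=(4, 3), strides=(0, row.strides[0]))

# Flat stride-based walking revisits the same three elements per row —
# no reset logic needed beyond the stride itself
assert [int(x) for x in view.flat] == [10, 20, 30] * 4
```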
### 4. Understand reshape-after-slice interactions

`np.reshape(sliced_view)` in NumPy either returns a view (if the slice is contiguous) or copies. NumSharp's `IsRecursive` flag and `ParentShape` chain handle this. Questions:

- [ ] Can reshape on a non-contiguous view be handled with just `base_offset + strides`?
- [ ] When must reshape force a copy? (NumPy: when the data is not contiguous in the requested order)
- [ ] How to detect contiguity from strides alone? (check: `strides[i] == strides[i+1] * dimensions[i+1]` for all i)
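A minimal sketch of that contiguity check in NumPy terms (the `row_major_strides` and `is_contiguous` helpers are illustrative, not existing APIs); for arrays without size-0 or size-1 dimensions it agrees with NumPy's `C_CONTIGUOUS` flag:

```python
import numpy as np

def row_major_strides(dims, itemsize):
    # Innermost stride = itemsize; strides[i] == strides[i+1] * dims[i+1]
    strides, acc = [], itemsize
    for d in reversed(dims):
        strides.append(acc)
        acc *= d
    return tuple(reversed(strides))

def is_contiguous(arr):
    return arr.strides == row_major_strides(arr.shape, arr.itemsize)

a = np.arange(24, dtype=np.int64).reshape(2, 3, 4)
assert is_contiguous(a) == a.flags['C_CONTIGUOUS'] == True
sliced = a[:, ::2]  # stride on axis 1 doubled → no longer row-major
assert is_contiguous(sliced) == sliced.flags['C_CONTIGUOUS'] == False
```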
### 5. Understand generated template code

The `Regen` templates generate type-specific math code that checks `IsBroadcasted` / `IsSliced` for fast-path branching.

- [ ] What does the template source look like? (`Default.Op.General.template.cs`)
- [ ] Can the fast-path decisions be made from strides alone? (e.g., contiguous = strides match row-major; broadcast = any stride is 0)
- [ ] How to regenerate after changes?
### 6. Map `IsSliced` / `IsBroadcasted` to stride-based checks

Current flags:

- `IsSliced` → `ViewInfo != null`
- `IsBroadcasted` → `BroadcastInfo != null`
- `IsContiguous` → complex check involving ViewInfo

New derivations:

- `IsBroadcasted` → `any(strides[i] == 0 && dimensions[i] > 1)`
- `IsContiguous` → `strides == row_major_strides(dimensions)` (can precompute)
- `IsSliced` → no longer a meaningful concept; all views are just `base_offset + strides`
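A sketch of the `IsBroadcasted` derivation, expressed in NumPy terms (the `is_broadcasted` helper is illustrative, not an existing API):

```python
import numpy as np

def is_broadcasted(arr):
    # A stretched axis shows up as stride 0 with extent > 1
    return any(s == 0 and d > 1 for s, d in zip(arr.strides, arr.shape))

a = np.ones((3, 1))
b = np.broadcast_to(a, (3, 5))
assert not is_broadcasted(a)    # a size-1 axis alone is not broadcast
assert is_broadcasted(b)
assert b.strides[1] == 0
```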
### 7. Memory management / aliasing

- [ ] How does NumSharp track that a view shares memory with a parent? (`InternalArray` reference)
- [ ] Does `base_offset` change anything about GC pinning or unmanaged memory lifetime?
- [ ] The `Alias()` method creates views — confirm it can work with `base_offset`
## Risk Assessment

| Risk | Severity | Mitigation |
|------|----------|------------|
| Breaking the 1601 passing tests | High | Incremental approach — keep the old model behind a flag initially |
| Generated template code (~200K lines) | High | Must understand Regen before touching templates |
| NDIterator redesign cascade | Medium | May need to rewrite iteration entirely |
| Performance regression | Medium | Benchmark before/after; the new model should be faster (fewer branches) |
| Reshape-after-slice edge cases | Medium | Port NumPy's contiguity check exactly |
| Negative base_offset for reversed views | Medium | Verify IArraySlice can handle it or use pointer arithmetic |
## Suggested Investigation Approach

1. Start with investigation items 1-3 above — catalog every consumer and understand the constraints
2. Build a prototype `Shape2` struct with `base_offset + strides` alongside the existing `Shape`
3. Add a `ToShape2()` conversion and verify offset parity on all test cases
4. Once parity is confirmed, plan the incremental migration file by file
## NumPy Reference Files

These files in `src/numpy/` contain the authoritative implementation:

| File | What to study |
|------|---------------|
| `numpy/_core/include/numpy/ndarraytypes.h` | `PyArrayObject` struct — `data`, `dimensions`, `strides`, `base` |
| `numpy/_core/src/multiarray/getitem.c` | Element access via strides |
| `numpy/_core/src/multiarray/mapping.c` | Slice/index view creation, stride computation |
| `numpy/_core/src/multiarray/ctors.c` | Array construction, contiguity checks |
| `numpy/_core/src/multiarray/shape.c` | Reshape — when to copy vs view |
| `numpy/_core/src/multiarray/nditer_constr.c` | Iterator setup, stride=0 for broadcast |
| `numpy/lib/_stride_tricks_impl.py` | `as_strided`, `broadcast_to`, `broadcast_arrays` |
