[Refactor] np.save and np.load: long index support and 100% interoperability with NumPy #589

## Overview

Refactor `np.save` and `np.load` to support long indexing (shapes with dimensions larger than `int.MaxValue`) and achieve 100% interoperability with NumPy's `.npy`/`.npz` file formats.

## Problem

The current implementation has several limitations:

1. **int32 shape limitation**: Uses `int[]` for shapes throughout, incorrectly assuming this is a file format constraint
2. **No async support**: Large array I/O blocks the calling thread
3. **No cancellation support**: Long operations cannot be cancelled
4. **Fixed buffer sizes**: Buffer sizes are hard-coded and cannot be tuned for different workloads
5. **Version 1.0 only**: Format versions 2.0 and 3.0 are not supported

```csharp
// Current: forces int32 parsing
shape = header.Split(',').Select(Int32.Parse).ToArray();

// Current: no async, no cancellation
public static NDArray load(string path) { ... }
```

## Clarification: .npy Format Has NO int32 Limit

Investigation of NumPy's `numpy/lib/_format_impl.py` (v2.4.2) reveals:

- Shape values are **Python integer literals in ASCII text** (arbitrary precision)
- Header example: `{'descr': '<f8', 'fortran_order': False, 'shape': (4294967396, 2)}`
- NumPy uses `npy_intp = Py_ssize_t = int64` internally on 64-bit systems
- Element count: `numpy.multiply.reduce(shape, dtype=numpy.int64)`

**The int32 limitation is purely in NumSharp's implementation, not the file format.**

## Proposal

Complete rewrite with:

### 1. Full Long Indexing Support
- `long[]` shapes throughout
- `long` for element counts and byte offsets
- Parse shape with `long.Parse()`, not `Int32.Parse()`
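
The parsing change above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the `ShapeParsing` class and its method names are hypothetical, and it assumes the caller has already extracted the text between the parentheses of the `shape` tuple.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class ShapeParsing
{
    // Parses the body of a shape tuple, e.g. "4294967396, 2", "3," or "" (scalar).
    public static long[] ParseShape(string tupleBody)
    {
        return tupleBody
            .Split(',')
            .Select(s => s.Trim())
            .Where(s => s.Length > 0) // skips the trailing comma of 1-tuples
            .Select(s => long.Parse(s, NumberStyles.None, CultureInfo.InvariantCulture))
            .ToArray();
    }

    // Multiplies the dimensions as long and throws OverflowException on overflow
    // instead of silently wrapping.
    public static long ElementCount(long[] shape)
    {
        long count = 1;
        foreach (long dim in shape)
            count = checked(count * dim);
        return count;
    }
}
```

An empty shape yields an element count of 1, which matches how NumPy treats a zero-dimensional array.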

### 2. Async/Await with CancellationToken
```csharp
ValueTask saveAsync(string path, NDArray arr, NpyOptions? options = null, CancellationToken ct = default);
ValueTask<NDArray> loadAsync(string path, NpyOptions? options = null, CancellationToken ct = default);
```

### 3. Configurable Options
```csharp
public sealed class NpyOptions
{
    public int ReadBufferSize { get; init; } = 256 * 1024;       // 256 KiB
    public int WriteBufferSize { get; init; } = 16 * 1024 * 1024; // 16 MiB
    public int MaxHeaderSize { get; init; } = 10_000;
    public bool UseAsyncIO { get; init; } = true;
    public long AsyncThreshold { get; init; } = 64 * 1024;        // 64 KiB
    public FileOptions FileOptions { get; init; } = FileOptions.Asynchronous | FileOptions.SequentialScan;
    public CompressionLevel CompressionLevel { get; init; } = CompressionLevel.Fastest;
    public IProgress<(long BytesProcessed, long TotalBytes)>? Progress { get; init; }
}
```

### 4. All Format Versions
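- Version 1.0: 2-byte little-endian header length, so headers are limited to 65,535 bytes (original)
- Version 2.0: 4-byte little-endian header length (needed for large structured dtypes)
- Version 3.0: same layout as 2.0, but the header is encoded as UTF-8 instead of latin1 (allows Unicode field names)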
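
As a rough sketch of how the three versions differ on the wire, the reader below checks the magic bytes, reads the version, and then picks the header-length width and text encoding accordingly. The `NpyPreamble` class and `ReadFully` helper are hypothetical names used only for this example; the sketch is synchronous and does not enforce `MaxHeaderSize`.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class NpyPreamble
{
    private static readonly byte[] Magic =
        { 0x93, (byte)'N', (byte)'U', (byte)'M', (byte)'P', (byte)'Y' };

    public static (byte Major, byte Minor, string Header) ReadHeader(Stream stream)
    {
        // 6 magic bytes followed by the major and minor version bytes.
        byte[] prefix = ReadFully(stream, 8);
        for (int i = 0; i < Magic.Length; i++)
            if (prefix[i] != Magic[i])
                throw new InvalidDataException("Not a .npy file: bad magic bytes.");

        byte major = prefix[6], minor = prefix[7];
        int headerLength;
        switch (major)
        {
            case 1: // 2-byte little-endian length
                headerLength = BinaryPrimitives.ReadUInt16LittleEndian(ReadFully(stream, 2));
                break;
            case 2:
            case 3: // 4-byte little-endian length
                headerLength = checked((int)BinaryPrimitives.ReadUInt32LittleEndian(ReadFully(stream, 4)));
                break;
            default:
                throw new NotSupportedException($"Unsupported .npy version {major}.{minor}.");
        }

        // Versions 1.0 and 2.0 use latin1; version 3.0 uses UTF-8.
        Encoding encoding = major >= 3 ? Encoding.UTF8 : Encoding.GetEncoding("iso-8859-1");
        return (major, minor, encoding.GetString(ReadFully(stream, headerLength)));
    }

    // Reads exactly `count` bytes or throws if the stream ends early.
    private static byte[] ReadFully(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException("Unexpected end of .npy header.");
            offset += read;
        }
        return buffer;
    }
}
```

The returned header text still includes the space padding and trailing newline that NumPy writes, so a caller would trim it before parsing the dictionary literal.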

### 5. Performance Optimizations
- `ValueTask` to avoid `Task` allocations when an operation completes synchronously
- `ArrayPool<byte>` for buffer reuse
- Adaptive sync/async based on size threshold
- Direct unsafe memory access for contiguous arrays
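
To make the buffer-reuse idea concrete, here is one possible shape for a chunked copy loop that rents its buffer from `ArrayPool<byte>`, honors a `CancellationToken`, and reports progress using the same tuple type as `NpyOptions.Progress`. The `ChunkedCopy` class is a hypothetical stand-in, and the sketch assumes a runtime with the `Memory<byte>`-based stream overloads (.NET Core 2.1 or later); it does not show the adaptive sync/async switch.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class ChunkedCopy
{
    // Copies exactly totalBytes from source to destination in pooled chunks.
    public static async ValueTask<long> CopyAsync(
        Stream source, Stream destination, long totalBytes,
        int bufferSize = 256 * 1024,
        IProgress<(long BytesProcessed, long TotalBytes)>? progress = null,
        CancellationToken ct = default)
    {
        // Rent may return a larger array than requested; that is fine here.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
        try
        {
            long copied = 0; // long, so payloads larger than 2 GiB are counted correctly
            while (copied < totalBytes)
            {
                int want = (int)Math.Min(buffer.Length, totalBytes - copied);
                int read = await source.ReadAsync(buffer.AsMemory(0, want), ct).ConfigureAwait(false);
                if (read == 0)
                    throw new EndOfStreamException("Stream ended before all array data was read.");

                await destination.WriteAsync(buffer.AsMemory(0, read), ct).ConfigureAwait(false);
                copied += read;
                progress?.Report((copied, totalBytes));
            }
            return copied;
        }
        finally
        {
            // Always return the buffer, including on cancellation or failure.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

Returning the buffer in `finally` matters because a cancelled or failed copy would otherwise leak the rented array from the pool.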

## File Structure

```
src/NumSharp.Core/IO/
├── NpyFormat.cs           # Core (partial class)
├── NpyFormat.Constants.cs # Magic bytes, versions, limits
├── NpyFormat.Header.cs    # Parse/write headers with long[] shapes
├── NpyFormat.Read.cs      # Async array reading
├── NpyFormat.Write.cs     # Async array writing
├── NpyOptions.cs          # Configuration
└── NpzArchive.cs          # .npz support
```

## API Changes

### New Async API
```csharp
// Primary async API
ValueTask np.saveAsync(string path, NDArray arr, NpyOptions? options, CancellationToken ct);
ValueTask<NDArray> np.loadAsync(string path, NpyOptions? options, CancellationToken ct);
ValueTask np.savezAsync(string path, Dictionary<string, NDArray> arrays, NpyOptions? options, CancellationToken ct);
ValueTask<NpzArchive> np.load_npzAsync(string path, NpyOptions? options, CancellationToken ct);

// Header inspection without loading data
ValueTask<NpyHeader> np.load_headerAsync(string path, NpyOptions? options, CancellationToken ct);
```

### Sync Wrappers (Backward Compatible)
```csharp
// Existing API still works
np.save(string path, NDArray arr);
NDArray np.load(string path);
```

## Tasks

- [ ] Create `NpyOptions` configuration class
- [ ] Implement `NpyFormat.Constants` (magic bytes, versions)
- [ ] Implement `NpyFormat.Header` (parse/write with `long[]` shapes)
- [ ] Implement `NpyFormat.Write` (async, chunked, progress)
- [ ] Implement `NpyFormat.Read` (async, chunked, progress)
- [ ] Implement `NpzArchive` (IAsyncDisposable)
- [ ] Update `np.save.cs` with async overloads
- [ ] Update `np.load.cs` with async overloads
- [ ] Unit tests: round-trip, cancellation, progress, large shapes, all dtypes, all versions
- [ ] Performance benchmarks vs current implementation
- [ ] Documentation update

## Breaking Changes

None. The sync API signatures are preserved, and the new async API is purely additive.

## Related

- Design document: `docs/NPY_FORMAT_DESIGN.md`
- Long indexing tracking: `docs/LONG_INDEXING_ISSUES.md` (M2)
- NumPy reference: `src/numpy/numpy/lib/_format_impl.py` (v2.4.2)
