
[Refactor] np.save and np.load refactor for long index support and 100% interoperability with NumPy #589

@Nucs

Overview

Refactor np.save and np.load to support long indexing (dimensions and element counts greater than int.MaxValue) and to achieve 100% interoperability with NumPy's .npy/.npz file formats.

Problem

The current implementation has several limitations:

  1. int32 shape limitation: uses int[] for shapes throughout, on the mistaken assumption that the file format requires it
  2. No async support: large-array I/O blocks the calling thread
  3. No cancellation support: long-running operations cannot be cancelled
  4. Fixed buffer sizes: buffers cannot be tuned for different workloads
  5. Version 1.0 only: format versions 2.0 and 3.0 are not supported

// Current: forces int32 parsing
shape = header.Split(',').Select(Int32.Parse).ToArray();

// Current: no async, no cancellation
public static NDArray load(string path) { ... }

Clarification: .npy Format Has NO int32 Limit

Investigation of NumPy's numpy/lib/_format_impl.py (v2.4.2) reveals:

  • Shape values are Python integer literals in ASCII text (arbitrary precision)
  • Header example: {'descr': '<f8', 'fortran_order': False, 'shape': (4294967396, 2)}
  • NumPy uses npy_intp = Py_ssize_t = int64 internally on 64-bit systems
  • Element count: numpy.multiply.reduce(shape, dtype=numpy.int64)

The int32 limitation is purely in NumSharp's implementation, not the file format.
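As a rough sketch of header parsing with long[] shapes: the helper below extracts the shape tuple, parses each dimension with long.Parse, and computes the element count in checked 64-bit arithmetic, the analogue of numpy.multiply.reduce(shape, dtype=numpy.int64). NpyShapeSketch and its methods are hypothetical names, and the regex assumes the well-formed dict NumPy writes rather than implementing a full Python-literal parser.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not an existing NumSharp API.
public static class NpyShapeSketch
{
    // Captures the tuple body after 'shape', e.g. "4294967396, 2", "3," or "".
    private static readonly Regex ShapeRegex =
        new Regex(@"'shape'\s*:\s*\(([^)]*)\)", RegexOptions.CultureInvariant);

    public static long[] ParseShape(string header)
    {
        Match m = ShapeRegex.Match(header);
        if (!m.Success)
            throw new FormatException("Header has no 'shape' entry.");

        // RemoveEmptyEntries drops the empty piece after the trailing comma of "(3,)".
        string[] parts = m.Groups[1].Value.Split(',',
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

        var shape = new long[parts.Length];
        for (int i = 0; i < parts.Length; i++)
            shape[i] = long.Parse(parts[i], NumberStyles.None, CultureInfo.InvariantCulture);
        return shape;
    }

    // 64-bit element count; an empty shape (0-d array) yields 1.
    public static long ElementCount(long[] shape)
    {
        long count = 1;
        foreach (long dim in shape)
            count = checked(count * dim); // OverflowException rather than silent wraparound
        return count;
    }
}
```

For the header example above, Int32.Parse throws OverflowException on the first dimension, while this sketch returns [4294967396, 2] and an element count of 8589934792.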

Proposal

Complete rewrite with:

1. Full Long Indexing Support

  • long[] shapes throughout
  • long for element counts and byte offsets
  • Parse shape with long.Parse(), not Int32.Parse()
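To make "long for element counts and byte offsets" concrete, here is a hedged sketch that computes C-order (row-major) strides in bytes and the byte offset of a multi-index entirely in long. LongOffsetSketch is a hypothetical name, and the layout is assumed to be C-contiguous (fortran_order: False); checked arithmetic makes an overflow fail loudly instead of wrapping.

```csharp
using System;

// Hypothetical helper for illustration; not an existing NumSharp API.
public static class LongOffsetSketch
{
    // C-order strides in bytes: the last axis advances by itemSize.
    public static long[] CStrides(long[] shape, long itemSize)
    {
        var strides = new long[shape.Length];
        long stride = itemSize;
        for (int axis = shape.Length - 1; axis >= 0; axis--)
        {
            strides[axis] = stride;
            stride = checked(stride * shape[axis]);
        }
        return strides;
    }

    // Byte offset of a multi-index relative to the start of the data section.
    public static long ByteOffset(long[] index, long[] strides)
    {
        if (index.Length != strides.Length)
            throw new ArgumentException("Index rank does not match array rank.");
        long offset = 0;
        for (int axis = 0; axis < index.Length; axis++)
            offset = checked(offset + index[axis] * strides[axis]);
        return offset;
    }
}
```

For a float64 array of shape (4294967396, 2) the strides are [16, 8], and the last element, at index (4294967395, 1), starts at byte 68719478328, far beyond what an int can address.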

2. Async/Await with CancellationToken

ValueTask saveAsync(string path, NDArray arr, NpyOptions? options = null, CancellationToken ct = default);
ValueTask<NDArray> loadAsync(string path, NpyOptions? options = null, CancellationToken ct = default);
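The cancellation and progress contract behind these signatures can be sketched independently of NDArray. In the helper below, WriteChunkedAsync and its parameters are hypothetical, and chunkSize would presumably come from NpyOptions.WriteBufferSize. It checks the token before each chunk, passes it on to Stream.WriteAsync, and reports (BytesProcessed, TotalBytes) after each chunk.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

#nullable enable

// Hypothetical helper for illustration; not the proposed NpyFormat.Write implementation.
public static class ChunkedWriteSketch
{
    public static async ValueTask WriteChunkedAsync(
        Stream destination,
        ReadOnlyMemory<byte> data,
        int chunkSize,
        IProgress<(long BytesProcessed, long TotalBytes)>? progress = null,
        CancellationToken ct = default)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        long total = data.Length;
        long written = 0;
        while (!data.IsEmpty)
        {
            ct.ThrowIfCancellationRequested(); // stop promptly between chunks
            int n = Math.Min(chunkSize, data.Length);
            await destination.WriteAsync(data.Slice(0, n), ct).ConfigureAwait(false);
            data = data.Slice(n);
            written += n;
            progress?.Report((written, total));
        }
    }
}
```

ReadOnlyMemory<byte> is limited to int.MaxValue bytes, so an array larger than 2 GiB would have to be fed through as several segments, tracked with the long-based offsets described above.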

3. Configurable Options

using System;
using System.IO;
using System.IO.Compression;

public sealed class NpyOptions
{
    public int ReadBufferSize { get; init; } = 256 * 1024;       // 256 KiB
    public int WriteBufferSize { get; init; } = 16 * 1024 * 1024; // 16 MiB
    public int MaxHeaderSize { get; init; } = 10_000;             // same default as NumPy's max_header_size
    public bool UseAsyncIO { get; init; } = true;
    public long AsyncThreshold { get; init; } = 64 * 1024;        // bytes; smaller payloads take the synchronous path
    public FileOptions FileOptions { get; init; } = FileOptions.Asynchronous | FileOptions.SequentialScan;
    public CompressionLevel CompressionLevel { get; init; } = CompressionLevel.Fastest; // .npz entries
    public IProgress<(long BytesProcessed, long TotalBytes)>? Progress { get; init; }
}
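One plausible way for these options to reach the file system is to keep FileOptions.Asynchronous only when the file reaches AsyncThreshold, and otherwise open a plain synchronous stream, since small reads gain little from async I/O. This is an assumption about the design rather than part of the proposal; OpenRead is a hypothetical helper whose parameters mirror the NpyOptions properties of the same names. The FileStream constructor overload used here is standard .NET.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not an existing NumSharp API.
public static class NpyStreamSketch
{
    public static FileStream OpenRead(
        string path, int readBufferSize, bool useAsyncIO, long asyncThreshold, FileOptions fileOptions)
    {
        long length = new FileInfo(path).Length;
        bool goAsync = useAsyncIO && length >= asyncThreshold;

        // Strip the Asynchronous flag for small files so they stay on the synchronous path.
        FileOptions effective = goAsync ? fileOptions : fileOptions & ~FileOptions.Asynchronous;
        return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, readBufferSize, effective);
    }
}
```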

4. All Format Versions

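  • Version 1.0: 2-byte little-endian header length (uint16), latin1/ASCII header (the original format)
  • Version 2.0: 4-byte little-endian header length (uint32) for headers over 65,535 bytes, e.g. large structured dtypes
  • Version 3.0: 4-byte header length with a UTF-8 encoded header (unicode field names)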
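For reference, the preamble is the magic string \x93NUMPY, one byte each for the major and minor version, and then the header length, followed by the header text padded with spaces and terminated by a newline so that the whole preamble is a multiple of 64 bytes. The sketch below builds it and picks the smallest version that fits: 1.0 by default, 2.0 when the length no longer fits in a uint16, and 3.0 when the header is not representable in latin1. That fallback order reflects our reading of _format_impl.py, and NpyPreambleSketch is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not an existing NumSharp API.
public static class NpyPreambleSketch
{
    private static ReadOnlySpan<byte> Magic => new byte[] { 0x93, (byte)'N', (byte)'U', (byte)'M', (byte)'P', (byte)'Y' };
    private const int ArrayAlign = 64; // preamble + header is padded to a multiple of 64 bytes

    // header is the dict literal, e.g. "{'descr': '<f8', 'fortran_order': False, 'shape': (3,), }".
    public static byte[] Build(string header)
    {
        bool latin1 = header.All(c => c <= '\u00FF'); // otherwise version 3.0 (UTF-8) is required
        byte[] body = latin1 ? Encoding.Latin1.GetBytes(header) : Encoding.UTF8.GetBytes(header);

        byte major = latin1 ? (byte)1 : (byte)3;
        int lengthSize = major == 1 ? 2 : 4;
        long headerLength = PaddedHeaderLength(body.Length, lengthSize);
        if (major == 1 && headerLength > ushort.MaxValue)
        {
            major = 2; // too long for a uint16 length field
            lengthSize = 4;
            headerLength = PaddedHeaderLength(body.Length, lengthSize);
        }

        var result = new byte[8 + lengthSize + headerLength];
        Magic.CopyTo(result);
        result[6] = major;
        result[7] = 0; // minor version
        Span<byte> lengthField = result.AsSpan(8, lengthSize);
        if (lengthSize == 2)
            BinaryPrimitives.WriteUInt16LittleEndian(lengthField, (ushort)headerLength);
        else
            BinaryPrimitives.WriteUInt32LittleEndian(lengthField, (uint)headerLength);

        // Header text, then space padding, then the terminating newline.
        Span<byte> rest = result.AsSpan(8 + lengthSize);
        body.CopyTo(rest);
        rest.Slice(body.Length, rest.Length - body.Length - 1).Fill((byte)' ');
        rest[^1] = (byte)'\n';
        return result;
    }

    // len(header) + 1 for the newline, plus spaces up to the next 64-byte boundary.
    private static long PaddedHeaderLength(int bodyLength, int lengthSize)
    {
        long withNewline = bodyLength + 1L;
        long padding = ArrayAlign - ((8 + lengthSize + withNewline) % ArrayAlign);
        return withNewline + padding;
    }
}
```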

5. Performance Optimizations

  • ValueTask to avoid Task allocations on the hot path when operations complete synchronously
  • ArrayPool<byte> for buffer reuse
  • Adaptive sync/async based on size threshold
  • Direct unsafe memory access for contiguous arrays
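A safe-code approximation of direct memory access for contiguous arrays is to reinterpret the element buffer as bytes with MemoryMarshal.AsBytes and hand it straight to the stream, with no per-element conversion. ContiguousIoSketch is a hypothetical name. The sketch assumes a little-endian host writing '<'-prefixed descr values, and because spans are int-sized, data beyond 2 GiB would still have to be processed in several slices.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; not an existing NumSharp API.
public static class ContiguousIoSketch
{
    // Reinterpret the elements as raw bytes and write them in one call.
    public static void WriteRaw<T>(Stream destination, ReadOnlySpan<T> elements) where T : unmanaged
    {
        EnsureLittleEndian();
        destination.Write(MemoryMarshal.AsBytes(elements));
    }

    // Read straight into the element buffer, looping because Stream.Read may return fewer bytes.
    public static void ReadRaw<T>(Stream source, Span<T> elements) where T : unmanaged
    {
        EnsureLittleEndian();
        Span<byte> target = MemoryMarshal.AsBytes(elements);
        while (!target.IsEmpty)
        {
            int n = source.Read(target);
            if (n == 0) throw new EndOfStreamException("File ended before the array data was complete.");
            target = target.Slice(n);
        }
    }

    // '<' descr values are little-endian; a big-endian host would need byte swapping.
    private static void EnsureLittleEndian()
    {
        if (!BitConverter.IsLittleEndian)
            throw new PlatformNotSupportedException("This sketch assumes a little-endian host.");
    }
}
```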

File Structure

src/NumSharp.Core/IO/
├── NpyFormat.cs           # Core (partial class)
├── NpyFormat.Constants.cs # Magic bytes, versions, limits
├── NpyFormat.Header.cs    # Parse/write headers with long[] shapes
├── NpyFormat.Read.cs      # Async array reading
├── NpyFormat.Write.cs     # Async array writing
├── NpyOptions.cs          # Configuration
└── NpzArchive.cs          # .npz support
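An .npz file is an ordinary ZIP archive holding one "<name>.npy" entry per array. NumPy's np.savez stores the entries uncompressed and np.savez_compressed deflates them, which is worth keeping in mind given the Fastest default in NpyOptions. NpzArchive could plausibly sit on top of System.IO.Compression.ZipArchive. The minimal sketch below uses hypothetical names and takes already-serialized .npy bytes so that it stays independent of NDArray.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; not an existing NumSharp API.
public static class NpzSketch
{
    // One entry per array, named "<key>.npy"; payloads are already-serialized .npy files.
    public static void Write(Stream destination, IReadOnlyDictionary<string, byte[]> npyFiles, CompressionLevel level)
    {
        using var zip = new ZipArchive(destination, ZipArchiveMode.Create, leaveOpen: true);
        foreach (var (key, payload) in npyFiles)
        {
            ZipArchiveEntry entry = zip.CreateEntry(key + ".npy", level);
            using Stream entryStream = entry.Open(); // disposed before the next entry is created
            entryStream.Write(payload, 0, payload.Length);
        }
    }

    // Array names as NumPy's NpzFile.files reports them: entry names with ".npy" stripped.
    public static List<string> ListArrays(Stream source)
    {
        using var zip = new ZipArchive(source, ZipArchiveMode.Read, leaveOpen: true);
        var names = new List<string>();
        foreach (ZipArchiveEntry entry in zip.Entries)
        {
            string name = entry.FullName;
            names.Add(name.EndsWith(".npy", StringComparison.Ordinal) ? name[..^4] : name);
        }
        return names;
    }
}
```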

API Changes

New Async API

// Primary async API
ValueTask np.saveAsync(string path, NDArray arr, NpyOptions? options, CancellationToken ct);
ValueTask<NDArray> np.loadAsync(string path, NpyOptions? options, CancellationToken ct);
ValueTask np.savezAsync(string path, Dictionary<string, NDArray> arrays, NpyOptions? options, CancellationToken ct);
ValueTask<NpzArchive> np.load_npzAsync(string path, NpyOptions? options, CancellationToken ct);

// Header inspection without loading data
ValueTask<NpyHeader> np.load_headerAsync(string path, NpyOptions? options, CancellationToken ct);
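Header inspection only needs the preamble and the header, never the data. The synchronous sketch below, with hypothetical names, shows that read path. It validates the magic string, reads a 2- or 4-byte little-endian length depending on the major version, enforces a MaxHeaderSize-style limit before allocating, and decodes the header as latin1 or UTF-8. Rejecting the header before allocating is our design choice; NumPy applies its max_header_size check after decoding.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not an existing NumSharp API.
public static class NpyHeaderReadSketch
{
    private static ReadOnlySpan<byte> Magic => new byte[] { 0x93, (byte)'N', (byte)'U', (byte)'M', (byte)'P', (byte)'Y' };

    public static (byte Major, byte Minor, string Header, long DataOffset) ReadHeader(Stream s, int maxHeaderSize = 10_000)
    {
        Span<byte> preamble = stackalloc byte[8]; // magic (6 bytes) + major + minor
        ReadFully(s, preamble);
        if (!preamble.Slice(0, 6).SequenceEqual(Magic))
            throw new InvalidDataException("Not a .npy file: bad magic string.");

        byte major = preamble[6], minor = preamble[7];
        int lengthSize = major switch
        {
            1 => 2,      // uint16
            2 or 3 => 4, // uint32
            _ => throw new NotSupportedException($"Unsupported .npy format version {major}.{minor}."),
        };

        Span<byte> lengthBytes = stackalloc byte[4];
        ReadFully(s, lengthBytes.Slice(0, lengthSize));
        long headerLength = lengthSize == 2
            ? BinaryPrimitives.ReadUInt16LittleEndian(lengthBytes)
            : BinaryPrimitives.ReadUInt32LittleEndian(lengthBytes);
        if (headerLength > maxHeaderSize)
            throw new InvalidDataException($"Header length {headerLength} exceeds the limit of {maxHeaderSize} bytes.");

        byte[] headerBytes = new byte[headerLength];
        ReadFully(s, headerBytes);
        Encoding encoding = major == 3 ? Encoding.UTF8 : Encoding.Latin1;
        string header = encoding.GetString(headerBytes).TrimEnd(' ', '\n');

        // Array data starts right after the header; nothing past this point is read.
        return (major, minor, header, 8L + lengthSize + headerLength);
    }

    private static void ReadFully(Stream s, Span<byte> buffer)
    {
        while (!buffer.IsEmpty)
        {
            int n = s.Read(buffer);
            if (n == 0) throw new EndOfStreamException("Stream ended inside the .npy header.");
            buffer = buffer.Slice(n);
        }
    }
}
```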

Sync Wrappers (Backward Compatible)

// Existing API still works
void np.save(string path, NDArray arr);
NDArray np.load(string path);

Tasks

  • Create NpyOptions configuration class
  • Implement NpyFormat.Constants (magic bytes, versions)
  • Implement NpyFormat.Header (parse/write with long[] shapes)
  • Implement NpyFormat.Write (async, chunked, progress)
  • Implement NpyFormat.Read (async, chunked, progress)
  • Implement NpzArchive (IAsyncDisposable)
  • Update np.save.cs with async overloads
  • Update np.load.cs with async overloads
  • Unit tests: round-trip, cancellation, progress, large shapes, all dtypes, all versions
  • Performance benchmarks vs current implementation
  • Documentation update

Breaking Changes

None - sync API signatures preserved. New async API is additive.

Related

  • Design document: docs/NPY_FORMAT_DESIGN.md
  • Long indexing tracking: docs/LONG_INDEXING_ISSUES.md (M2)
  • NumPy reference: src/numpy/numpy/lib/_format_impl.py (v2.4.2)

Metadata

Labels: NumPy 2.x Compliance (aligns behavior with NumPy 2.x: NEPs, breaking changes); api (public API surface: np.*, NDArray methods, operators); performance (performance improvements or optimizations)
