Exploring fastsafetensors for Network Storage: Seeking Community Input on Custom Reader Integration #55
Description
Hi maintainers,
The two-phase abstraction in fastsafetensors is excellent:
- Parallel model loading with multiple workers
- Efficient tensor distribution via NCCL broadcast, leveraging high-speed GPU memory and NVLink bandwidth
While this design excels in GPUDirect Storage (GDS) scenarios, we believe it's equally well-suited to network storage workloads that benefit from:
- High-concurrency file loading (different files in parallel)
- Sequential bulk data reads
Building on fastsafetensors' architecture (referenced in issue #29), we've implemented a zero-copy reader optimized for network storage and achieved promising results. Since zero-copy implementations vary across storage systems, we extended this approach to 3FS (an open-source distributed filesystem), creating a usrbio-based reader.
Performance highlights:
- Peak throughput: 35 GB/s, saturating a single 400 Gbps RDMA link
- Setup: 8 processes via the usrbio SDK
- Real-world result: loading the 640 GB DeepSeek-R1 checkpoint in ~27 seconds
We're excited about these results and would love to open-source our 3FS reader. Before doing so, we'd like to ask whether fastsafetensors plans to support such custom readers, and whether you'd be open to this type of contribution.
Our initial implementation consists of three main components, though we're happy to refine the architecture based on your feedback if there's interest in moving forward.
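To make the idea concrete, here is a minimal sketch of what a pluggable reader contract might look like. All names here (`FileReader`, `read_into`, `PosixReader`) are our assumptions for illustration only, not the actual fastsafetensors API; a 3FS/usrbio backend would implement the same interface with registered buffers and sequential bulk reads.

```python
# Hypothetical pluggable-reader sketch; the interface names are
# assumptions, NOT the real fastsafetensors API.
import os
from abc import ABC, abstractmethod


class FileReader(ABC):
    """Minimal contract a storage backend would implement.

    A 3FS/usrbio backend would issue sequential bulk reads into a
    pre-registered (pinned) buffer; a GDS backend would instead DMA
    directly into GPU memory.
    """

    @abstractmethod
    def read_into(self, path: str, offset: int, length: int,
                  buf: bytearray) -> int:
        """Read `length` bytes at `offset` from `path` into `buf`.

        Returns the number of bytes actually read.
        """


class PosixReader(FileReader):
    """Baseline backend using plain pread(2); stands in for usrbio here."""

    def read_into(self, path, offset, length, buf):
        fd = os.open(path, os.O_RDONLY)
        try:
            data = os.pread(fd, length, offset)
            buf[:len(data)] = data  # copy into the caller-owned buffer
            return len(data)
        finally:
            os.close(fd)
```

With a boundary like this, the loading pipeline stays storage-agnostic: the parallel-loading and NCCL-broadcast phases only see buffers, while each backend decides how bytes reach those buffers.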
Looking forward to your thoughts!
