
[Ecosystem] safetensors #61

@LysandreJik

Description

Contact emails

lysandre@huggingface.co

Project summary

Simple, safe way to store and distribute tensors

Project description

SafeTensors is a secure, fast file format for storing machine learning tensors.

It can be used as a replacement for torch.load: it binds to the underlying torch loading APIs, preventing arbitrary code execution while enabling zero-copy and lazy loading.

Files contain a JSON header with tensor metadata, followed by raw data buffers. This layout speeds up distributed model loading significantly.

Key benefits:

  • no code execution risk
  • 100MB header limit for DoS protection
  • faster multi-GPU loading
  • cross-language compatibility via Rust and Python implementations
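The layout described above can be sketched in plain Python (stdlib only; a toy illustration of the format, not the real implementation — the `save`/`load_one` helpers are hypothetical): an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then the raw buffers. Lazy loading follows naturally: mmap the file and slice only the offsets you need.

```python
import json
import mmap
import os
import struct
import tempfile

MAX_HEADER = 100 * 1024 * 1024  # mirrors the 100MB header cap (DoS protection)

def save(path, tensors):
    """tensors: name -> (dtype_str, shape, raw_bytes). Toy safetensors-style writer."""
    header, offset = {}, 0
    for name, (dtype, shape, buf) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(buf)]}
        offset += len(buf)
    hbytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hbytes)))  # 8-byte little-endian header size
        f.write(hbytes)                          # JSON header
        for _, _, buf in tensors.values():
            f.write(buf)                         # raw buffers, in header order

def load_one(path, name):
    """Fetch one tensor's metadata and bytes; other tensors are never read."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        (hlen,) = struct.unpack("<Q", m[:8])
        if hlen > MAX_HEADER:
            raise ValueError("header exceeds 100MB limit")
        meta = json.loads(m[8:8 + hlen].decode("utf-8"))[name]
        start, end = meta["data_offsets"]
        return meta, bytes(m[8 + hlen + start:8 + hlen + end])

path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
save(path, {"w": ("F32", [2], struct.pack("<2f", 1.0, 2.0)),
            "b": ("I8", [3], b"\x01\x02\x03")})
meta, buf = load_one(path, "b")
```

Because offsets are recorded per tensor, a reader can fetch a single tensor without deserializing anything else — no pickle, no code execution, just bytes at known positions.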

It is open source under Apache 2.0, developed by Hugging Face.

Are there any other projects in the PyTorch Ecosystem similar to yours? If yes, what are they?

No comparable library, as far as I know.

Project repo URL

https://github.com/huggingface/safetensors

Additional repos in scope of the application

No response

Project license

Apache 2.0

GitHub handles of the project maintainer(s)

danieldk, McPatate

Is there a corporate or academic entity backing this project? If so, please provide the name and URL of the entity.

Hugging Face

Website URL

huggingface.co

Documentation

huggingface.co/docs/safetensors

How do you build and test the project today (continuous integration)? Please describe.

On every commit: build Rust, run the Rust tests, cargo audit, code coverage, Clippy, and rustfmt, on the stable release channel, for the following OSes:

[ubuntu-latest, windows-latest, macOS-latest]

A performance regression check runs on every commit to main with pytest-benchmark; we store the results of previous commits in the GitHub Actions cache and add a comment on the PR if an issue arises.

Documentation is built on every commit to main.

For Python, one CI workflow runs the tests, Clippy, and cargo audit of the Python bindings (Python + Rust code) on the following platforms:

          - os: ubuntu-latest
            version:
              torch: torch
              python: "3.13"
              numpy: numpy
              arch: "x64-freethreaded"
          - os: macos-15-intel
            version:
              torch: torch==1.10
              numpy: "numpy==1.26"
              python: "3.9"
              arch: "x64"
          - os: macos-latest
            version:
              torch: torch
              python: "3.12"
              numpy: numpy
              arch: "arm64"
          - os: windows-11-arm
            version:
              torch: torch
              python: "3.12"
              numpy: numpy
              arch: "arm64"

and a bonus separate platform:

  test_s390x_big_endian:
    runs-on: ubuntu-latest
    name: Test bigendian - S390X

(PyTorch latest, I assume, for s390x.)

Lastly, the release CI builds on every platform/architecture permutation in the world and, once everything is built, pushes an artifact to PyPI. The release CI runs on every commit to main but only pushes to PyPI when the commit has a tag associated with it.
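The tag gating described above can be sketched as a GitHub Actions condition (a hypothetical fragment, not the project's real workflow file; build and publish commands are placeholders):

```yaml
on:
  push:
    branches: [main]
    tags: ["v*"]

jobs:
  release:
    runs-on: ubuntu-latest          # in reality, one job per platform/arch in the matrix
    steps:
      - name: Build wheel
        run: echo "build for this platform/arch"       # placeholder build step
      - name: Publish to PyPI
        if: startsWith(github.ref, 'refs/tags/')       # only tagged commits publish
        run: echo "upload artifact"                    # placeholder publish step
```

Every push to main exercises the full build matrix, so a broken release build is caught before any tag is cut; the `if:` guard is the only difference between a dry run and a real release.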

ubuntu-latest, aarch64
ubuntu-latest, armv7
ubuntu-latest, ppc64le
ubuntu-latest, s390x
ubuntu-latest, x86
ubuntu-latest, x86_64
macos-14, aarch64
macos-15-intel, x86_64
ubuntu-latest, aarch64
ubuntu-latest, armv7
ubuntu-latest, x86
ubuntu-latest, x86_64
windows-11-arm, arm64
windows-latest, x64
windows-latest, x86

Version of PyTorch

1.10 at the moment

Components of PyTorch

  • from_file
  • Tensor (.to, .narrow)
  • UntypedStorage, ByteStorage
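A hedged sketch of how those pieces can combine for zero-copy loading (illustrative only, not the library's actual code; the file contents and offsets here are made up): map the file into an UntypedStorage, view it as a byte tensor, narrow to one tensor's byte range, and reinterpret the dtype.

```python
import struct
import tempfile

import torch  # assumption: a recent PyTorch; ByteStorage is the legacy typed spelling

# Hypothetical raw file holding 8 little-endian float32 values (0.0 .. 7.0).
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
    f.write(struct.pack("<8f", *range(8)))
    path = f.name

nbytes = 8 * 4
storage = torch.UntypedStorage.from_file(path, False, nbytes)  # memory-mapped storage
flat = torch.empty(0, dtype=torch.uint8).set_(storage)  # byte view over the whole file
chunk = flat.narrow(0, 16, 16)                          # bytes of elements 4..7, no copy
t = chunk.view(torch.float32)                           # reinterpret dtype in place
# t could now be moved with .to(device) without materializing the other elements
```

Narrowing before any dtype conversion is what makes lazy, per-tensor loading cheap: only the requested byte range is ever touched.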

How long do you expect to maintain the project?

As long as machine learning is relevant! We're pouring significant resources into the project and still consider it early.

Additional information

No response
