Fix distributed loading when using paddle#19
takeshi-yoshimura merged 1 commit into foundation-model-stack:main
Conversation
Hello, I need to confirm: does parallel loading mean we use multiple processes to load all tensors to one GPU, or use multiple processes to load tensors to different GPUs and then broadcast them?
@zeroRains
It depends on what you want to do. The test cases use a single GPU because of their limited environment, but in realistic workloads each process should load files to its own GPU and then broadcast/scatter tensors.
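A minimal sketch of the second pattern described above, with plain Python standing in for the real collective ops (all names here are illustrative, not fastsafetensors or paddle API): each rank loads only its own shard, then the shards are exchanged so every rank ends with the full set.

```python
# Sketch of "each process loads files to its own GPU, then broadcast/scatter".
# Shard contents are fake strings; a real run would read safetensors files
# onto device gpu:{rank} and use a real collective such as all-gather.

def load_shard(rank: int) -> dict:
    # Hypothetical per-rank loader: one safetensors file per process.
    return {f"weight_{rank}": f"tensor-from-rank-{rank}"}

def all_gather(shards: list) -> dict:
    # Stand-in for a collective all-gather: merge every rank's shard.
    merged = {}
    for shard in shards:
        merged.update(shard)
    return merged

world_size = 2
shards = [load_shard(r) for r in range(world_size)]  # one load per process
full_state = all_gather(shards)  # every rank now sees all tensors
print(sorted(full_state))  # ['weight_0', 'weight_1']
```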
fix the distributed load for paddle
remove useless file
make sure the device id does not exceed the device count
Signed-off-by: zeroRains <linjunlu@zerorains.top>
```diff
-d_id = device.split(":")  # "gpu:0" or "gpu"
-d_id = int(d_id[1]) if len(d_id) == 2 else 0
+if isinstance(self.pg, SingleGroup):
+    # For single (gpu:x, gpu)
+    # gpu:x, like gpu:0, gpu:1, ...
+    d_id = device.split(":")
+    d_id = int(d_id[1]) if len(d_id) == 2 else 0
+else:
+    # For distributed,
+    # the GPU is determined by the current rank:
+    # rank 0 uses gpu:0, rank 1 uses gpu:1, ...
+    d_id = self.pg.rank() % paddle.device.cuda.device_count()
 self.device = f"gpu:{d_id}"
```
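Stripped of the process-group check, the device-id logic in this patch reduces to the following standalone sketch (`rank` and `device_count` are plain parameters here, standing in for `self.pg.rank()` and `paddle.device.cuda.device_count()`):

```python
def resolve_device(device: str, distributed: bool, rank: int, device_count: int) -> str:
    """Mirror of the patched logic: parse "gpu:N" / "gpu" for a single
    process, or derive the device id from the rank when distributed."""
    if not distributed:
        parts = device.split(":")  # "gpu:0" -> ["gpu", "0"], "gpu" -> ["gpu"]
        d_id = int(parts[1]) if len(parts) == 2 else 0
    else:
        # Modulo keeps the id below the device count, per the commit message.
        d_id = rank % device_count
    return f"gpu:{d_id}"

print(resolve_device("gpu:1", False, 0, 4))  # single process honors the string -> gpu:1
print(resolve_device("gpu", True, 5, 4))     # rank 5 on 4 GPUs wraps to gpu:1
```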
In this part, maybe it does not need to consider the distributed case in fastsafetensors.
We just need to load the tensors to the correct device provided by the user.
On a machine with multiple GPUs, the user should set the device like device=f"gpu:{pg.rank()}" in their distributed code and pass it to the SafeTensorsFileLoader, so that different processes load tensors to different GPUs.
What do you think?
I don't think so, because safetensors files that are distributed online are not composed like that.
Merged commit 18391ca into foundation-model-stack:main
Thank you!
Environment:
I modified the distributed loading command and wrote two .sh files, run_paddle_parallel_cpu.sh and run_paddle_parallel_gpu.sh. They use the standard distributed launching command in paddle. I also added a unit test for distributed tensor loading with paddle.
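For reference, a standard paddle distributed launch looks like this (the GPU list and script path are illustrative; the actual contents of the .sh files in this PR may differ):

```shell
# Launch one worker process per listed GPU; paddle.distributed.launch
# sets the rank/world-size environment for each worker.
python -m paddle.distributed.launch --gpus "0,1" tests/test_paddle_dist.py
```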