Fix distributed loading when using paddle#19
takeshi-yoshimura merged 1 commit into foundation-model-stack:main
Conversation
Hello, I need to confirm: does parallel loading mean we use multiple processes to load all tensors to one GPU, or use multiple processes to load tensors to different GPUs and then broadcast them?
@zeroRains
It depends on what you want to do. The test cases use a single GPU because of their limited environment, but in realistic workloads each process should load files to its own GPU and then broadcast/scatter tensors.
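A minimal sketch of the second pattern described above, with plain Python standing in for the real collective ops (all names here are illustrative, not fastsafetensors or paddle API): each rank loads only its own shard, then the shards are exchanged so every rank ends with the full set.

```python
# Sketch of "each process loads files to its own GPU, then broadcast/scatter".
# Shard contents are fake strings; a real run would read safetensors files
# onto device gpu:{rank} and use a real collective such as all-gather.

def load_shard(rank: int) -> dict:
    # Hypothetical per-rank loader: one safetensors file per process.
    return {f"weight_{rank}": f"tensor-from-rank-{rank}"}

def all_gather(shards: list) -> dict:
    # Stand-in for a collective all-gather: merge every rank's shard.
    merged = {}
    for shard in shards:
        merged.update(shard)
    return merged

world_size = 2
shards = [load_shard(r) for r in range(world_size)]  # one load per process
full_state = all_gather(shards)  # every rank now sees all tensors
print(sorted(full_state))  # ['weight_0', 'weight_1']
```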
fix the distributed load for paddle
remove useless file
make sure the device id does not exceed the device count
Signed-off-by: zeroRains <linjunlu@zerorains.top>
```diff
-d_id = device.split(":")  # "gpu:0" or "gpu"
-d_id = int(d_id[1]) if len(d_id) == 2 else 0
+if isinstance(self.pg, SingleGroup):
+    # For single (gpu:x, gpu)
+    # gpu:x, like gpu:0, gpu:1, ...
+    d_id = device.split(":")
+    d_id = int(d_id[1]) if len(d_id) == 2 else 0
+else:
+    # For distributed,
+    # the GPU is determined by the current rank:
+    # rank 0 uses gpu:0, rank 1 uses gpu:1, ...
+    d_id = self.pg.rank() % paddle.device.cuda.device_count()
 self.device = f"gpu:{d_id}"
```
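Stripped of the process-group check, the device-id logic in this patch reduces to the following standalone sketch (`rank` and `device_count` are plain parameters here, standing in for `self.pg.rank()` and `paddle.device.cuda.device_count()`):

```python
def resolve_device(device: str, distributed: bool, rank: int, device_count: int) -> str:
    """Mirror of the patched logic: parse "gpu:N" / "gpu" for a single
    process, or derive the device id from the rank when distributed."""
    if not distributed:
        parts = device.split(":")  # "gpu:0" -> ["gpu", "0"], "gpu" -> ["gpu"]
        d_id = int(parts[1]) if len(parts) == 2 else 0
    else:
        # Modulo keeps the id below the device count, per the commit message.
        d_id = rank % device_count
    return f"gpu:{d_id}"

print(resolve_device("gpu:1", False, 0, 4))  # single process honors the string -> gpu:1
print(resolve_device("gpu", True, 5, 4))     # rank 5 on 4 GPUs wraps to gpu:1
```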
In this part, maybe it does not need to consider the distributed case in fastsafetensors.
We just need to load the tensors to the correct device provided by the user.
On a machine with multiple GPUs, the user should set the device like device=f"gpu:{pg.rank()}" in their distributed code and pass it to the SafeTensorsFileLoader, so that different processes load tensors to different GPUs.
What do you think?
I don't think so, because safetensors files that are distributed online are not composed like that.
Merged commit 18391ca into foundation-model-stack:main
Thank you!
Environment:
I modified the distributed loading command and wrote two .sh files, run_paddle_parallel_cpu.sh and run_paddle_parallel_gpu.sh. They use the standard distributed launching command in paddle. I also added a unit test for distributed tensor loading with paddle.
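For reference, a standard paddle distributed launch looks like this (the GPU list and script path are illustrative; the actual contents of the .sh files in this PR may differ):

```shell
# Launch one worker process per listed GPU; paddle.distributed.launch
# sets the rank/world-size environment for each worker.
python -m paddle.distributed.launch --gpus "0,1" tests/test_paddle_dist.py
```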