Feature Request: rpc: Load tensors in parallel throughout rpc-servers #16434

@nguha

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Make all rpc-servers load tensors into device (GPU) memory in parallel when the tensors are available in their respective local caches.

Motivation

As of b6140, tensors are loaded sequentially from disk into GPU memory. This is a simple and performant approach when all GPUs are attached to a single machine: load time is bound by disk I/O, so parallelizing could even decrease performance.

The scenario changes if some of the GPUs are connected via rpc-servers.

If there is no cache yet on the rpc-server machine(s), all tensors must be transmitted over the network, and the load is bound by the network bandwidth of the client machine (the one sending out the tensors). Parallelizing would again be useless or detrimental.

However, if the tensors are already cached on the rpc-server machines' disks, each rpc-server could immediately start loading them into its GPUs. This would greatly decrease model load time in scenarios with multiple rpc-servers, as they would effectively be loading tensors in parallel.
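To put rough numbers on this, here is a back-of-the-envelope model, not a measurement. It assumes each rpc-server reads only from its own local cache at a fixed disk rate, and the sizes and rates are made up for illustration.

```python
def sequential_load_seconds(bytes_per_server, disk_bytes_per_s):
    """Servers are loaded one after another, so the per-server times add up."""
    return sum(b / bw for b, bw in zip(bytes_per_server, disk_bytes_per_s))


def parallel_load_seconds(bytes_per_server, disk_bytes_per_s):
    """Servers load from their own caches at the same time; the slowest one dominates."""
    return max(b / bw for b, bw in zip(bytes_per_server, disk_bytes_per_s))


# Hypothetical setup: 3 rpc-servers, 8 GB of cached tensors each, 2 GB/s local disk.
sizes = [8e9, 8e9, 8e9]
rates = [2e9, 2e9, 2e9]

print(sequential_load_seconds(sizes, rates))  # 12.0
print(parallel_load_seconds(sizes, rates))    # 4.0
```

Under these assumptions the saving grows with the number of rpc-servers, since the parallel time stays at the slowest single server.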

Possible Implementation

For each rpc-server in use, start loading tensors immediately if they are present in that server's local cache.
Since we can now run a single rpc-server per physical machine (thanks to @rgerganov in PR #16276), there is no longer a risk of concurrent access to the same disk (which can be slower on some hardware).
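A minimal sketch of the idea in Python, only to illustrate the control flow. It is not the llama.cpp RPC API: `split_by_cache`, `load_model`, `load_from_cache` and `send_over_network` are hypothetical names, and how the client learns what each server has cached is assumed to be known up front.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_cache(assignments, cached):
    """Split each server's tensors into those already in its cache and those not.

    assignments: {server: [tensor names]}
    cached:      {server: set of tensor names in that server's local cache}
    """
    cached_plan, network_plan = {}, {}
    for server, tensors in assignments.items():
        have = cached.get(server, set())
        cached_plan[server] = [t for t in tensors if t in have]
        network_plan[server] = [t for t in tensors if t not in have]
    return cached_plan, network_plan


def load_model(assignments, cached, load_from_cache, send_over_network):
    """Start cache loads on all servers at once; send the rest sequentially."""
    cached_plan, network_plan = split_by_cache(assignments, cached)
    with ThreadPoolExecutor(max_workers=max(1, len(cached_plan))) as pool:
        # One task per server: each server reads only its own disk.
        futures = [
            pool.submit(load_from_cache, server, tensors)
            for server, tensors in cached_plan.items()
            if tensors
        ]
        # Uncached tensors are bound by the client's network bandwidth,
        # so they are still sent one at a time.
        for server, tensors in network_plan.items():
            for tensor in tensors:
                send_over_network(server, tensor)
        # Wait for the cache loads and surface any exception they raised.
        for future in futures:
            future.result()
```

Keeping one task per server matches the one-rpc-server-per-machine setup: no two tasks in this sketch read from the same disk.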

PR #13106 seems related to this feature request.

Labels: enhancement