How to use GPUs from an HPC System for LibreChat #4250
dirkpetersen started this conversation in General
In our LibreChat Enterprise Pilot Project we use AWS Bedrock and are happy with it. However, we wanted to benchmark it against several locally hosted Llama 3.1 models, and the big 405B model needs about six A100 GPUs with 80 GB each, which we only have in our HPC cluster, and that cluster only supports batch jobs.

We came up with a way to serve llama-cpp-python from Slurm batch jobs, with Traefik as a highly available load balancer in front of them, and packaged it all up in an easy-to-use process: https://github.com/dirkpetersen/forever-slurm (happy to accept PRs for improvements). There is also a background story.

The 405B model is significantly slower than on Bedrock, but the 70B model running on a single A40 GPU offers the same performance as Bedrock. To our great surprise, this is actually very stable.
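For anyone curious what the load-balancing side of this looks like, here is a minimal sketch of a Traefik dynamic configuration (file provider) that spreads requests across several llama-cpp-python servers with health checks. The router rule, service name, ports, hostnames, and health-check path are illustrative assumptions, not the exact configuration that forever-slurm generates.

```yaml
# Hypothetical Traefik dynamic config for illustration only;
# names, ports, and the health-check path are assumptions.
http:
  routers:
    llama:
      rule: "PathPrefix(`/v1`)"   # OpenAI-compatible API prefix
      service: llama-backends
  services:
    llama-backends:
      loadBalancer:
        healthCheck:
          path: /health           # assumed health endpoint on each backend
          interval: "10s"
          timeout: "3s"
        servers:                  # one entry per running batch job
          - url: "http://node001:8000"
          - url: "http://node002:8000"
```

The health checks matter here: when a batch job hits its walltime limit and its backend disappears, Traefik stops routing to it until a replacement job registers.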