How to use GPUs from an HPC System for LibreChat #4250
dirkpetersen started this conversation in General
In our LibreChat Enterprise Pilot Project we use AWS Bedrock and are happy with it. However, we wanted to benchmark it against several locally hosted Llama 3.1 models, and the big 405B model needs about six A100 GPUs with 80 GB each, which we only have in our HPC cluster, and that cluster only supports batch jobs.

We came up with a way to serve llama-cpp-python from Slurm batch jobs, with Traefik as a highly available load balancer in front of them, and packaged it all up in an easy-to-use process: https://github.com/dirkpetersen/forever-slurm (happy to accept PRs for improvements). There is also a background story.

The 405B model is significantly slower than on Bedrock, but the 70B model running on a single A40 GPU offers the same performance as Bedrock. To our great surprise, this is actually very stable.
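For anyone curious what the load-balancing side of this looks like, here is a minimal sketch of a Traefik dynamic configuration (file provider) that spreads requests across several llama-cpp-python servers with health checks. The router rule, service name, ports, hostnames, and health-check path are illustrative assumptions, not the exact configuration that forever-slurm generates.

```yaml
# Hypothetical Traefik dynamic config for illustration only;
# names, ports, and the health-check path are assumptions.
http:
  routers:
    llama:
      rule: "PathPrefix(`/v1`)"   # OpenAI-compatible API prefix
      service: llama-backends
  services:
    llama-backends:
      loadBalancer:
        healthCheck:
          path: /health           # assumed health endpoint on each backend
          interval: "10s"
          timeout: "3s"
        servers:                  # one entry per running batch job
          - url: "http://node001:8000"
          - url: "http://node002:8000"
```

The health checks matter here: when a batch job hits its walltime limit and its backend disappears, Traefik stops routing to it until a replacement job registers.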