1.1 Install Docker
Note: This assumes all other necessary Habana SW is already installed.
# install
sudo apt install -y habanalabs-container-runtime
# register
sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "habana": {
            "path": "/usr/bin/habana-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

Reference: https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Docker_Installation.html
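Docker only reads daemon.json on startup, so a restart is needed after registering the runtime. A minimal sketch for sanity-checking the JSON before it goes live (the `grep` pattern for verifying registration is an assumption about `docker info` output):

```shell
# Validate the daemon.json snippet locally before writing it to /etc/docker/:
# python3 -m json.tool exits non-zero on malformed JSON.
cat > /tmp/daemon.json <<'EOF'
{
    "runtimes": {
        "habana": {
            "path": "/usr/bin/habana-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
python3 -m json.tool /tmp/daemon.json

# Then restart Docker and confirm the runtime is registered (requires root):
#   sudo systemctl restart docker
#   docker info | grep -i habana
```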
Follow-up to step 3 in this README: https://github.com/HabanaAI/Gaudi-tutorials/tree/main/PyTorch/vLLM_Tutorials/Deploying_vLLM
git clone https://github.com/vllm-project/vllm.git && cd vllm
git checkout v0.9.1
docker build -f docker/Dockerfile.cpu -t vllm-cpu-perf --shm-size=4g .

Other Benchmarking BKMs: Benchmark Guide for CPU
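CPU benchmarking BKMs generally recommend pinning the container to the cores of a single socket to avoid cross-NUMA traffic. A hedged sketch for deriving a `--cpuset-cpus` string (the core count and the model in the example run are assumptions; check `lscpu` on your host):

```shell
# Hedged sketch: pin the benchmark container to one socket's cores.
# CORES_PER_SOCKET is an assumption for your host -- verify with `lscpu`.
CORES_PER_SOCKET=40
CPUSET="0-$((CORES_PER_SOCKET - 1))"
echo "$CPUSET"   # e.g. 0-39 for a 40-core socket

# Example run (not executed here; the model choice is illustrative):
#   docker run --rm --cpuset-cpus="$CPUSET" --shm-size=4g vllm-cpu-perf \
#       --model facebook/opt-125m
```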
RouteLLM Public Repo: https://github.com/lm-sys/RouteLLM/tree/main
RouteLLM Blog: https://lmsys.org/blog/2024-07-01-routellm/
RouteLLM Internal Repo: https://github.com/intel-innersource/applications.ai.iaas.dse-iaas/tree/main
Clone the internal repo and navigate to the RouteLLM directory:
git clone https://github.com/intel-innersource/applications.ai.iaas.dse-iaas.git
cd applications.ai.iaas.dse-iaas/routellm
The following has been tested on an IBM Cloud Gaudi3 system:
- 8xGaudi3
- EMR host node - 40c per socket, 2 socket, 1.7TB mem
- OS: Ubuntu 22.04.5 LTS
NOTE: Before launching, update endpoints_and_routing-compose.yaml with your Hugging Face auth token and your hf_cache and recipe_cache paths.
docker compose -f endpoints_and_routing-compose.yaml up --build
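Once the compose stack is up, the router typically fronts the endpoints with an OpenAI-compatible API. A sketch of building a chat-completions request against it; the port (8000) and model name (`router-mf`) are assumptions — check endpoints_and_routing-compose.yaml for the actual values:

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


# Port and model name below are placeholders; adjust to your deployment.
req = build_chat_request("http://localhost:8000", "router-mf", "Hello")
print(req.full_url)  # → http://localhost:8000/v1/chat/completions
# Send with: urllib.request.urlopen(req) once the stack is serving.
```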