Commit 636372e: readme update

1 parent (dc329fe), commit 636372e

5 files changed: +9 / -7 lines changed

cloud-infrastructure/ai-infra-gpu/ai-infrastructure/dstack/README.md

Whitespace-only changes.

cloud-infrastructure/ai-infra-gpu/ai-infrastructure/litellm/README.md

Lines changed: 9 additions & 7 deletions
@@ -2,17 +2,19 @@

In this tutorial we explain how to use a LiteLLM Proxy Server to call multiple LLM inference endpoints from a single interface. LiteLLM interacts with 100+ LLMs such as OpenAI, Cohere, NVIDIA Triton and NIM, etc. Here we will use two vLLM inference servers.

-![Hybrid shards](assets/images/litellm.png "LiteLLM")
+<!-- ![Hybrid shards](assets/images/litellm.png "LiteLLM") -->

## Introduction

LiteLLM provides a proxy server to manage auth, load balancing, and spend tracking across 100+ LLMs, all in the OpenAI format.
vLLM is a fast and easy-to-use library for LLM inference and serving.
-The first step will be to deploy two vLLM inference servers on NVIDIA A10 powered virtual machine instances. In the second step, we will create a LiteLLM Proxy Server on a third no-GPU instance and explain how we can use this interface to call the two LLM from a single location. For the sake of silplicity, all 3 instances will reside in the same public subnet here.
+The first step will be to deploy two vLLM inference servers on NVIDIA A10-powered virtual machine instances. In the second step, we will create a LiteLLM Proxy Server on a third, no-GPU instance and explain how we can use this interface to call the two LLMs from a single location. For the sake of simplicity, all 3 instances will reside in the same public subnet here.
+
+![Hybrid shards](assets/images/litellm-architecture.png "LiteLLM")

## vLLM inference servers deployment

-For each of the inference servers nodes a VM.GPU.A10.2 instance (2 x NVIDIA A10 GPU 24GB) is used in combination with the NVIDIA GPU-Optimized VMI image from the OCI marketplace. This Ubuntu-based image comes with all the necessary libraries (Docker, NVIDIA Container Toolkit) preinstalled.
+For each of the inference nodes, a VM.GPU.A10.2 instance (2 x NVIDIA A10 GPU, 24 GB each) is used in combination with the NVIDIA GPU-Optimized VMI image from the OCI marketplace. This Ubuntu-based image comes with all the necessary libraries (Docker, NVIDIA Container Toolkit) preinstalled.
The vLLM inference server is deployed using the vLLM official container image.
```
docker run --gpus all \
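
The hunk stops after the first line of the `docker run` command, so the full launch is not visible here. A minimal sketch of serving the model referenced later in the LiteLLM config with the official vLLM container (the image tag, cache mount, token variable, and tensor-parallel flag are assumptions, not taken from this commit) could look like:

```
# Illustrative only: serve Mistral-7B-Instruct-v0.3 on both A10 GPUs via the
# OpenAI-compatible vLLM server, exposed on port 8000.
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<your_hf_token>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.3 \
  --tensor-parallel-size 2 \
  --api-key sk-0123456789
```

In this sketch, `--api-key` mirrors the `api_key` that the LiteLLM config below passes to the backends, and `--tensor-parallel-size 2` shards the model across the instance's two A10 GPUs.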
@@ -41,7 +43,7 @@ curl http://localhost:8000/v1/chat/completions \

## LiteLLM server deployment

-No GPU are required for LiteLLM. Therefore, a CPU based VM.Standard.E4.flex instance (4 OCPUs, 64 GB Memory) with a standard Ubuntu 22.04 image is used. Here LiteLLM is used as a proxy server calling a vLLM endpoint. Install LiteLLM using `pip`:
+No GPUs are required for LiteLLM. Therefore, a CPU-based VM.Standard.E4.Flex instance (4 OCPUs, 64 GB memory) with a standard Ubuntu 22.04 image is used. Here LiteLLM is used as a proxy server calling a vLLM endpoint. Install LiteLLM using `pip`:
```
pip install 'litellm[proxy]'
```
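
The header of the hunk above references a `curl` check against a vLLM node's OpenAI-compatible endpoint. A minimal example of such a request (the payload is assumed for illustration; run it on the GPU instance itself) is:

```
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-0123456789" \
  -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.3",
        "messages": [{"role": "user", "content": "Hello, who are you?"}]
      }'
```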
@@ -51,15 +53,15 @@ model_list:
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
-      api_base: http://public_ip_1:8000/v1
+      api_base: http://xxx.xxx.xxx.xxx:8000/v1
      api_key: sk-0123456789
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
-      api_base: http://public_ip_2:8000/v1
+      api_base: http://xxx.xxx.xxx.xxx:8000/v1
      api_key: sk-0123456789
```
-where `sk-0123456789` is a valid OpenAI API key and `public_ip_1` and `public_ip_2` are the two GPU instances public IP addresses.
+where `sk-0123456789` is a valid OpenAI API key and `xxx.xxx.xxx.xxx` are the two GPU instances' public IP addresses.

Start the LiteLLM Proxy Server with the following command:
```
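
The hunk ends at the opening fence of that command, so the exact invocation is not shown. A hedged sketch of starting the proxy and calling it (the config file name, port, and placeholder IP are assumptions; 4000 is LiteLLM's default proxy port) is:

```
# Assumes the model_list above was saved as config.yaml on the CPU instance
litellm --config config.yaml

# From any client, route a request to either vLLM backend through the proxy
curl http://litellm_instance_ip:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Mistral-7B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```

Because both config entries share the same `model_name`, LiteLLM load balances `Mistral-7B-Instruct` requests across the two vLLM servers.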