
Commit 128c053

Merge branch 'main' into ppaolucc-brahcn-SQL-tools-II
2 parents 71b66a2 + 79f17d0 commit 128c053

File tree

209 files changed: +4076 additions, -3007 deletions

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
# Calling multiple vLLM inference servers using LiteLLM

In this tutorial we explain how to use a LiteLLM Proxy Server to call multiple LLM inference endpoints from a single interface. LiteLLM can interact with more than 100 LLM providers, such as OpenAI, Cohere, and NVIDIA Triton and NIM. Here we will use two vLLM inference servers.

<!-- ![Hybrid shards](assets/images/litellm.png "LiteLLM") -->

# When to use this asset?

Use this asset to run the inference tutorial with local deployments of Mistral 7B Instruct v0.3 on vLLM inference servers powered by NVIDIA A10 GPUs, with a LiteLLM Proxy Server on top.

# How to use this asset?

These are the prerequisites to run this tutorial:

* An OCI tenancy with A10 quota
* A Hugging Face account with a valid Auth Token
* A valid OpenAI API Key

## Introduction

LiteLLM provides a proxy server to manage authentication, load balancing, and spend tracking across 100+ LLMs, all in the OpenAI format. vLLM is a fast and easy-to-use library for LLM inference and serving.

The first step is to deploy two vLLM inference servers on NVIDIA A10 powered virtual machine instances. In the second step, we create a LiteLLM Proxy Server on a third, CPU-only instance and explain how to use this interface to call the two LLMs from a single location. For the sake of simplicity, all three instances reside in the same public subnet here.

![Hybrid shards](assets/images/litellm-architecture.png "LiteLLM")

## vLLM inference servers deployment

For each of the inference nodes, a VM.GPU.A10.2 instance (2 x NVIDIA A10 GPU 24GB) is used in combination with the NVIDIA GPU-Optimized VMI image from the OCI Marketplace. This Ubuntu-based image comes with all the necessary libraries (Docker, NVIDIA Container Toolkit) preinstalled. It is good practice to deploy the two instances in two different fault domains to ensure higher availability.
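The tutorial does not prescribe how the instances are created (console, CLI, or Terraform all work); as an illustration only, below is a minimal OCI Python SDK sketch for launching one such node, where every OCID, the availability domain, and the SSH key are placeholders to replace with values from your own tenancy.

```
# Illustrative sketch only: launch one VM.GPU.A10.2 node with the OCI Python SDK.
# All OCIDs, the availability domain, and the SSH key below are placeholders.
import oci

config = oci.config.from_file()                       # reads ~/.oci/config
compute = oci.core.ComputeClient(config)

details = oci.core.models.LaunchInstanceDetails(
    compartment_id="ocid1.compartment.oc1..example",
    availability_domain="Uocm:EU-FRANKFURT-1-AD-1",   # placeholder AD name
    fault_domain="FAULT-DOMAIN-1",                    # use FAULT-DOMAIN-2 for the second node
    shape="VM.GPU.A10.2",
    display_name="vllm-node-1",
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        image_id="ocid1.image.oc1..example"           # NVIDIA GPU-Optimized VMI image OCID
    ),
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1..example",        # public subnet from the architecture above
        assign_public_ip=True,
    ),
    metadata={"ssh_authorized_keys": "<your public SSH key>"},
)

instance = compute.launch_instance(details).data
print(instance.id, instance.lifecycle_state)
```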
The vLLM inference server is deployed using the official vLLM container image:
```
docker run --gpus all \
    -e HF_TOKEN=$HF_TOKEN -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --host 0.0.0.0 \
    --port 8000 \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --tensor-parallel-size 2 \
    --load-format safetensors \
    --trust-remote-code \
    --enforce-eager
```
where `$HF_TOKEN` is a valid Hugging Face token. In this case we use the 7B Instruct version of the Mistral LLM. The vLLM endpoint can be called directly for verification with:
```
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.3",
        "messages": [
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }' | jq
```
## LiteLLM server deployment

No GPUs are required for LiteLLM. Therefore, a CPU-based VM.Standard.E4.Flex instance (4 OCPUs, 64 GB memory) with a standard Ubuntu 22.04 image is used. Here LiteLLM acts as a proxy server in front of the vLLM endpoints. Install LiteLLM using `pip`:
```
pip install 'litellm[proxy]'
```
Edit the `config.yaml` file (OpenAI-Compatible Endpoint):
```
model_list:
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://xxx.xxx.xxx.xxx:8000/v1
      api_key: sk-0123456789
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://xxx.xxx.xxx.xxx:8000/v1
      api_key: sk-0123456789
```
where `sk-0123456789` is a valid OpenAI API key and `xxx.xxx.xxx.xxx` are the public IP addresses of the two GPU instances. Because both entries share the same `model_name`, LiteLLM load-balances requests for `Mistral-7B-Instruct` across the two vLLM endpoints.
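As a side note, each `model_list` entry corresponds to a call that could also be made directly from Python with the LiteLLM SDK, bypassing the proxy; a minimal sketch, with the endpoint IP address kept as a placeholder:

```
# Illustrative sketch only: call one vLLM node directly via the LiteLLM Python SDK.
# The IP address and API key are placeholders matching the config above.
import litellm

response = litellm.completion(
    model="openai/mistralai/Mistral-7B-Instruct-v0.3",   # openai/ prefix = OpenAI-compatible endpoint
    api_base="http://xxx.xxx.xxx.xxx:8000/v1",
    api_key="sk-0123456789",
    messages=[{"role": "user", "content": "Who won the world series in 2020?"}],
)
print(response.choices[0].message.content)
```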
Start the LiteLLM Proxy Server with the following command:
```
litellm --config /path/to/config.yaml
```
Once the Proxy Server is ready, call the vLLM endpoints through LiteLLM with:
```
curl http://localhost:4000/chat/completions \
    -H 'Authorization: Bearer sk-0123456789' \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Mistral-7B-Instruct",
        "messages": [
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }' | jq
```
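Because the proxy exposes an OpenAI-compatible API, it can also be called from Python with the official `openai` client; a minimal sketch assuming the proxy runs locally on port 4000 with the same placeholder key:

```
# Illustrative sketch only: call the LiteLLM proxy with the OpenAI Python client.
# base_url and api_key match the placeholder proxy setup above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-0123456789")

response = client.chat.completions.create(
    model="Mistral-7B-Instruct",   # model_name defined in config.yaml
    messages=[{"role": "user", "content": "Who won the world series in 2020?"}],
)
print(response.choices[0].message.content)
```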
## Documentation

* [LiteLLM documentation](https://litellm.vercel.app/docs/providers/openai_compatible)
* [vLLM documentation](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html)
* [MistralAI](https://mistral.ai/)
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
model_list:
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://public_ip_1:8000/v1
      api_key: sk-0123456789
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://public_ip_2:8000/v1
      api_key: sk-0123456789

cloud-infrastructure/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/files/requirements.txt

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-aiohttp==3.9.5
+aiohttp==3.10.2
 aiosignal==1.3.1
 annotated-types==0.6.0
 anyio==4.3.0
@@ -48,7 +48,7 @@ jsonpointer==2.4
 jsonschema==4.21.1
 jsonschema-specifications==2023.12.1
 langchain==0.1.16
-langchain-community==0.2.5
+langchain-community==0.2.9
 langchain-core==0.1.46
 langchain-text-splitters==0.0.1
 langsmith==0.1.51
@@ -57,7 +57,7 @@ llama-hub==0.0.79.post1
 llama-index==0.10.32
 llama-index-agent-openai==0.2.3
 llama-index-cli==0.1.12
-llama-index-core==0.10.32
+llama-index-core==0.10.38
 llama-index-embeddings-langchain==0.1.2
 llama-index-embeddings-openai==0.1.9
 llama-index-indices-managed-llama-cloud==0.1.5
@@ -87,7 +87,7 @@ nest-asyncio==1.6.0
 networkx==3.3
 newspaper3k==0.2.8
 ninja==1.11.1.1
-nltk==3.8.1
+nltk==3.9
 numba==0.59.1
 numpy==1.26.4
 nvidia-cublas-cu12==12.1.3.1

cloud-infrastructure/compute-including-hpc/compute-software/README.md

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ This page contains information and useful links regarding Compute services that
 - [Script to install and mount OCI bucket as Filesystem using Fuse S3FS](https://github.com/Olygo/OCI_S3FS)
 - [Mount a boot volume from one compute instance (or VM) onto another compute instance in order to replace lost ssh keys](https://gitlab.com/ms76152/system-administration)
 - [Transfer data to and from Oracle Cloud Infrastructure using OS tools such as sftp, scp, oci cli, curl](https://github.com/mariusscholtz/Oracle-Cloud-Infrastructure-resources/blob/main/VM-shapes/data%20transfer%20to%20OCI%20v1.0.pdf)
+- [Querying Compute Capacity using CloudShell](https://github.com/Olygo/OCI_ComputeCapacityReport)

 # Useful Links
cloud-infrastructure/multicloud/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 Multicloud architectures leverage the coordinated use of cloud services from two or more public cloud vendors. Organizations use multicloud environments to distribute computing resources and minimize the risk of downtime and data loss. Organizations may also adopt two or more public cloud providers for their unique capabilities.

-Reviewed: 13.05.2024
+Reviewed: 20.09.2024

 # Team Publications
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Private Cloud and Edge

## Useful Links

- [Oracle Compute Cloud@Customer](https://www.oracle.com/uk/cloud/compute/cloud-at-customer/)
- [Roving Edge Infrastructure](https://www.oracle.com/uk/cloud/roving-edge-infrastructure/)

## License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
# C3 Hosting Service Provider - IAM Policies for Isolation

The Hosting Service Provider (HSP) model on Compute Cloud@Customer (C3) allows hosting for multiple end customers, each isolated in a dedicated compartment with separate VCN(s) per customer. To ensure that each end customer can create resources only in their own compartment, a set of IAM policies is required.

The HSP documentation suggests the following policies per end customer, based on an example with two hosting customers, A & B. They assume that each end customer will have two roles for their staff: Customer Administrator and Customer End User.
## Example Policies for Customer Administrator

The following policies allow the specified group to manage all C3 services in the listed compartment:

```
Allow group CustA-Admin-grp to manage all-resources in compartment path:to:CustA

Allow group CustB-Admin-grp to manage all-resources in compartment path:to:CustB
```

Note that the above policy grants permissions in the CustA and CustB compartments of the C3 but **also in the same compartment in the OCI tenancy**! To prevent permissions being granted in the OCI tenancy, append a condition such as:

```
Allow group CustA-Admin-grp to manage all-resources in compartment path:to:CustA where all {request.region != 'LHR', request.region != 'FRA'}

Allow group CustB-Admin-grp to manage all-resources in compartment path:to:CustB where all {request.region != 'LHR', request.region != 'FRA'}
```

In the example above, the condition prevents resource creation in the London and Frankfurt regions. Adjust the list to include all regions the tenancy is subscribed to.
The path to the end customer compartment must be explicitly stated, using the colon-separated format, relative to the compartment where the policy is created.
## Example Policies for Customer End User

```
Allow group CustA-Users-grp to manage instance-family in compartment path:to:CustA
Allow group CustA-Users-grp to use volume-family in compartment path:to:CustA
Allow group CustA-Users-grp to use virtual-network-family in compartment path:to:CustA
Allow group CustB-Users-grp to manage instance-family in compartment path:to:CustB
Allow group CustB-Users-grp to use volume-family in compartment path:to:CustB
Allow group CustB-Users-grp to use virtual-network-family in compartment path:to:CustB
```

As above, append a condition to limit permissions to the C3 and prevent resource creation in OCI regions:

```
Allow group CustA-Users-grp to manage instance-family in compartment path:to:CustA where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustA-Users-grp to use volume-family in compartment path:to:CustA where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustA-Users-grp to use virtual-network-family in compartment path:to:CustA where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustB-Users-grp to manage instance-family in compartment path:to:CustB where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustB-Users-grp to use volume-family in compartment path:to:CustB where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustB-Users-grp to use virtual-network-family in compartment path:to:CustB where all {request.region != 'LHR', request.region != 'FRA'}
```
## Common Policy

Currently, any user of a C3 needs access to certain resources located at the tenancy level in order to use IaaS resources in the web UI. Backup policies, tag namespaces, and platform images all reside at the tenancy level and need a further policy to allow normal use of C3 IaaS services. Note that this is a subtle difference from the behaviour on OCI.

An extra policy as below is required (where CommonGroup contains **all** HSP users on the C3):

```
allow group CommonGroup to read all-resources in tenancy where target.compartment.name='root-compartment-name'
```
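If you prefer to create these policies programmatically rather than through the console, the statements above can also be pushed with the OCI Python SDK. The sketch below is illustrative only; the compartment OCID, policy name, and description are placeholder values, and the statement is the CustA administrator policy from above.

```
# Illustrative sketch only: create the CustA administrator policy with the OCI Python SDK.
# The compartment OCID, policy name, and description are placeholders.
import oci

config = oci.config.from_file()                 # reads ~/.oci/config
identity = oci.identity.IdentityClient(config)

statement = (
    "Allow group CustA-Admin-grp to manage all-resources in compartment path:to:CustA "
    "where all {request.region != 'LHR', request.region != 'FRA'}"
)

policy = identity.create_policy(
    oci.identity.models.CreatePolicyDetails(
        compartment_id="ocid1.compartment.oc1..example",   # compartment where the policy is created
        name="CustA-Admin-policy",
        description="Restrict CustA administrators to their own C3 compartment",
        statements=[statement],
    )
).data
print(policy.id, policy.lifecycle_state)
```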