
Commit d5c5b30

Merge branch 'main' into mfioramo-patch-2
2 parents 2895a46 + 718b733 commit d5c5b30

155 files changed: +2105 −2204 lines changed

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
# Calling multiple vLLM inference servers using LiteLLM

In this tutorial we explain how to use a LiteLLM Proxy Server to call multiple LLM inference endpoints from a single interface. LiteLLM interacts with 100+ LLM providers such as OpenAI, Cohere, and NVIDIA Triton and NIM. Here we will use two vLLM inference servers.

<!-- ![Hybrid shards](assets/images/litellm.png "LiteLLM") -->

# When to use this asset?

Use this asset to run the inference tutorial with local deployments of Mistral 7B Instruct v0.3 on vLLM inference servers powered by NVIDIA A10 GPUs, with a LiteLLM Proxy Server on top.

# How to use this asset?

These are the prerequisites to run this tutorial:

* An OCI tenancy with A10 quota
* A Hugging Face account with a valid Auth Token
* A valid OpenAI API Key

## Introduction

LiteLLM provides a proxy server to manage authentication, load balancing, and spend tracking across 100+ LLMs, all in the OpenAI format.
vLLM is a fast and easy-to-use library for LLM inference and serving.
The first step is to deploy two vLLM inference servers on NVIDIA A10 powered virtual machine instances. In the second step, we create a LiteLLM Proxy Server on a third, GPU-less instance and explain how this single interface can be used to call the two LLMs. For the sake of simplicity, all three instances reside in the same public subnet here.

![LiteLLM architecture](assets/images/litellm-architecture.png "LiteLLM")

## vLLM inference servers deployment

For each of the inference nodes a VM.GPU.A10.2 instance (2 x NVIDIA A10 GPU 24GB) is used in combination with the NVIDIA GPU-Optimized VMI image from the OCI Marketplace. This Ubuntu-based image comes with all the necessary libraries (Docker, NVIDIA Container Toolkit) preinstalled. It is good practice to deploy the two instances in different fault domains to ensure higher availability.

The vLLM inference server is deployed using the vLLM official container image:
```
docker run --gpus all \
    -e HF_TOKEN=$HF_TOKEN -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --host 0.0.0.0 \
    --port 8000 \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --tensor-parallel-size 2 \
    --load-format safetensors \
    --trust-remote-code \
    --enforce-eager
```
where `$HF_TOKEN` is a valid Hugging Face token. In this case we use the 7B Instruct version of the Mistral LLM. The vLLM endpoint can be called directly for verification with:
```
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.3",
        "messages": [
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }' | jq
```
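Since vLLM exposes an OpenAI-compatible API, the same check can be scripted. Below is a minimal sketch using the OpenAI Python SDK, assuming the server started above is reachable on `localhost:8000`; the API key is a dummy value because the server was launched without `--api-key`:
```
from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
# The key is a placeholder: vLLM ignores it unless --api-key was set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy-key")

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "Who won the world series in 2020?"}],
)
print(response.choices[0].message.content)
```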

## LiteLLM server deployment

No GPUs are required for LiteLLM. Therefore, a CPU-based VM.Standard.E4.Flex instance (4 OCPUs, 64 GB memory) with a standard Ubuntu 22.04 image is used. Here LiteLLM is used as a proxy server calling the vLLM endpoints. Install LiteLLM using `pip`:
```
pip install 'litellm[proxy]'
```
Edit the `config.yaml` file (OpenAI-Compatible Endpoint):
```
model_list:
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://xxx.xxx.xxx.xxx:8000/v1
      api_key: sk-0123456789
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://xxx.xxx.xxx.xxx:8000/v1
      api_key: sk-0123456789
```
where `sk-0123456789` is a valid OpenAI API key and `xxx.xxx.xxx.xxx` are the public IP addresses of the two GPU instances (one per entry). Because both entries share the same `model_name`, LiteLLM load-balances requests across the two vLLM endpoints.

Start the LiteLLM Proxy Server with the following command:
```
litellm --config /path/to/config.yaml
```
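Before sending chat requests, you can optionally confirm which models the proxy exposes. This is a minimal sketch with the OpenAI Python SDK, assuming the proxy runs on its default port 4000 and accepts the placeholder key used in `config.yaml`:
```
from openai import OpenAI

# The LiteLLM proxy listens on port 4000 in this setup.
# The key below is the placeholder used throughout this tutorial.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-0123456789")

# List the models exposed by the proxy; Mistral-7B-Instruct should appear.
for model in client.models.list():
    print(model.id)
```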

Once the Proxy Server is ready, call the vLLM endpoints through LiteLLM with:
```
curl http://localhost:4000/chat/completions \
    -H 'Authorization: Bearer sk-0123456789' \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Mistral-7B-Instruct",
        "messages": [
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }' | jq
```
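The same request can be sent from Python instead of curl. A minimal sketch with the OpenAI SDK pointed at the proxy, using the URL and placeholder key assumed above:
```
from openai import OpenAI

# Talk to the LiteLLM proxy exactly as if it were the OpenAI API.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-0123456789")

response = client.chat.completions.create(
    model="Mistral-7B-Instruct",  # model_name defined in config.yaml
    messages=[{"role": "user", "content": "Who won the world series in 2020?"}],
)
print(response.choices[0].message.content)
```
LiteLLM forwards the call to one of the two vLLM servers registered under the shared `Mistral-7B-Instruct` model name.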

## Documentation

* [LiteLLM documentation](https://litellm.vercel.app/docs/providers/openai_compatible)
* [vLLM documentation](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html)
* [MistralAI](https://mistral.ai/)
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
model_list:
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://public_ip_1:8000/v1
      api_key: sk-0123456789
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://public_ip_2:8000/v1
      api_key: sk-0123456789
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Private Cloud and Edge

## Useful Links

- [Oracle Compute Cloud@Customer](https://www.oracle.com/uk/cloud/compute/cloud-at-customer/)
- [Roving Edge Infrastructure](https://www.oracle.com/uk/cloud/roving-edge-infrastructure/)

## License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
# C3 Hosting Service Provider - IAM Policies for Isolation

The Hosting Service Provider (HSP) model on Compute Cloud@Customer (C3) allows hosting for multiple end customers, each isolated in a dedicated compartment with separate VCN(s) per customer. To ensure that each end customer can create resources only in their own compartment, a set of IAM policies is required.

The HSP documentation suggests the following policies per end customer, based on an example with two hosting customers, A & B. They assume that each end customer will have two roles for their staff: Customer Administrator and Customer End User.

## Example Policies for Customer Administrator

The following allows the group specified to use all C3 services in the compartment listed:
```
Allow group CustA-Admin-grp to manage all-resources in compartment path:to:CustA

Allow group CustB-Admin-grp to manage all-resources in compartment path:to:CustB
```
Note that the above policies grant permissions in the CustA and CustB compartments of the C3 but **also in the same compartments in the OCI tenancy**! To prevent permissions being granted in the OCI tenancy, append a condition such as:

```
Allow group CustA-Admin-grp to manage all-resources in compartment path:to:CustA where all {request.region != 'LHR', request.region != 'FRA'}

Allow group CustB-Admin-grp to manage all-resources in compartment path:to:CustB where all {request.region != 'LHR', request.region != 'FRA'}
```
In the example above the condition prevents resource creation in the London and Frankfurt regions. Adjust the list to include all regions the tenancy is subscribed to.
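
To build that region list, the tenancy's subscriptions can be enumerated programmatically. This is a minimal sketch with the OCI Python SDK, assuming the standard `~/.oci/config` profile; it only prints a suggested condition string to paste into the policies:
```
import oci

# Load the standard ~/.oci/config profile and create an Identity client.
config = oci.config.from_file()
identity = oci.identity.IdentityClient(config)

# List every region the tenancy is subscribed to; the 3-letter region key
# (e.g. LHR, FRA) is what the policy condition compares against.
subscriptions = identity.list_region_subscriptions(config["tenancy"]).data
condition = " where all {" + ", ".join(
    f"request.region != '{sub.region_key}'" for sub in subscriptions
) + "}"
print(condition)
```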

The path to the end user compartment must be explicitly stated, using the colon-separated format (as in `path:to:CustA`), relative to the compartment where the policy is created.

## Example Policies for Customer End User
```
Allow group CustA-Users-grp to manage instance-family in compartment path:to:CustA
Allow group CustA-Users-grp to use volume-family in compartment path:to:CustA
Allow group CustA-Users-grp to use virtual-network-family in compartment path:to:CustA
Allow group CustB-Users-grp to manage instance-family in compartment path:to:CustB
Allow group CustB-Users-grp to use volume-family in compartment path:to:CustB
Allow group CustB-Users-grp to use virtual-network-family in compartment path:to:CustB
```
As above, append a condition to limit permissions to the C3 and prevent resource creation in OCI regions:
```
Allow group CustA-Users-grp to manage instance-family in compartment path:to:CustA where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustA-Users-grp to use volume-family in compartment path:to:CustA where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustA-Users-grp to use virtual-network-family in compartment path:to:CustA where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustB-Users-grp to manage instance-family in compartment path:to:CustB where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustB-Users-grp to use volume-family in compartment path:to:CustB where all {request.region != 'LHR', request.region != 'FRA'}
Allow group CustB-Users-grp to use virtual-network-family in compartment path:to:CustB where all {request.region != 'LHR', request.region != 'FRA'}
```
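These statements can also be created programmatically instead of through the console. Below is a minimal sketch with the OCI Python SDK; the compartment OCID, policy name, and the single statement shown are placeholders to adapt to your own customers and regions:
```
import oci

# Standard SDK/CLI configuration from ~/.oci/config.
config = oci.config.from_file()
identity = oci.identity.IdentityClient(config)

# Placeholder OCID of the compartment that should hold the policy.
hsp_compartment_ocid = "ocid1.compartment.oc1..exampleuniqueid"

details = oci.identity.models.CreatePolicyDetails(
    compartment_id=hsp_compartment_ocid,
    name="CustA-Users-policy",
    description="HSP isolation policy for Customer A end users",
    statements=[
        "Allow group CustA-Users-grp to manage instance-family in compartment "
        "path:to:CustA where all {request.region != 'LHR', request.region != 'FRA'}",
    ],
)
policy = identity.create_policy(details).data
print(policy.id)
```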

## Common Policy

Currently, any user of a C3 needs access to certain resources located at the tenancy level in order to use IaaS resources in the web UI. Backup policies, tag namespaces and platform images all reside at the tenancy level and need a further policy to allow normal use of C3 IaaS services. Note that this is a subtle difference from the behaviour on OCI.

An extra policy as below is required (where CommonGroup contains **all** HSP users on the C3):
```
allow group CommonGroup to read all-resources in tenancy where target.compartment.name='root-compartment-name'
```