Commit abe24be

Merge branch 'main' into dependabot/pip/cloud-infrastructure/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/files/langchain-community-0.2.9
2 parents 05202a1 + 38f36e7 commit abe24be

189 files changed: +4654, -2421 lines


app-dev/devops-and-containers/oke/README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -14,7 +14,6 @@ Reviewed: 20.12.2023
 
 - [Cloud Coaching - Deploy Microservices with Kubernetes (OKE)](https://www.youtube.com/watch?v=mu5jbFjKKn0)
 - [Cloud Coaching - OCI Observability for Kubernetes monitoring](https://www.youtube.com/watch?v=mu5jbFjKKn0)
-- [Disaster Recovery — Notes on Velero and OKE, Part 1: Stateless Pods](https://medium.com/oracledevs/disaster-recovery-notes-on-velero-and-oke-part-1-stateless-pods-b4ba3e737386)
 - [Advanced Kubernetes Networking: OKE in a Hub-Spoke Architectures](https://medium.com/oracledevs/advanced-kubernetes-networking-oke-in-a-hub-spoke-architectures-f0ba2256e824)
 - [Scale and optimize Jenkins on Oracle Cloud Infrastructure Container Engine for Kubernetes](https://docs.oracle.com/en/solutions/oci-jenkins-oke/index.html#GUID-23A8EB94-DFFC-4D5C-897F-5F59423447D2)
 - [Argo Workflow on OKE for limitless ML](https://www.youtube.com/watch?v=HOWrwBVuLp0)
@@ -40,7 +39,8 @@ Reviewed: 20.12.2023
 - [Disaster Recovery — Notes on Velero and OKE, Part 1: Stateless Pods](https://medium.com/oracledevs/disaster-recovery-notes-on-velero-and-oke-part-1-stateless-pods-b4ba3e737386)
 - [Disaster Recovery — Notes on Velero and OKE, Part 2: Stateful Pods with Persistent Volumes and Block Volume](https://medium.com/oracledevs/disaster-recovery-notes-on-velero-and-oke-part-2-stateful-pods-with-persistent-volumes-and-80204b3ac6d7)
 - [Disaster Recovery: Notes on Velero and OKE — part 3: Stateful Pods with Persistent Volumes and File Storage](https://medium.com/oracledevs/oke-disaster-recovery-notes-on-velero-and-oke-part-3-stateful-pods-with-persistent-volumes-and-a6eacef7600b)
-- [Test S3 Compatibility - Preparing Backups and DR for OKE and Velero](https://github.com/fharris/oci-s3-compatibility)
+- [Authentication with OAuth2-Proxy, Kubernetes and OCI](https://medium.com/oracledevs/authentication-with-oauth2-proxy-kubernetes-and-oci-6c8d87769184)
+- [Code for Authentication with OAuth2-Proxy Kubernetes and OCI](https://github.com/fharris/oauth2-proxy-demo)
 
 
 # Useful Links
```

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/README.md

Lines changed: 4 additions & 3 deletions
```diff
@@ -6,7 +6,7 @@ These resources aim to offer guidance throughout your migration, enabling you to
 
 Explore these materials to enhance your migration strategy. We appreciate your participation and are committed to supporting your cloud migration journey.
 
-Reviewed: 7.2.2024
+Reviewed: 22.7.2024
 
 # Table of Contents
 
@@ -18,8 +18,9 @@ Reviewed: 7.2.2024
 
 # Team Publications
 
-- [Cyber recovery solution on Oracle Cloud Infrastructure](https://docs.oracle.com/en/solutions/oci-automated-cyber-recovery/index.html)
-
+- [Automate Recovery for Oracle Enterprise Performance Management using OCI Full Stack Disaster Recovery](https://docs.oracle.com/en/learn/fsdr-integration-epm/)
+- [Cyber recovery solution on Oracle Cloud Infrastructure](https://docs.oracle.com/en/solutions/oci-automated-cyber-recovery/index.html)
+
 # Useful Links
 
 - [EPM System Release 11.2.17 announcement](https://blogs.oracle.com/proactivesupportepm/post/enterprise-performance-management-epm-11217-is-available)
```

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/essbase-discovery-questionnaire/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -2,7 +2,7 @@
 
 This document serves as a standard questionnaire designed to gather crucial information necessary for the execution of Essbase application migration projects. It captures specific data that aids in estimating the effort required for a successful migration.
 
-Reviewed: 7.2.2024
+Reviewed: 22.7.2024
 
 # When to use this asset?
 
```
cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/essbase-solution-definition/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -12,7 +12,7 @@ This document serves as an integral asset for individuals and organizations seek
 
 Use this document as a starting point for the solution definition of your Essbase implementation project. This asset includes example architecture diagrams for DrawIO in the file essbase-architecture-diagrams-example.drawio.
 
-Reviewed: 19.4.2024
+Reviewed: 22.7.2024
 
 # Conclusion
 The Essbase Workload Solution Definition is expected to serve as a definitive guide to the project. All participants are encouraged to provide feedback, raise queries, and make contributions to enhance the overall project's success.
```

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-architecture-diagrams/README.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -8,7 +8,9 @@ They serve as a helpful resource for defining solutions, preparing designs, unde
 
 For a more professional and consistent presentation, these diagrams use the official OCI icon pack for draw.io. You can download the icons pack from the official Oracle page [here](https://docs.oracle.com/en-us/iaas/Content/General/Reference/graphicsfordiagrams.htm)
 
-Reviewed: 7.2.2024
+Hyperion EPM System Reference architecture on OCI can be found in the [Architecture Center](https://docs.oracle.com/en/solutions/deploy-hyperion-oci/index.html)
+
+Reviewed: 22.7.2024
 
 # Contents
 
```
cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-discovery-questionnaire/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -2,7 +2,7 @@
 
 This document serves as a standard questionnaire designed to gather crucial information necessary for the execution of Hyperion and Essbase application migration projects. It captures specific data that aids in estimating the effort required for a successful migration.
 
-Reviewed: 7.2.2024
+Reviewed: 22.7.2024
 
 # When to use this asset?
 
```
cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-essbase-decision-tree/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -2,7 +2,7 @@
 
 This GitHub repository hosts a decision path designed to guide you through the process of upgrading of Hyperion EPM System and Essbase or migrating these products to Oracle Cloud Infrastructure (OCI).
 
-Reviewed: 7.2.2024
+Reviewed: 22.7.2024
 
 # When to use this asset?
 
```

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-fsdr/README.md

Lines changed: 6 additions & 4 deletions
```diff
@@ -5,18 +5,20 @@ This GitHub repository provides custom scripts that serve as a starting point fo
 Included scripts:
 - start_services.ps1/sh - script to start all EPM System services, including WLS and OHS, on Windows (PowerShell) or Linux (Bash) compute
 - stop_services.ps1/sh - script to start all EPM System services, including WLS and OHS, on Windows (PowerShell) or Linux (Bash) compute
-- host_switch_failover.ps1/sh - script to update host file after switch to the standby region. Windows (PowerShell) or Linux (Bash).
-- host_switch_failback.ps1/sh - script to update host file after switch from standby region back to the primary region. Windows (PowerShell) or Linux (Bash).
+- host_switch_failover.ps1/sh - script to update the host file after switching to the standby region. Windows (PowerShell) or Linux (Bash) script to be used in a user-defined plan group after starting the compute nodes in the standby region.
+- host_switch_failback.ps1/sh - script to update the host file after switching from the standby region back to the primary region. Windows (PowerShell) or Linux (Bash) to be used in a user-defined plan group after starting the compute nodes in the primary region.
 
-Reviewed: 6.6.2024
+The complete tutorial is available here: [Automate Recovery for Oracle Enterprise Performance Management using OCI Full Stack Disaster Recovery](https://docs.oracle.com/en/learn/fsdr-integration-epm/)
+
+Reviewed: 22.7.2024
 
 # When to use this asset?
 
 Use these scripts to customize your Full Stack Disaster Recovery plans and automate switchovers and failovers between OCI regions for EPM System applications.
 
 # How to use this asset?
 
-Use these scripts in FSDR user defined plan groups [link](https://docs.oracle.com/en-us/iaas/disaster-recovery/doc/add-user-defined-plan-groups.html)
+Use these scripts in FSDR user-defined plan groups [link](https://docs.oracle.com/en-us/iaas/disaster-recovery/doc/add-user-defined-plan-groups.html)
 
 # Useful Links
 
```
cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-solution-definition/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
This repository contains an in-depth guide for Oracle Hyperion migration projects. It offers a high-level solution definition for migrating or establishing Hyperion Workloads on Oracle Cloud Infrastructure (OCI). With a comprehensive representation of the current state, prospective state, potential project scope, and anticipated timeline, this document aims to provide a precise understanding of the project's scope and intention to all participating entities.
44

5-
Reviewed date: 19.4.2024
5+
Reviewed date: 22.7.2024
66

77
# When to use this asset?
88

Lines changed: 99 additions & 0 deletions (new file)

# Calling multiple vLLM inference servers using LiteLLM

In this tutorial we explain how to use a LiteLLM Proxy Server to call multiple LLM inference endpoints from a single interface. LiteLLM interacts with 100+ LLMs such as OpenAI, Cohere, NVIDIA Triton and NIM, etc. Here we will use two vLLM inference servers.

<!-- ![Hybrid shards](assets/images/litellm.png "LiteLLM") -->

# When to use this asset?

To run the inference tutorial with local deployments of Mistral 7B Instruct v0.3 using a vLLM inference server powered by an NVIDIA A10 GPU and a LiteLLM Proxy Server on top.

# How to use this asset?

These are the prerequisites to run this tutorial:

* An OCI tenancy with A10 quota
* A Hugging Face account with a valid Auth Token
* A valid OpenAI API Key

## Introduction

LiteLLM provides a proxy server to manage auth, load balancing, and spend tracking across 100+ LLMs, all in the OpenAI format. vLLM is a fast and easy-to-use library for LLM inference and serving.

The first step is to deploy two vLLM inference servers on NVIDIA A10 powered virtual machine instances. In the second step, we create a LiteLLM Proxy Server on a third, GPU-less instance and explain how to use this single interface to call the two LLMs. For the sake of simplicity, all three instances reside in the same public subnet here.

![Hybrid shards](assets/images/litellm-architecture.png "LiteLLM")

## vLLM inference servers deployment

For each of the inference nodes, a VM.GPU.A10.2 instance (2 x NVIDIA A10 GPU, 24 GB each) is used in combination with the NVIDIA GPU-Optimized VMI image from the OCI Marketplace. This Ubuntu-based image comes with all the necessary libraries (Docker, NVIDIA Container Toolkit) preinstalled. It is good practice to deploy the two instances in two different fault domains to ensure higher availability.

The vLLM inference server is deployed using the official vLLM container image:
```bash
docker run --gpus all \
    -e HF_TOKEN=$HF_TOKEN -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --host 0.0.0.0 \
    --port 8000 \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --tensor-parallel-size 2 \
    --load-format safetensors \
    --trust-remote-code \
    --enforce-eager
```
where `$HF_TOKEN` is a valid Hugging Face token. In this case we use the 7B Instruct version of the Mistral LLM. The vLLM endpoint can be called directly for verification with:
```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.3",
        "messages": [
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }' | jq
```
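The same verification call can be made from Python using only the standard library. This is a minimal sketch assuming the vLLM server above is reachable on `localhost:8000`; the network call itself is left commented out so the snippet can be read offline:

```python
import json
import urllib.request

# Build the same chat-completions request the curl example sends.
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [{"role": "user", "content": "Who won the world series in 2020?"}],
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```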

## LiteLLM server deployment

No GPUs are required for LiteLLM. Therefore, a CPU-based VM.Standard.E4.Flex instance (4 OCPUs, 64 GB memory) with a standard Ubuntu 22.04 image is used. Here LiteLLM is used as a proxy server calling the vLLM endpoints. Install LiteLLM using `pip`:
```bash
pip install 'litellm[proxy]'
```
Edit the `config.yaml` file (OpenAI-Compatible Endpoint):
```yaml
model_list:
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://xxx.xxx.xxx.xxx:8000/v1
      api_key: sk-0123456789
  - model_name: Mistral-7B-Instruct
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.3
      api_base: http://xxx.xxx.xxx.xxx:8000/v1
      api_key: sk-0123456789
```
where `sk-0123456789` is a valid OpenAI API key and `xxx.xxx.xxx.xxx` are the public IP addresses of the two GPU instances. Because both entries share the same `model_name`, the proxy load-balances requests between the two endpoints.

Start the LiteLLM Proxy Server with the following command:
```bash
litellm --config /path/to/config.yaml
```
Once the Proxy Server is ready, call the vLLM endpoints through LiteLLM with:
```bash
curl http://localhost:4000/chat/completions \
    -H 'Authorization: Bearer sk-0123456789' \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Mistral-7B-Instruct",
        "messages": [
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }' | jq
```
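A Python client sketch of the same proxy call, again standard library only: note that it targets the `Mistral-7B-Instruct` alias defined in `config.yaml`, not the full Hugging Face model id, and passes the key as a bearer token. The actual network call is commented out since it assumes a running proxy on `localhost:4000`:

```python
import json
import urllib.request

# Request against the LiteLLM proxy: the "model" field is the alias from
# config.yaml; the proxy resolves it and picks one of the two vLLM backends.
proxy_req = urllib.request.Request(
    "http://localhost:4000/chat/completions",
    data=json.dumps({
        "model": "Mistral-7B-Instruct",
        "messages": [{"role": "user", "content": "Who won the world series in 2020?"}],
    }).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-0123456789",
    },
)

# Uncomment once the proxy is running:
# with urllib.request.urlopen(proxy_req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```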

## Documentation

* [LiteLLM documentation](https://litellm.vercel.app/docs/providers/openai_compatible)
* [vLLM documentation](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html)
* [MistralAI](https://mistral.ai/)
