content/patterns/rag-llm-gitops/_index.md
date: 2024-07-25
tier: tested
summary: The goal of this demo is to showcase a Chatbot LLM application augmented with data from Red Hat product documentation running on Red Hat OpenShift. It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM. The application generates a project proposal for a Red Hat product.
rh_products:
- Red Hat OpenShift Container Platform
- Red Hat OpenShift GitOps
- Red Hat OpenShift AI
partners:
- EDB
- Elastic
industries:
- General
aliases: /ai/
# uncomment once this exists
# pattern_logo: retail.png
ci: ragllm

## Introduction

This deployment is based on the _Validated Patterns framework_, using GitOps for seamless provisioning of all operators and applications. It deploys a Chatbot application that harnesses the power of Large Language Models (LLMs) combined with the Retrieval-Augmented Generation (RAG) framework.

The pattern uses [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) to deploy and serve LLM models at scale.

The pattern provides several options for the RAG vector store, including EDB Postgres (the default), Elasticsearch, Redis, and Microsoft SQL Server.

## Demo Description & Architecture
- Leveraging [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) to deploy and serve LLM models powered by NVIDIA GPU accelerator.
- LLM Application augmented with content from Red Hat product documentation.

- **vLLM Inference Server:** The pattern deploys a vLLM server, which serves the `ibm-granite/granite-3.3-8b-instruct` model. The server requires a GPU node.
- **EDB Postgres for Kubernetes / Redis Server:** A vector database server is deployed to store vector embeddings created from Red Hat product documentation.
- **Populate VectorDb Job:** The job creates the embeddings and populates the vector database.
- **LLM Application:** This is a Chatbot application that can generate a project proposal by augmenting the LLM with the Red Hat product documentation stored in the vector database.
- **Prometheus:** Deploys a Prometheus instance to store the various metrics from the LLM application and the vLLM inference server.
- **Grafana:** Deploys a Grafana instance to visualize the metrics.

This pattern supports several types of vector databases: EDB Postgres for Kubernetes, Elasticsearch, Redis, Microsoft SQL Server, and the cloud-deployed Azure SQL Server. By default, the pattern deploys EDB Postgres for Kubernetes as the vector database. To use a different vector database, change the `global.db.type` parameter to `ELASTIC`, `MSSQL`, and so on, in your local branch in `values-global.yaml`.

```yaml
global:
  pattern: rag-llm-gitops
  options:
    useCSV: false
    syncPolicy: Automatic
    installPlanApproval: Automatic
  # Possible values for RAG vector DB db.type:
  # REDIS -> Redis (Local chart deploy)
  # EDB -> PGVector (Local chart deploy)
  # ELASTIC -> Elasticsearch (Local chart deploy)
  # MSSQL -> MS SQL Server (Local chart deploy)
  # AZURESQL -> Azure SQL (Pre-existing in Azure)
  db:
    index: docs
    type: EDB
  # Models used by the inference service (should be a HuggingFace model ID)
```
This is also where you can update both the LLM model served by the vLLM inference service and the embedding model used by the vector database.
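The `db.type` edit can also be scripted rather than made by hand. The sketch below demonstrates the substitution on a minimal stand-in for `values-global.yaml` (the real file contains many more keys); you would run the same `sed` expression against the actual file in your local branch:

```shell
# Create a minimal stand-in for values-global.yaml to demonstrate the edit
# (the pattern's real file has many more keys).
cat > /tmp/values-global.yaml <<'EOF'
global:
  db:
    index: docs
    type: EDB
EOF

# Switch the vector store from the default EDB to Elasticsearch.
sed -i 's/^\( *type:\) EDB$/\1 ELASTIC/' /tmp/values-global.yaml

# Confirm the change before committing it to your local branch.
grep 'type:' /tmp/values-global.yaml
# prints "    type: ELASTIC"
```

Committing the changed file to your local branch is what makes GitOps pick up the new vector store.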
- You have the OpenShift Container Platform installation program and the pull secret for your cluster. You can get these from [Install OpenShift on AWS with installer-provisioned infrastructure](https://console.redhat.com/openshift/install/aws/installer-provisioned).

- A Red Hat OpenShift cluster running in AWS.

It is also possible to deploy the RAG-LLM GitOps pattern to Azure. Since these docs focus mostly on the AWS deployment, it is recommended that you reference [RAG-LLM pattern on Microsoft Azure](https://validatedpatterns.io/patterns/azure-rag-llm-gitops/) for more details about installing this pattern on Azure.

## Procedure
1. Create the installation configuration file using the steps described in [Creating the installation configuration file](https://docs.openshift.com/container-platform/latest/installing/installing_aws/ipi/installing-aws-customizations.html#installation-initializing_installing-aws-customizations).
> **Note:**
> Supported regions are `us-east-1`, `us-east-2`, `us-west-1`, `us-west-2`, `ca-central-1`, `sa-east-1`, `eu-west-1`, `eu-west-2`, `eu-west-3`, `eu-central-1`, `eu-north-1`, `ap-northeast-1`, `ap-northeast-2`, `ap-northeast-3`, `ap-southeast-1`, `ap-southeast-2`, and `ap-south-1`. For more information about installing on AWS, see [Installation methods](https://docs.openshift.com/container-platform/latest/installing/installing_aws/preparing-to-install-on-aws.html).

2. Customize the generated `install-config.yaml`, creating one control plane node with instance type `m5.2xlarge` and three worker nodes with instance type `m5.2xlarge`. A sample YAML file is shown here:
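The pattern's own sample file is not reproduced in this excerpt; the fragment below is a hedged sketch of the relevant `install-config.yaml` fields, with placeholder values for the base domain, cluster name, region, and pull secret:

```yaml
# Sketch only: the pattern's actual sample may differ.
apiVersion: v1
baseDomain: example.com            # placeholder: your base DNS domain
metadata:
  name: my-cluster                 # placeholder: your cluster name
controlPlane:
  name: master
  replicas: 1
  platform:
    aws:
      type: m5.2xlarge
compute:
- name: worker
  replicas: 3
  platform:
    aws:
      type: m5.2xlarge
platform:
  aws:
    region: us-east-1              # any supported region from the note above
pullSecret: '<your-pull-secret>'   # placeholder
```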
8. Create a local branch by running the following command:

```sh
$ git checkout -b my-test-branch main
```

9. By default, the pattern deploys EDB Postgres for Kubernetes as the vector database. To deploy Elasticsearch, change the `global.db.type` parameter to `ELASTIC` in your local branch in `values-global.yaml`. For more information, see [Deploying a different database](/rag-llm-gitops/deploy-different-db/).

10. By default, the instance type for the GPU nodes is `g5.2xlarge`. Follow [Customize GPU provisioning nodes](/rag-llm-gitops/gpuprovisioning/) to change the GPU instance types.
12. Ensure you have logged in to the cluster at both the command line and the console by using the login credentials presented to you when you installed the cluster. For example:
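A sketch of the command-line login, with placeholder values for the API URL and password (substitute the values printed by the installer):

```sh
$ oc login https://api.<cluster-name>.<base-domain>:6443 -u kubeadmin -p <password>
```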
> This deploys everything you need to run the demo application, including the NVIDIA GPU Operator and the Node Feature Discovery Operator used to identify your GPU nodes.
- Click the `Generate` button; a project proposal should be generated. The project proposal also contains references to the RAG content. The project proposal document can be downloaded as a PDF.