content/patterns/rag-llm-gitops/_index.md
date: 2024-07-25
tier: tested
summary: The goal of this demo is to showcase a Chatbot LLM application augmented with data from Red Hat product documentation running on Red Hat OpenShift. It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM. The application generates a project proposal for a Red Hat product.
rh_products:
- Red Hat OpenShift Container Platform
- Red Hat OpenShift GitOps
- Red Hat OpenShift AI
partners:
- EDB
- Elastic
industries:
- General
aliases: /ai/
# uncomment once this exists
# pattern_logo: retail.png
ci: ragllm

## Introduction

This deployment is based on the _Validated Patterns framework_, using GitOps for seamless provisioning of all operators and applications. It deploys a Chatbot application that harnesses the power of Large Language Models (LLMs) combined with the Retrieval-Augmented Generation (RAG) framework.

The pattern uses [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) to deploy and serve LLM models at scale.

The pattern provides several options for the RAG vector store, including EDB Postgres (the default), Elasticsearch, Redis, and Microsoft SQL Server.

## Demo Description & Architecture
- Leveraging [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) to deploy and serve LLM models powered by NVIDIA GPU accelerator.
- LLM Application augmented with content from Red Hat product documentation.

- **vLLM Inference Server:** The pattern deploys a vLLM server, which serves the `ibm-granite/granite-3.3-8b-instruct` model. The server requires a GPU node.
- **EDB Postgres for Kubernetes / Redis Server:** A vector database server is deployed to store vector embeddings created from Red Hat product documentation.
- **Populate VectorDb Job:** The job creates the embeddings and populates the vector database.
- **LLM Application:** This is a Chatbot application that can generate a project proposal by augmenting the LLM with the Red Hat product documentation stored in the vector database.
- **Prometheus:** Deploys a Prometheus instance to store the various metrics from the LLM application and the vLLM inference server.
- **Grafana:** Deploys a Grafana instance to visualize the metrics.

This pattern supports several types of vector databases: EDB Postgres for Kubernetes, Elasticsearch, Redis, Microsoft SQL Server, and the cloud-deployed Azure SQL Server. By default, the pattern deploys EDB Postgres for Kubernetes as the vector database. To use a different vector database, change the `global.db.type` parameter to `ELASTIC`, `MSSQL`, and so on, in your local branch in `values-global.yaml`.

```yaml
global:
  pattern: rag-llm-gitops
  options:
    useCSV: false
    syncPolicy: Automatic
    installPlanApproval: Automatic
  # Possible values for RAG vector DB db.type:
  # REDIS -> Redis (Local chart deploy)
  # EDB -> PGVector (Local chart deploy)
  # ELASTIC -> Elasticsearch (Local chart deploy)
  # MSSQL -> MS SQL Server (Local chart deploy)
  # AZURESQL -> Azure SQL (Pre-existing in Azure)
  db:
    index: docs
    type: EDB
  # Models used by the inference service (should be a HuggingFace model ID)
```
This is also where you can update both the LLM model served by the vLLM inference service and the embedding model used by the vector database.
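The `db.type` edit can also be scripted rather than made by hand. The sketch below demonstrates the substitution on a minimal stand-in for `values-global.yaml` (the real file contains many more keys); you would run the same `sed` expression against the actual file in your local branch:

```shell
# Create a minimal stand-in for values-global.yaml to demonstrate the edit
# (the pattern's real file has many more keys).
cat > /tmp/values-global.yaml <<'EOF'
global:
  db:
    index: docs
    type: EDB
EOF

# Switch the vector store from the default EDB to Elasticsearch.
sed -i 's/^\( *type:\) EDB$/\1 ELASTIC/' /tmp/values-global.yaml

# Confirm the change before committing it to your local branch.
grep 'type:' /tmp/values-global.yaml
# prints "    type: ELASTIC"
```

Committing the changed file to your local branch is what makes GitOps pick up the new vector store.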
- You have the OpenShift Container Platform installation program and the pull secret for your cluster. You can get these from [Install OpenShift on AWS with installer-provisioned infrastructure](https://console.redhat.com/openshift/install/aws/installer-provisioned).

- A Red Hat OpenShift cluster running in AWS.

It is also possible to deploy the RAG-LLM GitOps pattern to Azure. Since these docs focus mostly on the AWS deployment, it is recommended that you reference [RAG-LLM pattern on Microsoft Azure](https://validatedpatterns.io/patterns/azure-rag-llm-gitops/) for more details about installing this pattern on Azure.

## Procedure
1. Create the installation configuration file using the steps described in [Creating the installation configuration file](https://docs.openshift.com/container-platform/latest/installing/installing_aws/ipi/installing-aws-customizations.html#installation-initializing_installing-aws-customizations).
> **Note:**
> Supported regions are `us-east-1`, `us-east-2`, `us-west-1`, `us-west-2`, `ca-central-1`, `sa-east-1`, `eu-west-1`, `eu-west-2`, `eu-west-3`, `eu-central-1`, `eu-north-1`, `ap-northeast-1`, `ap-northeast-2`, `ap-northeast-3`, `ap-southeast-1`, `ap-southeast-2`, and `ap-south-1`. For more information about installing on AWS, see [Installation methods](https://docs.openshift.com/container-platform/latest/installing/installing_aws/preparing-to-install-on-aws.html).

2. Customize the generated `install-config.yaml`, creating one control plane node with instance type `m5.2xlarge` and three worker nodes with instance type `m5.2xlarge`. A sample YAML file is shown here:
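The pattern's own sample file is not reproduced in this excerpt; the fragment below is a hedged sketch of the relevant `install-config.yaml` fields, with placeholder values for the base domain, cluster name, region, and pull secret:

```yaml
# Sketch only: the pattern's actual sample may differ.
apiVersion: v1
baseDomain: example.com            # placeholder: your base DNS domain
metadata:
  name: my-cluster                 # placeholder: your cluster name
controlPlane:
  name: master
  replicas: 1
  platform:
    aws:
      type: m5.2xlarge
compute:
- name: worker
  replicas: 3
  platform:
    aws:
      type: m5.2xlarge
platform:
  aws:
    region: us-east-1              # any supported region from the note above
pullSecret: '<your-pull-secret>'   # placeholder
```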
8. Create a local branch by running the following command:

```sh
$ git checkout -b my-test-branch main
```

9. By default, the pattern deploys EDB Postgres for Kubernetes as the vector database. To deploy Elasticsearch, change the `global.db.type` parameter to `ELASTIC` in your local branch in `values-global.yaml`. For more information, see [Deploying a different database](/rag-llm-gitops/deploy-different-db/).

10. By default, the instance type for the GPU nodes is `g5.2xlarge`. Follow [Customize GPU provisioning nodes](/rag-llm-gitops/gpuprovisioning/) to change the GPU instance types.
12. Ensure you have logged in to the cluster at both the command line and the console by using the login credentials presented to you when you installed the cluster. For example:
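A sketch of the command-line login, with placeholder values for the API URL and password (substitute the values printed by the installer):

```sh
$ oc login https://api.<cluster-name>.<base-domain>:6443 -u kubeadmin -p <password>
```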
> This deploys everything you need to run the demo application, including the NVIDIA GPU Operator and the Node Feature Discovery Operator used to identify your GPU nodes.
- Click the `Generate` button; a project proposal should be generated. The project proposal also contains references to the RAG content. The project proposal document can be downloaded as a PDF.