Commit f11984e

committed: added vector database to AI Pod workshop
1 parent 1e5b122 commit f11984e

File tree

12 files changed: +2854 -0 lines changed

content/en/ninja-workshops/14-cisco-ai-pods/6-deploy-llm.md

Lines changed: 48 additions & 0 deletions
@@ -177,5 +177,53 @@ curl -X "POST" \
}
```

{{% /tab %}}
{{< /tabs >}}

## Deploy an Embeddings Model

We're also going to deploy an [embeddings model](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/deploy)
in our cluster, which will be used later in the workshop to implement Retrieval Augmented Generation (RAG).

Run the following command to deploy the embeddings model:

``` bash
oc apply -n nim-service -f nvidia-embeddings.yaml
```
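
It may take several minutes for the model image and weights to download. You can watch the pod's progress with `oc get pods -n nim-service`.
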
## Test the Embeddings Model

Let's ensure the embeddings model is working as expected.

Start a pod that has access to the `curl` command:

``` bash
oc run --rm -it -n default curl --image=curlimages/curl:latest -- sh
```

Then run the following command to send a sample query to the embeddings model:

{{< tabs >}}
{{% tab title="Script" %}}

``` bash
curl -X "POST" \
  'http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1/embeddings' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input": ["Hello world"],
    "input_type": "query"
  }'
```

{{% /tab %}}
{{% tab title="Example Output" %}}

``` bash
TBD
```
{{% /tab %}}
{{< /tabs >}}
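
The endpoint follows the OpenAI-compatible embeddings API, so the response should include a `data` list with one vector per input string. As a minimal sketch, the same request can be issued from Python (assuming an environment with cluster DNS access and the `requests` package installed):

``` python
import requests

# Assumed in-cluster service URL (same as the curl example above)
URL = "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1/embeddings"

payload = {
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input": ["Hello world"],
    "input_type": "query",
}

resp = requests.post(URL, json=payload, timeout=30)
resp.raise_for_status()

# OpenAI-compatible shape: a "data" list with one embedding per input
vector = resp.json()["data"][0]["embedding"]
print(f"Received an embedding with {len(vector)} dimensions")
```
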
Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
---
title: Deploy the Vector Database
linkTitle: 8. Deploy the Vector Database
weight: 8
time: 10 minutes
---

In this step, we'll deploy a vector database to the AI POD and populate it with
test data.

## What is a Vector Database?

A vector database stores and indexes data as numerical "vector embeddings," which capture
the semantic meaning of information like text or images. Unlike traditional databases,
they excel at "similarity searches," finding conceptually related data points rather
than exact matches.

## How is a Vector Database Used?

Vector databases play a key role in a pattern called
Retrieval Augmented Generation (RAG), which is widely used by
applications that leverage Large Language Models (LLMs).

The pattern is as follows (a minimal sketch appears after the list):

* The end-user asks a question to the application
* The application takes the question and calculates a vector embedding for it
* The app then performs a similarity search, looking for related documents in the vector database
* The app then takes the original question and the related documents, and sends them to the LLM as context
* The LLM reviews the context and returns a response to the application

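To make the flow concrete, here's a minimal sketch of the RAG loop in Python. It's
illustrative only: `embed()`, `search()`, and `generate()` are hypothetical stand-ins
for the embeddings model, vector database, and LLM used later in the workshop.

``` python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str

def embed(text: str) -> list[float]:
    # Stand-in: a real implementation calls the embeddings model
    return [float(ord(c)) for c in text[:8]]

def search(vector: list[float], top_k: int) -> list[Doc]:
    # Stand-in: a real implementation queries the vector database
    return [Doc("The H200 GPU features 141 GB of HBM3e memory.")][:top_k]

def generate(prompt: str) -> str:
    # Stand-in: a real implementation sends the prompt to the LLM
    return f"(LLM answer based on: {prompt[:60]}...)"

def answer_question(question: str) -> str:
    query_vector = embed(question)                       # embed the question
    related_docs = search(query_vector, top_k=3)         # similarity search
    context = "\n\n".join(d.text for d in related_docs)  # assemble LLM context
    return generate(f"Context:\n{context}\n\nQuestion: {question}")

print(answer_question("How much memory does the H200 have?"))
```
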
## Deploy a Vector Database

For the workshop, we'll deploy an open-source vector database named
[Weaviate](https://weaviate.io/).

First, add the Weaviate helm repo that contains the Weaviate helm chart:

``` bash
helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm repo update
```

The `weaviate/weaviate-values.yaml` file includes the configuration we'll use to deploy
the Weaviate vector database.

We've set the following environment variables to `true`, to ensure Weaviate exposes
metrics that we can scrape later with the Prometheus receiver:

``` yaml
PROMETHEUS_MONITORING_ENABLED: true
PROMETHEUS_MONITORING_GROUP: true
```

Review the [Weaviate documentation](https://docs.weaviate.io/deploy/installation-guides/k8s-installation)
to explore the additional customization options available.

Let's create a new namespace:

``` bash
oc create namespace weaviate
```

Then deploy Weaviate:

``` bash
helm upgrade --install \
  "weaviate" \
  weaviate/weaviate \
  --namespace "weaviate" \
  --values ./weaviate/weaviate-values.yaml
```
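
Once the chart is installed, confirm that the Weaviate pod reaches the `Running` state with `oc get pods -n weaviate`.
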
## Capture Weaviate Metrics with Prometheus

Now that Weaviate is installed in our OpenShift cluster, let's modify the
OpenTelemetry collector configuration to scrape Weaviate's Prometheus
metrics.

To do so, let's add an additional Prometheus receiver to the `otel-collector-values.yaml` file:

``` yaml
prometheus/weaviate:
  config:
    config:
      scrape_configs:
        - job_name: weaviate-metrics
          scrape_interval: 10s
          static_configs:
            - targets:
                - '`endpoint`:2112'
  rule: type == "pod" && labels["app"] == "weaviate"
```
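
Note that the `rule` and doubly-nested `config` keys follow the `receiver_creator` convention: the rule selects pods labeled `app: weaviate`, and the backticked `` `endpoint` `` placeholder is replaced at runtime with each discovered pod's address, so port `2112` is scraped on every matching pod.
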
We'll need to ensure that Weaviate's metrics are added to the `filter/metrics_to_be_included` filter
processor configuration as well:

``` yaml
processors:
  filter/metrics_to_be_included:
    metrics:
      # Include only metrics used in charts and detectors
      include:
        match_type: strict
        metric_names:
          - DCGM_FI_DEV_FB_FREE
          - ...
          - object_count
          - vector_index_size
          - vector_index_operations
          - vector_index_tombstones
          - vector_index_tombstone_cleanup_threads
          - requests_total
          - objects_durations_ms_sum
          - objects_durations_ms_count
          - batch_delete_durations_ms_sum
          - batch_delete_durations_ms_count
```
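
With `match_type: strict` and an `include` block, only the metrics named in `metric_names` survive this processor; any Weaviate metric you later want to chart or alert on must be added to the list.
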
We also want to add a Resource processor to the configuration file, with the following configuration:

``` yaml
resource/weaviate:
  attributes:
    - key: weaviate.instance.id
      from_attribute: service.instance.id
      action: insert
```

This processor takes the `service.instance.id` attribute on the Weaviate metrics
and copies it into a new attribute called `weaviate.instance.id`. This is done so
that we can more easily distinguish Weaviate metrics from other metrics that use
`service.instance.id`, which is a standard OpenTelemetry attribute used in
Splunk Observability Cloud.

We'll need to add this Resource processor to the metrics pipeline as well:

``` yaml
service:
  pipelines:
    metrics/nvidia-metrics:
      exporters:
        - signalfx
      processors:
        - memory_limiter
        - filter/metrics_to_be_included
        - resource/weaviate
        - batch
        - resourcedetection
        - resource
      receivers:
        - receiver_creator/nvidia
```

Before applying the configuration changes to the collector, take a moment to compare the
contents of your modified `otel-collector-values.yaml` file with the
`otel-collector-values-with-weaviate.yaml` file.
Update your file as needed to ensure the contents match. Remember that indentation is
important in `yaml` files, and needs to be precise.

Now we can update the OpenTelemetry collector configuration by running the
following Helm command:

``` bash
helm upgrade splunk-otel-collector \
  --set="clusterName=$CLUSTER_NAME" \
  --set="environment=$ENVIRONMENT_NAME" \
  --set="splunkObservability.accessToken=$SPLUNK_ACCESS_TOKEN" \
  --set="splunkObservability.realm=$SPLUNK_REALM" \
  --set="splunkPlatform.endpoint=$SPLUNK_HEC_URL" \
  --set="splunkPlatform.token=$SPLUNK_HEC_TOKEN" \
  --set="splunkPlatform.index=$SPLUNK_INDEX" \
  -f ./otel-collector/otel-collector-values.yaml \
  -n otel \
  splunk-otel-collector-chart/splunk-otel-collector
```

In Splunk Observability Cloud, navigate to `Infrastructure` -> `AI Frameworks` -> `Weaviate`.
Filter on the `k8s.cluster.name` of interest, and ensure the navigator is populated as in the
following example:

![Weaviate Navigator](../images/WeaviateNavigator.png)

## Populate the Vector Database

Now that Weaviate is up and running, and we're capturing metrics from it
to ensure it's healthy, let's add some data that we'll use in the next part
of the workshop with a custom application.

The application used to do this is based on the
[LangChain Playbook for NeMo Retriever Text Embedding NIM](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/playbook.html#generate-embeddings-with-text-embedding-nim).

We'll deploy a Kubernetes Job to our OpenShift cluster to load the embeddings.
A Job is used rather than a Pod to ensure that this process runs only once:

``` bash
oc apply -f k8s-job.yaml
```

> Note: to build a Docker image for the Python application that loads the embeddings
> into Weaviate, we executed the following commands:
>
> ``` bash
> cd workshop/cisco-ai-pods/load-embeddings
> docker build --platform linux/amd64 -t derekmitchell399/load-embeddings:1.0 .
> docker push derekmitchell399/load-embeddings:1.0
> ```
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
title: Deploy the LLM Application
linkTitle: 9. Deploy the LLM Application
weight: 9
time: 10 minutes
---

Now that our LLM is up and running, we'll add the Prometheus receiver to our
OpenTelemetry collector to gather metrics from it.

## Capture the NVIDIA DCGM Exporter metrics
234 KB (binary image file, not rendered)
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.venv/
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Use an official Python runtime as a parent image
FROM python:3.12-slim

# Set working directory
WORKDIR /app

# Copy the dependency list first so this layer is cached separately
COPY requirements.txt /app/

# Install dependencies
RUN pip install -r requirements.txt

# Copy the application code
COPY . /app

# Expose the application on port 8080
EXPOSE 8080

ENTRYPOINT ["python", "app.py"]
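
Note that `EXPOSE` is informational only in Docker; since this image runs as a one-shot loader job, nothing actually listens on port 8080.
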
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
import os
import weaviate

from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_weaviate import WeaviateVectorStore

# Read environment variables
DOCUMENT_URL = os.getenv('DOCUMENT_URL')  # e.g. https://nvdam.widen.net/content/udc6mzrk7a/original/hpc-datasheet-sc23-h200-datasheet-3002446.pdf
EMBEDDINGS_MODEL_URL = os.getenv('EMBEDDINGS_MODEL_URL')  # e.g. http://localhost:8001/v1

# Load the specified PDF document
loader = PyPDFLoader(DOCUMENT_URL)
documents = loader.load()

# Split the document into chunks of at most 1000 characters
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
document_chunks = text_splitter.split_documents(documents)

# Initialize and connect to a NeMo Retriever Text Embedding NIM (nvidia/llama-3.2-nv-embedqa-1b-v2)
embeddings_model = NVIDIAEmbeddings(model="nvidia/llama-3.2-nv-embedqa-1b-v2",
                                    base_url=EMBEDDINGS_MODEL_URL)

# Connect to Weaviate; the in-cluster URL is http://weaviate.weaviate.svc.cluster.local:80
weaviate_client = weaviate.connect_to_custom(
    http_host=os.getenv('WEAVIATE_HTTP_HOST'),
    http_port=int(os.getenv('WEAVIATE_HTTP_PORT')),  # ports are cast to int for the v4 client
    http_secure=False,
    grpc_host=os.getenv('WEAVIATE_GRPC_HOST'),
    grpc_port=int(os.getenv('WEAVIATE_GRPC_PORT')),
    grpc_secure=False
)

# Embed each chunk and store the vectors in Weaviate
db = WeaviateVectorStore.from_documents(document_chunks, embeddings_model, client=weaviate_client)

weaviate_client.close()
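
As a quick sanity check (not part of the commit), a similarity search could be run against the store before the client is closed; `similarity_search` is the standard LangChain vector-store query method:

``` python
# Hypothetical sanity check -- insert before weaviate_client.close() in app.py.
results = db.similarity_search("How much GPU memory does the H200 have?", k=2)
for doc in results:
    print(doc.page_content[:120])
```
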
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
---
apiVersion: batch/v1
kind: Job
metadata:
  name: load-embeddings
spec:
  template:
    spec:
      containers:
        - name: load-embeddings
          image: "derekmitchell399/load-embeddings:1.0"
          command: ["python", "app.py"]
          env:
            - name: DOCUMENT_URL
              value: "https://nvdam.widen.net/content/udc6mzrk7a/original/hpc-datasheet-sc23-h200-datasheet-3002446.pdf"
            - name: EMBEDDINGS_MODEL_URL
              value: "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1"
            - name: WEAVIATE_HTTP_HOST
              value: "weaviate.weaviate.svc.cluster.local"
            - name: WEAVIATE_HTTP_PORT
              value: "80"
            - name: WEAVIATE_GRPC_HOST
              value: "weaviate.weaviate.svc.cluster.local"
            - name: WEAVIATE_GRPC_PORT
              value: "50051"
      restartPolicy: Never  # Ensure the job only runs once
  backoffLimit: 0  # Prevent retries if the Pod fails
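
Once applied, you can follow the loader's progress with `oc logs -f job/load-embeddings` and confirm completion with `oc get job load-embeddings`.
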
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
pypdf==5.4.0
langchain_community==0.3.22
langchain-nvidia-ai-endpoints==0.3.7
langchain-weaviate==0.0.5
weaviate-client==4.17.0
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-32-nv-embedqa-1b-v2
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2
    tag: latest
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      size: "10Gi"
      volumeAccessMode: "ReadWriteOnce"
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
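
This `NIMService` resource backs the `llama-32-nv-embedqa-1b-v2.nim-service:8000` endpoint used in the earlier `curl` test and in the Job's `EMBEDDINGS_MODEL_URL`.
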
