Commit f11984e

committed: added vector database to AI Pod workshop
1 parent 1e5b122 commit f11984e

File tree

12 files changed: +2854 -0 lines changed

content/en/ninja-workshops/14-cisco-ai-pods/6-deploy-llm.md

Lines changed: 48 additions & 0 deletions
@@ -177,5 +177,53 @@ curl -X "POST" \
}
```

{{% /tab %}}
{{< /tabs >}}

## Deploy an Embeddings Model

We're also going to deploy an [embeddings model](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/deploy)
in our cluster, which will be used later in the workshop to implement Retrieval Augmented Generation (RAG).

Run the following command to deploy the embeddings model:

``` bash
oc apply -n nim-service -f nvidia-embeddings.yaml
```
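
It may take several minutes for the model image and weights to download. You can watch the pod's progress with `oc get pods -n nim-service`.
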
## Test the Embeddings Model

Let's ensure the embeddings model is working as expected.

Start a pod that has access to the `curl` command:

``` bash
oc run --rm -it -n default curl --image=curlimages/curl:latest -- sh
```

Then run the following command to send a sample query to the embeddings model:

{{< tabs >}}
{{% tab title="Script" %}}

``` bash
curl -X "POST" \
  'http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1/embeddings' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input": ["Hello world"],
    "input_type": "query"
  }'
```

{{% /tab %}}
{{% tab title="Example Output" %}}

``` bash
TBD
```
{{% /tab %}}
{{< /tabs >}}
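
The endpoint follows the OpenAI-compatible embeddings API, so the response should include a `data` list with one vector per input string. As a minimal sketch, the same request can be issued from Python (assuming an environment with cluster DNS access and the `requests` package installed):

``` python
import requests

# Assumed in-cluster service URL (same as the curl example above)
URL = "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1/embeddings"

payload = {
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input": ["Hello world"],
    "input_type": "query",
}

resp = requests.post(URL, json=payload, timeout=30)
resp.raise_for_status()

# OpenAI-compatible shape: a "data" list with one embedding per input
vector = resp.json()["data"][0]["embedding"]
print(f"Received an embedding with {len(vector)} dimensions")
```
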
Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
---
title: Deploy the Vector Database
linkTitle: 8. Deploy the Vector Database
weight: 8
time: 10 minutes
---

In this step, we'll deploy a vector database to the AI POD and populate it with
test data.

## What is a Vector Database?

A vector database stores and indexes data as numerical "vector embeddings," which capture
the semantic meaning of information like text or images. Unlike traditional databases,
they excel at "similarity searches," finding conceptually related data points rather
than exact matches.

## How is a Vector Database Used?

Vector databases play a key role in a pattern called
Retrieval Augmented Generation (RAG), which is widely used by
applications that leverage Large Language Models (LLMs).

The pattern is as follows (a minimal sketch appears after the list):

* The end-user asks a question to the application
* The application takes the question and calculates a vector embedding for it
* The app then performs a similarity search, looking for related documents in the vector database
* The app then takes the original question and the related documents, and sends them to the LLM as context
* The LLM reviews the context and returns a response to the application

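To make the flow concrete, here's a minimal sketch of the RAG loop in Python. It's
illustrative only: `embed()`, `search()`, and `generate()` are hypothetical stand-ins
for the embeddings model, vector database, and LLM used later in the workshop.

``` python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str

def embed(text: str) -> list[float]:
    # Stand-in: a real implementation calls the embeddings model
    return [float(ord(c)) for c in text[:8]]

def search(vector: list[float], top_k: int) -> list[Doc]:
    # Stand-in: a real implementation queries the vector database
    return [Doc("The H200 GPU features 141 GB of HBM3e memory.")][:top_k]

def generate(prompt: str) -> str:
    # Stand-in: a real implementation sends the prompt to the LLM
    return f"(LLM answer based on: {prompt[:60]}...)"

def answer_question(question: str) -> str:
    query_vector = embed(question)                       # embed the question
    related_docs = search(query_vector, top_k=3)         # similarity search
    context = "\n\n".join(d.text for d in related_docs)  # assemble LLM context
    return generate(f"Context:\n{context}\n\nQuestion: {question}")

print(answer_question("How much memory does the H200 have?"))
```
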
## Deploy a Vector Database

For the workshop, we'll deploy an open-source vector database named
[Weaviate](https://weaviate.io/).

First, add the Weaviate helm repo that contains the Weaviate helm chart:

``` bash
helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm repo update
```

The `weaviate/weaviate-values.yaml` file includes the configuration we'll use to deploy
the Weaviate vector database.

We've set the following environment variables to `true`, to ensure Weaviate exposes
metrics that we can scrape later with the Prometheus receiver:

``` yaml
PROMETHEUS_MONITORING_ENABLED: true
PROMETHEUS_MONITORING_GROUP: true
```

Review the [Weaviate documentation](https://docs.weaviate.io/deploy/installation-guides/k8s-installation)
to explore the additional customization options available.

Let's create a new namespace:

``` bash
oc create namespace weaviate
```

Then deploy Weaviate:

``` bash
helm upgrade --install \
  "weaviate" \
  weaviate/weaviate \
  --namespace "weaviate" \
  --values ./weaviate/weaviate-values.yaml
```
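
Once the chart is installed, confirm that the Weaviate pod reaches the `Running` state with `oc get pods -n weaviate`.
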
## Capture Weaviate Metrics with Prometheus

Now that Weaviate is installed in our OpenShift cluster, let's modify the
OpenTelemetry collector configuration to scrape Weaviate's Prometheus
metrics.

To do so, let's add an additional Prometheus receiver to the `otel-collector-values.yaml` file:

``` yaml
prometheus/weaviate:
  config:
    config:
      scrape_configs:
        - job_name: weaviate-metrics
          scrape_interval: 10s
          static_configs:
            - targets:
                - '`endpoint`:2112'
  rule: type == "pod" && labels["app"] == "weaviate"
```
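
Note that the `rule` and doubly-nested `config` keys follow the `receiver_creator` convention: the rule selects pods labeled `app: weaviate`, and the backticked `` `endpoint` `` placeholder is replaced at runtime with each discovered pod's address, so port `2112` is scraped on every matching pod.
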
We'll need to ensure that Weaviate's metrics are added to the `filter/metrics_to_be_included` filter
processor configuration as well:

``` yaml
processors:
  filter/metrics_to_be_included:
    metrics:
      # Include only metrics used in charts and detectors
      include:
        match_type: strict
        metric_names:
          - DCGM_FI_DEV_FB_FREE
          - ...
          - object_count
          - vector_index_size
          - vector_index_operations
          - vector_index_tombstones
          - vector_index_tombstone_cleanup_threads
          - requests_total
          - objects_durations_ms_sum
          - objects_durations_ms_count
          - batch_delete_durations_ms_sum
          - batch_delete_durations_ms_count
```
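
With `match_type: strict` and an `include` block, only the metrics named in `metric_names` survive this processor; any Weaviate metric you later want to chart or alert on must be added to the list.
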
We also want to add a Resource processor to the configuration file, with the following configuration:

``` yaml
resource/weaviate:
  attributes:
    - key: weaviate.instance.id
      from_attribute: service.instance.id
      action: insert
```

This processor takes the `service.instance.id` attribute on the Weaviate metrics
and copies it into a new attribute called `weaviate.instance.id`. This is done so
that we can more easily distinguish Weaviate metrics from other metrics that use
`service.instance.id`, which is a standard OpenTelemetry attribute used in
Splunk Observability Cloud.

We'll need to add this Resource processor to the metrics pipeline as well:

``` yaml
service:
  pipelines:
    metrics/nvidia-metrics:
      exporters:
        - signalfx
      processors:
        - memory_limiter
        - filter/metrics_to_be_included
        - resource/weaviate
        - batch
        - resourcedetection
        - resource
      receivers:
        - receiver_creator/nvidia
```

Before applying the configuration changes to the collector, take a moment to compare the
contents of your modified `otel-collector-values.yaml` file with the
`otel-collector-values-with-weaviate.yaml` file.
Update your file as needed to ensure the contents match. Remember that indentation is
important in `yaml` files, and needs to be precise.

Now we can update the OpenTelemetry collector configuration by running the
following Helm command:

``` bash
helm upgrade splunk-otel-collector \
  --set="clusterName=$CLUSTER_NAME" \
  --set="environment=$ENVIRONMENT_NAME" \
  --set="splunkObservability.accessToken=$SPLUNK_ACCESS_TOKEN" \
  --set="splunkObservability.realm=$SPLUNK_REALM" \
  --set="splunkPlatform.endpoint=$SPLUNK_HEC_URL" \
  --set="splunkPlatform.token=$SPLUNK_HEC_TOKEN" \
  --set="splunkPlatform.index=$SPLUNK_INDEX" \
  -f ./otel-collector/otel-collector-values.yaml \
  -n otel \
  splunk-otel-collector-chart/splunk-otel-collector
```

In Splunk Observability Cloud, navigate to `Infrastructure` -> `AI Frameworks` -> `Weaviate`.
Filter on the `k8s.cluster.name` of interest, and ensure the navigator is populated as in the
following example:

![Weaviate Navigator](../images/WeaviateNavigator.png)

## Populate the Vector Database

Now that Weaviate is up and running, and we're capturing metrics from it
to ensure it's healthy, let's add some data that we'll use in the next part
of the workshop with a custom application.

The application used to do this is based on the
[LangChain Playbook for NeMo Retriever Text Embedding NIM](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/playbook.html#generate-embeddings-with-text-embedding-nim).

We'll deploy a Kubernetes Job to our OpenShift cluster to load the embeddings.
A Job is used rather than a Pod to ensure that this process runs only once:

``` bash
oc apply -f k8s-job.yaml
```

> Note: to build a Docker image for the Python application that loads the embeddings
> into Weaviate, we executed the following commands:
>
> ``` bash
> cd workshop/cisco-ai-pods/load-embeddings
> docker build --platform linux/amd64 -t derekmitchell399/load-embeddings:1.0 .
> docker push derekmitchell399/load-embeddings:1.0
> ```
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
title: Deploy the LLM Application
linkTitle: 9. Deploy the LLM Application
weight: 9
time: 10 minutes
---

Now that our LLM is up and running, we'll add the Prometheus receiver to our
OpenTelemetry collector to gather metrics from it.

## Capture the NVIDIA DCGM Exporter metrics
234 KB (binary image file, not rendered)
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.venv/
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Use an official Python runtime as a parent image
FROM python:3.12-slim

# Set working directory
WORKDIR /app

# Copy the dependency list first so this layer is cached separately
COPY requirements.txt /app/

# Install dependencies
RUN pip install -r requirements.txt

# Copy the application code
COPY . /app

# Expose the application on port 8080
EXPOSE 8080

ENTRYPOINT ["python", "app.py"]
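
Note that `EXPOSE` is informational only in Docker; since this image runs as a one-shot loader job, nothing actually listens on port 8080.
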
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
import os
import weaviate

from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_weaviate import WeaviateVectorStore

# Read environment variables
DOCUMENT_URL = os.getenv('DOCUMENT_URL')  # e.g. https://nvdam.widen.net/content/udc6mzrk7a/original/hpc-datasheet-sc23-h200-datasheet-3002446.pdf
EMBEDDINGS_MODEL_URL = os.getenv('EMBEDDINGS_MODEL_URL')  # e.g. http://localhost:8001/v1

# Load the specified PDF document
loader = PyPDFLoader(DOCUMENT_URL)
documents = loader.load()

# Split the document into chunks of at most 1000 characters
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
document_chunks = text_splitter.split_documents(documents)

# Initialize and connect to a NeMo Retriever Text Embedding NIM (nvidia/llama-3.2-nv-embedqa-1b-v2)
embeddings_model = NVIDIAEmbeddings(model="nvidia/llama-3.2-nv-embedqa-1b-v2",
                                    base_url=EMBEDDINGS_MODEL_URL)

# Connect to Weaviate; the in-cluster URL is http://weaviate.weaviate.svc.cluster.local:80
weaviate_client = weaviate.connect_to_custom(
    http_host=os.getenv('WEAVIATE_HTTP_HOST'),
    http_port=int(os.getenv('WEAVIATE_HTTP_PORT')),  # ports are cast to int for the v4 client
    http_secure=False,
    grpc_host=os.getenv('WEAVIATE_GRPC_HOST'),
    grpc_port=int(os.getenv('WEAVIATE_GRPC_PORT')),
    grpc_secure=False
)

# Embed each chunk and store the vectors in Weaviate
db = WeaviateVectorStore.from_documents(document_chunks, embeddings_model, client=weaviate_client)

weaviate_client.close()
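
As a quick sanity check (not part of the commit), a similarity search could be run against the store before the client is closed; `similarity_search` is the standard LangChain vector-store query method:

``` python
# Hypothetical sanity check -- insert before weaviate_client.close() in app.py.
results = db.similarity_search("How much GPU memory does the H200 have?", k=2)
for doc in results:
    print(doc.page_content[:120])
```
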
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
---
apiVersion: batch/v1
kind: Job
metadata:
  name: load-embeddings
spec:
  template:
    spec:
      containers:
        - name: load-embeddings
          image: "derekmitchell399/load-embeddings:1.0"
          command: ["python", "app.py"]
          env:
            - name: DOCUMENT_URL
              value: "https://nvdam.widen.net/content/udc6mzrk7a/original/hpc-datasheet-sc23-h200-datasheet-3002446.pdf"
            - name: EMBEDDINGS_MODEL_URL
              value: "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1"
            - name: WEAVIATE_HTTP_HOST
              value: "weaviate.weaviate.svc.cluster.local"
            - name: WEAVIATE_HTTP_PORT
              value: "80"
            - name: WEAVIATE_GRPC_HOST
              value: "weaviate.weaviate.svc.cluster.local"
            - name: WEAVIATE_GRPC_PORT
              value: "50051"
      restartPolicy: Never  # Ensure the job only runs once
  backoffLimit: 0  # Prevent retries if the Pod fails
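
Once applied, you can follow the loader's progress with `oc logs -f job/load-embeddings` and confirm completion with `oc get job load-embeddings`.
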
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
pypdf==5.4.0
langchain_community==0.3.22
langchain-nvidia-ai-endpoints==0.3.7
langchain-weaviate==0.0.5
weaviate-client==4.17.0
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-32-nv-embedqa-1b-v2
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2
    tag: latest
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      size: "10Gi"
      volumeAccessMode: "ReadWriteOnce"
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
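
This `NIMService` resource backs the `llama-32-nv-embedqa-1b-v2.nim-service:8000` endpoint used in the earlier `curl` test and in the Job's `EMBEDDINGS_MODEL_URL`.
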
