
Commit d3e2819

added sample LLM app to AI PODs workshop
1 parent b6b005f commit d3e2819

File tree

8 files changed (+232, -3 lines)


content/en/ninja-workshops/14-cisco-ai-pods/8-deploy-vector-db.md

Lines changed: 1 addition & 0 deletions
@@ -195,6 +195,7 @@ We'll deploy a Kubernetes Job to our OpenShift cluster to load the embeddings.
 A job is used rather than a pod to ensure that this process runs only once:
 
 ``` bash
+oc create namespace llm-app
 oc apply -f k8s-job.yaml
 ```
 
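Before moving on, it's worth confirming that the embeddings Job ran to completion. A minimal check (not part of this commit; the Job name comes from k8s-job.yaml below):

``` bash
# Wait for the Job to complete, then review its output
oc wait --for=condition=complete job/load-embeddings -n llm-app --timeout=300s
oc logs job/load-embeddings -n llm-app
```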
content/en/ninja-workshops/14-cisco-ai-pods/9-deploy-llm-app.md

Lines changed: 54 additions & 3 deletions
@@ -5,7 +5,58 @@ weight: 9
 time: 10 minutes
 ---
 
-Now that our LLM is up and running, we'll add the Prometheus receiver to our
-OpenTelemetry collector to gather metrics from it.
+In the final step of the workshop, we'll deploy an application to our Cisco AI POD
+that uses the instruct and embeddings models that we deployed earlier using the
+NVIDIA NIM operator.
 
-## Capture the NVIDIA DCGM Exporter metrics
+## Deploy the LLM Application
+
+Let's deploy an application to our OpenShift cluster that answers questions
+using the context that we loaded into the Weaviate vector database earlier.
+
+``` bash
+cd workshop/cisco-ai-pods/llm-app
+oc apply -f k8s-manifest.yaml
+```
+
+> Note: to build a Docker image for this Python application, we executed the following commands:
+> ``` bash
+> cd workshop/cisco-ai-pods/llm-app
+> docker build --platform linux/amd64 -t derekmitchell399/llm-app:1.0 .
+> docker push derekmitchell399/llm-app:1.0
+> ```
+
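Before testing, a quick way to confirm the Deployment came up (a sketch, not part of this commit; the names match k8s-manifest.yaml):

``` bash
oc get pods -n llm-app -l app.kubernetes.io/name=llm-app
oc logs deployment/llm-app -n llm-app
```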
+## Test the LLM Application
+
+Let's ensure the application is working as expected.
+
+Start a pod that has access to the curl command:
+
+``` bash
+oc run --rm -it -n default curl --image=curlimages/curl:latest -- sh
+```
+
+Then run the following command to send a question to the LLM:
+
+{{< tabs >}}
+{{% tab title="Script" %}}
+
+``` bash
+curl -X "POST" \
+  'http://llm-app.llm-app:8080/askquestion' \
+  -H 'Accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "question": "How much memory does the NVIDIA H200 have?"
+  }'
+```
+
+{{% /tab %}}
+{{% tab title="Example Output" %}}
+
+``` bash
+TBD
+```
+
+{{% /tab %}}
+{{< /tabs >}}
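If spinning up a curl pod isn't convenient, port-forwarding the Service is an alternative way to run the same test from a workstation (a sketch, assuming cluster access via oc):

``` bash
oc port-forward -n llm-app svc/llm-app 8080:8080 &
curl -X POST 'http://localhost:8080/askquestion' \
  -H 'Content-Type: application/json' \
  -d '{"question": "How much memory does the NVIDIA H200 have?"}'
```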
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+.venv/
workshop/cisco-ai-pods/llm-app/Dockerfile

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# Use an official Python runtime as a parent image
+FROM python:3.12-slim
+
+# Set working directory
+WORKDIR /app
+
+COPY requirements.txt /app/
+
+# Install dependencies separately so this layer is cached when only code changes
+RUN pip install -r requirements.txt
+
+
+# Add additional OpenTelemetry instrumentation packages
+RUN opentelemetry-bootstrap --action=install
+
+# Copy the application code
+COPY . /app
+
+# Expose the application on port 8080
+EXPOSE 8080
+
+ENTRYPOINT ["opentelemetry-instrument", "flask", "run", "-p", "8080", "--host", "0.0.0.0"]
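For a local smoke test of this image, something like the following should work (a sketch; the model URLs are placeholders for NIM endpoints reachable from the container):

``` bash
docker build --platform linux/amd64 -t llm-app:dev .
docker run --rm -p 8080:8080 \
  -e INSTRUCT_MODEL_URL="http://host.docker.internal:8000/v1" \
  -e EMBEDDINGS_MODEL_URL="http://host.docker.internal:8001/v1" \
  llm-app:dev
```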
workshop/cisco-ai-pods/llm-app/app.py

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+import os
+import weaviate
+import openlit
+
+from flask import Flask, request
+from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.runnables import RunnablePassthrough
+from langchain_core.output_parsers import StrOutputParser
+from langchain_weaviate import WeaviateVectorStore
+
+app = Flask(__name__)
+
+openlit.init()
+
+# Read environment variables
+INSTRUCT_MODEL_URL = os.getenv('INSTRUCT_MODEL_URL')      # e.g. http://localhost:8000/v1
+EMBEDDINGS_MODEL_URL = os.getenv('EMBEDDINGS_MODEL_URL')  # e.g. http://localhost:8001/v1
+
+# Connect to an LLM NIM at the specified endpoint, specifying a model
+llm = ChatNVIDIA(base_url=INSTRUCT_MODEL_URL, model="meta/llama-3.2-1b-instruct")
+
+# Initialize and connect to a NeMo Retriever Text Embedding NIM (nvidia/llama-3.2-nv-embedqa-1b-v2)
+embeddings_model = NVIDIAEmbeddings(model="nvidia/llama-3.2-nv-embedqa-1b-v2",
+                                    base_url=EMBEDDINGS_MODEL_URL)
+
+prompt = ChatPromptTemplate.from_messages([
+    ("system",
+     "You are a helpful and friendly AI! "
+     "Your responses should be concise and no longer than two sentences. "
+     "Do not hallucinate. Say you don't know if you don't have this information."
+     # "Answer the question using only the context"
+     "\n\nQuestion: {question}\n\nContext: {context}"
+     ),
+    ("user", "{question}")
+])
+
+@app.route("/askquestion", methods=['POST'])
+def ask_question():
+
+    data = request.json
+    question = data.get('question')
+
+    weaviate_client = weaviate.connect_to_custom(
+        # URL is: http://weaviate.weaviate.svc.cluster.local:80
+        http_host=os.getenv('WEAVIATE_HTTP_HOST'),
+        http_port=int(os.getenv('WEAVIATE_HTTP_PORT')),
+        http_secure=False,
+        grpc_host=os.getenv('WEAVIATE_GRPC_HOST'),
+        grpc_port=int(os.getenv('WEAVIATE_GRPC_PORT')),
+        grpc_secure=False
+    )
+
+    # Connect to the vector store that was populated earlier
+    # (index_name/text_key must match the collection created by the load-embeddings job)
+    vector_store = WeaviateVectorStore(
+        client=weaviate_client,
+        embedding=embeddings_model
+    )
+
+    chain = (
+        {
+            "context": vector_store.as_retriever(),
+            "question": RunnablePassthrough()
+        }
+        | prompt
+        | llm
+        | StrOutputParser()
+    )
+
+    response = chain.invoke(question)
+    print(response)
+
+    weaviate_client.close()
+
+    return response
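The app reads all of its connection details from the environment. For a run outside the cluster, exports along these lines would be needed before `flask run` (values are illustrative, taken from the comments in the code above):

``` bash
export INSTRUCT_MODEL_URL="http://localhost:8000/v1"
export EMBEDDINGS_MODEL_URL="http://localhost:8001/v1"
export WEAVIATE_HTTP_HOST="localhost"
export WEAVIATE_HTTP_PORT="80"
export WEAVIATE_GRPC_HOST="localhost"
export WEAVIATE_GRPC_PORT="50051"
flask run -p 8080 --host 0.0.0.0
```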
workshop/cisco-ai-pods/llm-app/k8s-manifest.yaml

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: llm-app
+  namespace: llm-app
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: llm-app
+      app.kubernetes.io/instance: llm-app
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: llm-app
+        app.kubernetes.io/instance: llm-app
+    spec:
+      containers:
+        - name: llm-app
+          image: "derekmitchell399/llm-app:1.0"
+          imagePullPolicy: Always
+          ports:
+            - name: http
+              containerPort: 8080
+          env:
+            - name: OTEL_SERVICE_NAME
+              value: "llm-app"
+            - name: OTEL_RESOURCE_ATTRIBUTES
+              value: "deployment.environment=llm-app"
+            - name: SPLUNK_OTEL_AGENT
+              valueFrom:
+                fieldRef:
+                  fieldPath: status.hostIP
+            - name: OTEL_EXPORTER_OTLP_ENDPOINT
+              value: "http://$(SPLUNK_OTEL_AGENT):4317"
+            - name: OTEL_EXPORTER_OTLP_PROTOCOL
+              value: "grpc"
+            # filter out health check requests to the root URL
+            - name: OTEL_PYTHON_EXCLUDED_URLS
+              value: "^(https?://)?[^/]+(/)?$"
+            - name: SPLUNK_PROFILER_ENABLED
+              value: "true"
+            - name: INSTRUCT_MODEL_URL
+              value: "http://meta-llama-3-2-1b-instruct.nim-service:8000/v1"
+            - name: EMBEDDINGS_MODEL_URL
+              value: "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1"
+            - name: WEAVIATE_HTTP_HOST
+              value: "weaviate.weaviate.svc.cluster.local"
+            - name: WEAVIATE_HTTP_PORT
+              value: "80"
+            - name: WEAVIATE_GRPC_HOST
+              value: "weaviate.weaviate.svc.cluster.local"
+            - name: WEAVIATE_GRPC_PORT
+              value: "50051"
+          resources: {}
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: llm-app
+  namespace: llm-app
+spec:
+  type: NodePort
+  ports:
+    - protocol: TCP
+      port: 8080
+      targetPort: 8080
+  selector:
+    app.kubernetes.io/name: llm-app
+    app.kubernetes.io/instance: llm-app
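Because the Service is of type NodePort, Kubernetes assigns the external port; it can be looked up with (not part of this commit):

``` bash
oc get svc llm-app -n llm-app
```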
workshop/cisco-ai-pods/llm-app/requirements.txt

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+Flask==3.0.3
+langchain_community==0.3.22
+langchain-nvidia-ai-endpoints==0.3.7
+langchain-weaviate==0.0.5
+weaviate-client==4.17.0
+splunk-opentelemetry==2.7.0
+openlit==1.35.4

workshop/cisco-ai-pods/load-embeddings/k8s-job.yaml

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ apiVersion: batch/v1
 kind: Job
 metadata:
   name: load-embeddings
+  namespace: llm-app
 spec:
   template:
     spec:
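One caveat for this hunk: Job specs are immutable, so if load-embeddings already exists in the old namespace from an earlier run, it has to be deleted before re-applying (a sketch):

``` bash
oc delete job load-embeddings --ignore-not-found
oc apply -f k8s-job.yaml
```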
