Commit 84378ef

updated AI POD workshop instructions
1 parent ddcc97a commit 84378ef

7 files changed: +178 -38 lines changed

content/en/ninja-workshops/14-cisco-ai-pods/10-cleanup.md

Lines changed: 19 additions & 2 deletions
@@ -1,10 +1,27 @@
 ---
-title: Cleanup
-linkTitle: 10. Cleanup
+title: Wrap-Up
+linkTitle: 10. Wrap-Up
 weight: 10
 time: 5 minutes
 ---
 
+## Wrap-Up
+
+We hope you enjoyed this workshop, which provided hands-on experience deploying and working
+with several of the technologies that are used to monitor Cisco AI PODs with
+Splunk Observability Cloud. Specifically, you had the opportunity to:
+
+* Deploy a Red Hat OpenShift cluster with GPU-based worker nodes.
+* Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
+* Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
+* Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
+* Add Prometheus receivers to the collector to ingest infrastructure metrics.
+* Deploy the Weaviate vector database to the cluster.
+* Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
+* Understand which details OpenTelemetry captures in the traces from applications that interact with LLMs.
+
+## Clean Up Steps
+
 Follow the steps in this section to uninstall the OpenShift cluster.
 
 Get the cluster ID, the Amazon Resource Names (ARNs) for the cluster-specific Operator roles,
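
If the cluster was provisioned with the `rosa` CLI (an assumption here; the earlier workshop sections define how the cluster was actually created), the teardown typically looks roughly like the following sketch:

``` bash
# look up the cluster ID (the cluster name shown here is a placeholder)
rosa describe cluster --cluster my-workshop-cluster

# delete the cluster and wait for the uninstall to complete
rosa delete cluster --cluster my-workshop-cluster --watch

# then remove the cluster-specific operator roles and the OIDC provider
rosa delete operator-roles --cluster <cluster-id>
rosa delete oidc-provider --cluster <cluster-id>
```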

content/en/ninja-workshops/14-cisco-ai-pods/4-deploy-otel-collector.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ time: 10 minutes
 Now that our OpenShift cluster is up and running, let's deploy the
 OpenTelemetry Collector, which gathers metrics, logs, and traces
 from the infrastructure and applications running in the cluster, and
-sends the resulting data to Splunk.
+sends the resulting data to Splunk Observability Cloud.
 
 ## Deploy the OpenTelemetry Collector
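
Later in the workshop, Prometheus receivers are added to this collector to ingest infrastructure metrics, for example GPU metrics exposed by NVIDIA's DCGM exporter. As a rough illustration only (the receiver name, scrape target, and exporter wiring below are assumptions, not the workshop's actual configuration), such a receiver might look like this:

``` yaml
receivers:
  prometheus/dcgm:
    config:
      scrape_configs:
        - job_name: nvidia-dcgm-exporter
          scrape_interval: 30s
          static_configs:
            # hypothetical DCGM exporter service name and namespace
            - targets: ["nvidia-dcgm-exporter.nvidia-gpu-operator.svc:9400"]

service:
  pipelines:
    metrics/dcgm:
      receivers: [prometheus/dcgm]
      # assumes the Splunk distribution's default signalfx metrics exporter is configured
      exporters: [signalfx]
```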

content/en/ninja-workshops/14-cisco-ai-pods/5-deploy-nvidia-nim.md

Lines changed: 6 additions & 3 deletions
@@ -5,11 +5,14 @@ weight: 5
 time: 20 minutes
 ---
 
-The NVIDIA NIM Operator is used to deploy LLMs in Kubernetes environments, such
+The **NVIDIA GPU Operator** is a Kubernetes Operator that automates the deployment, configuration,
+and management of all necessary NVIDIA software components to provision GPUs within a Kubernetes cluster.
+
+The **NVIDIA NIM Operator** is used to deploy LLMs in Kubernetes environments, such
 as the OpenShift cluster we created earlier in this workshop.
 
-This section of the workshop walks through the steps necessary to deploy the
-NVIDIA NIM operator in our OpenShift cluster.
+This section of the workshop walks through the steps necessary to deploy both the
+NVIDIA GPU and NIM operators in our OpenShift cluster.
 
 ## Create a NVIDIA NGC Account
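
Once an NGC account and API key are available, the cluster also needs credentials to pull NIM container images from `nvcr.io`. A minimal sketch of the usual pattern (the secret name and namespace below are illustrative placeholders, not necessarily what the workshop uses):

``` bash
# create an image pull secret for nvcr.io using the NGC API key
# ("ngc-secret" and the "nim-service" namespace are illustrative placeholders)
oc create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<your-ngc-api-key> \
  -n nim-service
```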

content/en/ninja-workshops/14-cisco-ai-pods/6-deploy-llm.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ weight: 6
 time: 20 minutes
 ---
 
-In this section, we'll use the NVIDIA NIM Operator to deploy a Large Language Model
+In this section, we'll use the NVIDIA NIM Operator to deploy two Large Language Models
 to our OpenShift Cluster.
 
 ## Create a Namespace
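
Judging from the NIM service URLs used later in the workshop (`*.nim-service:8000`), the models are deployed into a namespace presumably created with something like the following (the namespace name is inferred, so adjust it if your environment differs):

``` bash
# namespace name inferred from the NIM service URLs used later in the workshop
oc create namespace nim-service
```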

content/en/ninja-workshops/14-cisco-ai-pods/8-deploy-vector-db.md

Lines changed: 6 additions & 6 deletions
@@ -5,20 +5,20 @@ weight: 8
 time: 10 minutes
 ---
 
-In this step, we'll deploy a vector database to the AI POD and populate it with
+In this step, we'll deploy a vector database to the OpenShift cluster and populate it with
 test data.
 
 ## What is a Vector Database?
 
-A vector database stores and indexes data as numerical "vector embeddings," which capture
-the semantic meaning of information like text or images. Unlike traditional databases,
-they excel at "similarity searches," finding conceptually related data points rather
+A **vector database** stores and indexes data as numerical "vector embeddings," which capture
+the **semantic meaning** of information like text or images. Unlike traditional databases,
+they excel at **similarity searches**, finding conceptually related data points rather
 than exact matches.
 
 ## How is a Vector Database Used?
 
 Vector databases play a key role in a pattern called
-Retrieval Augmented Generation (RAG), which is widely used by
+**Retrieval Augmented Generation (RAG)**, which is widely used by
 applications that leverage Large Language Models (LLMs).
 
 The pattern is as follows:
@@ -63,7 +63,7 @@ oc create namespace weaviate
 
 Run the following command to allow Weaviate to run a privileged container:
 
-> Note: this approach is not recommended for production
+> Note: this approach is not recommended for production environments
 
 ``` bash
 oc adm policy add-scc-to-user privileged -z default -n weaviate
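
To make the idea of a similarity search above more concrete, here is a small standalone sketch (plain NumPy, independent of Weaviate and of the workshop code) of how conceptually related items are found by comparing vector embeddings:

``` python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 means semantically similar embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy 4-dimensional "embeddings"; real embedding models produce hundreds of dimensions
documents = {
    "GPU utilization metrics": np.array([0.9, 0.1, 0.0, 0.2]),
    "Chocolate cake recipe":   np.array([0.0, 0.8, 0.6, 0.1]),
}
query = np.array([0.85, 0.15, 0.05, 0.25])  # e.g. an embedding of "how busy are my GPUs?"

# a vector database performs this comparison at scale, using specialized indexes
best_match = max(documents, key=lambda name: cosine_similarity(query, documents[name]))
print(best_match)  # -> GPU utilization metrics
```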

content/en/ninja-workshops/14-cisco-ai-pods/9-deploy-llm-app.md

Lines changed: 125 additions & 18 deletions
@@ -5,14 +5,136 @@ weight: 9
 time: 10 minutes
 ---
 
-In the final step of the workshop, we'll deploy an application to our Cisco AI POD
+In the final step of the workshop, we'll deploy an application to our OpenShift cluster
 that uses the instruct and embeddings models that we deployed earlier using the
 NVIDIA NIM operator.
 
+## Application Overview
+
+Like most applications that interact with LLMs, our application is written in Python.
+It also uses [LangChain](https://www.langchain.com/), which is an open-source orchestration
+framework that simplifies the development of applications powered by LLMs.
+
+Our application starts by connecting to two LLMs that we'll be using:
+
+* `meta/llama-3.2-1b-instruct`: used for responding to user prompts
+* `nvidia/llama-3.2-nv-embedqa-1b-v2`: used to calculate embeddings
+
+``` python
+# connect to an LLM NIM at the specified endpoint, specifying a specific model
+llm = ChatNVIDIA(base_url=INSTRUCT_MODEL_URL, model="meta/llama-3.2-1b-instruct")
+
+# Initialize and connect to a NeMo Retriever Text Embedding NIM (nvidia/llama-3.2-nv-embedqa-1b-v2)
+embeddings_model = NVIDIAEmbeddings(model="nvidia/llama-3.2-nv-embedqa-1b-v2",
+                                    base_url=EMBEDDINGS_MODEL_URL)
+```
+
+The URLs used for both LLMs are defined in the `k8s-manifest.yaml` file:
+
+``` yaml
+- name: INSTRUCT_MODEL_URL
+  value: "http://meta-llama-3-2-1b-instruct.nim-service:8000/v1"
+- name: EMBEDDINGS_MODEL_URL
+  value: "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1"
+```
+
+The application then defines a prompt template that will be used in interactions
+with the LLM:
+
+``` python
+prompt = ChatPromptTemplate.from_messages([
+    ("system",
+     "You are a helpful and friendly AI!"
+     "Your responses should be concise and no longer than two sentences."
+     "Do not hallucinate. Say you don't know if you don't have this information."
+     "Answer the question using only the context"
+     "\n\nQuestion: {question}\n\nContext: {context}"
+     ),
+    ("user", "{question}")
+])
+```
+
+> Note how we're explicitly instructing the LLM to just say it doesn't know the answer if
+> it doesn't know, which helps minimize hallucinations. There's also a placeholder for
+> us to provide context that the LLM can use to answer the question.
+
+The application uses Flask, and defines a single endpoint named `/askquestion` to
+respond to questions from end users. To implement this endpoint, the application
+connects to the Weaviate vector database, and then invokes a chain (using LangChain)
+that takes the user's question, converts it to an embedding, and then looks up similar
+documents in the vector database. It then sends the user's question to the LLM, along
+with the related documents, and returns the LLM's response.
+
+``` python
+# connect with the vector store that was populated earlier
+vector_store = WeaviateVectorStore(
+    client=weaviate_client,
+    embedding=embeddings_model,
+    index_name="CustomDocs",
+    text_key="page_content"
+)
+
+chain = (
+    {
+        "context": vector_store.as_retriever(),
+        "question": RunnablePassthrough()
+    }
+    | prompt
+    | llm
+    | StrOutputParser()
+)
+
+response = chain.invoke(question)
+```
+
+## Instrument the Application with OpenTelemetry
+
+To capture metrics, traces, and logs from our application, we've instrumented it with OpenTelemetry.
+This required adding the following package to the `requirements.txt` file (which ultimately gets
+installed with `pip install`):
+
+````
+splunk-opentelemetry==2.7.0
+````
+
+We also added the following to the `Dockerfile` used to build the
+container image for this application, to install additional OpenTelemetry
+instrumentation packages:
+
+``` dockerfile
+# Add additional OpenTelemetry instrumentation packages
+RUN opentelemetry-bootstrap --action=install
+```
+
+Then we modified the `ENTRYPOINT` in the `Dockerfile` to call `opentelemetry-instrument`
+when running the application:
+
+``` dockerfile
+ENTRYPOINT ["opentelemetry-instrument", "flask", "run", "-p", "8080", "--host", "0.0.0.0"]
+```
+
+Finally, to enhance the traces and metrics collected with OpenTelemetry, we added a
+package named [OpenLIT](https://openlit.io/) to the `requirements.txt` file:
+
+````
+openlit==1.35.4
+````
+
+OpenLIT supports LangChain, and adds additional context to traces at instrumentation time,
+such as the number of tokens used to process the request, and what the prompt and
+response were.
+
+To initialize OpenLIT, we added the following to the application code:
+
+``` python
+import openlit
+...
+openlit.init(environment="llm-app")
+```
+
 ## Deploy the LLM Application
 
-Let's deploy an application to our OpenShift cluster that answers questions
-using the context that we loaded into the Weaviate vector database earlier.
+Use the following command to deploy this application to the OpenShift cluster:
 
 ``` bash
 oc apply -f ./llm-app/k8s-manifest.yaml
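
For the instrumentation to know where to send its data, the deployment typically also sets the standard OpenTelemetry environment variables. The variable names below are the standard ones; the values are illustrative placeholders rather than the actual contents of `k8s-manifest.yaml`:

``` yaml
- name: OTEL_SERVICE_NAME
  value: "llm-app"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  # placeholder; typically points at the OpenTelemetry Collector deployed earlier
  value: "http://splunk-otel-collector-agent:4318"
- name: OTEL_RESOURCE_ATTRIBUTES
  value: "deployment.environment=llm-app"
```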
@@ -90,18 +212,3 @@ Finally, we can see the response from the LLM, the time it took, and the number
 input and output tokens utilized:
 
 ![LLM Response](../images/LLMResponse.png)
-
-## Wrap-Up
-
-We hope you enjoyed this workshop, which provided hands-on experience deploying and working
-with several of the technologies that are used to monitor Cisco AI PODs with
-Splunk Observability Cloud. Specifically, you had the opportunity to:
-
-* Deploy a RedHat OpenShift cluster with GPU-based worker nodes.
-* Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
-* Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
-* Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
-* Add Prometheus receivers to the collector to ingest infrastructure metrics.
-* Deploy the Weaviate vector database to the cluster.
-* Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
-* Understand which details which OpenTelemetry captures in the trace from applications that interact with LLMs.

content/en/ninja-workshops/14-cisco-ai-pods/_index.md

Lines changed: 20 additions & 7 deletions
@@ -16,16 +16,29 @@ scalable, and efficient AI-ready infrastructure tailored to diverse needs.
 **Splunk Observability Cloud** provides comprehensive visibility into all of this infrastructure
 along with all the application components that are running on this stack.
 
+The steps to configure Splunk Observability Cloud for a Cisco AI POD environment are fully
+documented (see [here](https://github.com/signalfx/splunk-opentelemetry-examples/tree/main/collector/cisco-ai-ready-pods)
+for details).
+
+However, it's not always possible to get access to a Cisco AI POD environment to practice
+the installation steps.
+
 This workshop provides hands-on experience deploying and working with several of the technologies
-that are used to monitor Cisco AI PODs with Splunk Observability Cloud, including:
+that are used to monitor Cisco AI PODs with Splunk Observability Cloud, without requiring
+access to an actual Cisco AI POD. This includes:
 
-* Practice deploying an OpenTelemetry Collector in a Red Hat OpenShift cluster.
-* Practice configuring Prometheus receivers with the collector to ingest infrastructure metrics.
-* Practice instrumenting Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
+* Practice deploying a **Red Hat OpenShift** cluster with GPU-based worker nodes.
+* Practice deploying the **NVIDIA NIM Operator** and **NVIDIA GPU Operator**.
+* Practice deploying **Large Language Models (LLMs)** using NVIDIA NIM to the cluster.
+* Practice deploying the **OpenTelemetry Collector** in the Red Hat OpenShift cluster.
+* Practice adding **Prometheus** receivers to the collector to ingest infrastructure metrics.
+* Practice deploying the **Weaviate** vector database to the cluster.
+* Practice instrumenting Python services that interact with Large Language Models (LLMs) with **OpenTelemetry**.
+* Understand which details OpenTelemetry captures in the traces from applications that interact with LLMs.
 
-While access to an actual Cisco AI POD isn't required, the workshop **does** require access
-to an AWS account. We'll walk you through the steps of creating a Red Hat OpenShift
-cluster in AWS that we'll use for the rest of the workshop.
+> Please note: Red Hat OpenShift and NVIDIA AI Enterprise components
+> are typically pre-installed with an actual AI POD. However, because we’re using AWS for this workshop,
+> it’s necessary to perform these setup steps manually.
 
 {{% notice title="Tip" style="primary" icon="lightbulb" %}}
 The easiest way to navigate through this workshop is by using:
