Commit 84378ef

updated AI POD workshop instructions
1 parent ddcc97a commit 84378ef

7 files changed: +178 -38 lines changed

content/en/ninja-workshops/14-cisco-ai-pods/10-cleanup.md

Lines changed: 19 additions & 2 deletions
@@ -1,10 +1,27 @@
 ---
-title: Cleanup
-linkTitle: 10. Cleanup
+title: Wrap-Up
+linkTitle: 10. Wrap-Up
 weight: 10
 time: 5 minutes
 ---
 
+## Wrap-Up
+
+We hope you enjoyed this workshop, which provided hands-on experience deploying and working
+with several of the technologies that are used to monitor Cisco AI PODs with
+Splunk Observability Cloud. Specifically, you had the opportunity to:
+
+* Deploy a Red Hat OpenShift cluster with GPU-based worker nodes.
+* Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
+* Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
+* Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
+* Add Prometheus receivers to the collector to ingest infrastructure metrics.
+* Deploy the Weaviate vector database to the cluster.
+* Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
+* Understand which details OpenTelemetry captures in the traces from applications that interact with LLMs.
+
+## Clean Up Steps
+
 Follow the steps in this section to uninstall the OpenShift cluster.
 
 Get the cluster ID, the Amazon Resource Names (ARNs) for the cluster-specific Operator roles,
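
If the cluster was provisioned with the `rosa` CLI (an assumption here; the earlier workshop sections define how the cluster was actually created), the teardown typically looks roughly like the following sketch:

``` bash
# look up the cluster ID (the cluster name shown here is a placeholder)
rosa describe cluster --cluster my-workshop-cluster

# delete the cluster and wait for the uninstall to complete
rosa delete cluster --cluster my-workshop-cluster --watch

# then remove the cluster-specific operator roles and the OIDC provider
rosa delete operator-roles --cluster <cluster-id>
rosa delete oidc-provider --cluster <cluster-id>
```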

content/en/ninja-workshops/14-cisco-ai-pods/4-deploy-otel-collector.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ time: 10 minutes
 Now that our OpenShift cluster is up and running, let's deploy the
 OpenTelemetry Collector, which gathers metrics, logs, and traces
 from the infrastructure and applications running in the cluster, and
-sends the resulting data to Splunk.
+sends the resulting data to Splunk Observability Cloud.
 
 ## Deploy the OpenTelemetry Collector
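
Later in the workshop, Prometheus receivers are added to this collector to ingest infrastructure metrics, for example GPU metrics exposed by NVIDIA's DCGM exporter. As a rough illustration only (the receiver name, scrape target, and exporter wiring below are assumptions, not the workshop's actual configuration), such a receiver might look like this:

``` yaml
receivers:
  prometheus/dcgm:
    config:
      scrape_configs:
        - job_name: nvidia-dcgm-exporter
          scrape_interval: 30s
          static_configs:
            # hypothetical DCGM exporter service name and namespace
            - targets: ["nvidia-dcgm-exporter.nvidia-gpu-operator.svc:9400"]

service:
  pipelines:
    metrics/dcgm:
      receivers: [prometheus/dcgm]
      # assumes the Splunk distribution's default signalfx metrics exporter is configured
      exporters: [signalfx]
```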

content/en/ninja-workshops/14-cisco-ai-pods/5-deploy-nvidia-nim.md

Lines changed: 6 additions & 3 deletions
@@ -5,11 +5,14 @@ weight: 5
 time: 20 minutes
 ---
 
-The NVIDIA NIM Operator is used to deploy LLMs in Kubernetes environments, such
+The **NVIDIA GPU Operator** is a Kubernetes Operator that automates the deployment, configuration,
+and management of all necessary NVIDIA software components to provision GPUs within a Kubernetes cluster.
+
+The **NVIDIA NIM Operator** is used to deploy LLMs in Kubernetes environments, such
 as the OpenShift cluster we created earlier in this workshop.
 
-This section of the workshop walks through the steps necessary to deploy the
-NVIDIA NIM operator in our OpenShift cluster.
+This section of the workshop walks through the steps necessary to deploy both the
+NVIDIA GPU and NIM operators in our OpenShift cluster.
 
 ## Create a NVIDIA NGC Account
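
Once an NGC account and API key are available, the cluster also needs credentials to pull NIM container images from `nvcr.io`. A minimal sketch of the usual pattern (the secret name and namespace below are illustrative placeholders, not necessarily what the workshop uses):

``` bash
# create an image pull secret for nvcr.io using the NGC API key
# ("ngc-secret" and the "nim-service" namespace are illustrative placeholders)
oc create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<your-ngc-api-key> \
  -n nim-service
```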

content/en/ninja-workshops/14-cisco-ai-pods/6-deploy-llm.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ weight: 6
 time: 20 minutes
 ---
 
-In this section, we'll use the NVIDIA NIM Operator to deploy a Large Language Model
+In this section, we'll use the NVIDIA NIM Operator to deploy two Large Language Models
 to our OpenShift Cluster.
 
 ## Create a Namespace
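
Judging from the NIM service URLs used later in the workshop (`*.nim-service:8000`), the models are deployed into a namespace presumably created with something like the following (the namespace name is inferred, so adjust it if your environment differs):

``` bash
# namespace name inferred from the NIM service URLs used later in the workshop
oc create namespace nim-service
```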

content/en/ninja-workshops/14-cisco-ai-pods/8-deploy-vector-db.md

Lines changed: 6 additions & 6 deletions
@@ -5,20 +5,20 @@ weight: 8
 time: 10 minutes
 ---
 
-In this step, we'll deploy a vector database to the AI POD and populate it with
+In this step, we'll deploy a vector database to the OpenShift cluster and populate it with
 test data.
 
 ## What is a Vector Database?
 
-A vector database stores and indexes data as numerical "vector embeddings," which capture
-the semantic meaning of information like text or images. Unlike traditional databases,
-they excel at "similarity searches," finding conceptually related data points rather
+A **vector database** stores and indexes data as numerical "vector embeddings," which capture
+the **semantic meaning** of information like text or images. Unlike traditional databases,
+they excel at **similarity searches**, finding conceptually related data points rather
 than exact matches.
 
 ## How is a Vector Database Used?
 
 Vector databases play a key role in a pattern called
-Retrieval Augmented Generation (RAG), which is widely used by
+**Retrieval Augmented Generation (RAG)**, which is widely used by
 applications that leverage Large Language Models (LLMs).
 
 The pattern is as follows:
@@ -63,7 +63,7 @@ oc create namespace weaviate
 
 Run the following command to allow Weaviate to run a privileged container:
 
-> Note: this approach is not recommended for production
+> Note: this approach is not recommended for production environments
 
 ``` bash
 oc adm policy add-scc-to-user privileged -z default -n weaviate
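
To make the idea of a similarity search above more concrete, here is a small standalone sketch (plain NumPy, independent of Weaviate and of the workshop code) of how conceptually related items are found by comparing vector embeddings:

``` python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 means semantically similar embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy 4-dimensional "embeddings"; real embedding models produce hundreds of dimensions
documents = {
    "GPU utilization metrics": np.array([0.9, 0.1, 0.0, 0.2]),
    "Chocolate cake recipe":   np.array([0.0, 0.8, 0.6, 0.1]),
}
query = np.array([0.85, 0.15, 0.05, 0.25])  # e.g. an embedding of "how busy are my GPUs?"

# a vector database performs this comparison at scale, using specialized indexes
best_match = max(documents, key=lambda name: cosine_similarity(query, documents[name]))
print(best_match)  # -> GPU utilization metrics
```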

content/en/ninja-workshops/14-cisco-ai-pods/9-deploy-llm-app.md

Lines changed: 125 additions & 18 deletions
@@ -5,14 +5,136 @@ weight: 9
 time: 10 minutes
 ---
 
-In the final step of the workshop, we'll deploy an application to our Cisco AI POD
+In the final step of the workshop, we'll deploy an application to our OpenShift cluster
 that uses the instruct and embeddings models that we deployed earlier using the
 NVIDIA NIM operator.
 
+## Application Overview
+
+Like most applications that interact with LLMs, our application is written in Python.
+It also uses [LangChain](https://www.langchain.com/), which is an open-source orchestration
+framework that simplifies the development of applications powered by LLMs.
+
+Our application starts by connecting to two LLMs that we'll be using:
+
+* `meta/llama-3.2-1b-instruct`: used for responding to user prompts
+* `nvidia/llama-3.2-nv-embedqa-1b-v2`: used to calculate embeddings
+
+``` python
+# connect to an LLM NIM at the specified endpoint, specifying a specific model
+llm = ChatNVIDIA(base_url=INSTRUCT_MODEL_URL, model="meta/llama-3.2-1b-instruct")
+
+# Initialize and connect to a NeMo Retriever Text Embedding NIM (nvidia/llama-3.2-nv-embedqa-1b-v2)
+embeddings_model = NVIDIAEmbeddings(model="nvidia/llama-3.2-nv-embedqa-1b-v2",
+                                    base_url=EMBEDDINGS_MODEL_URL)
+```
+
+The URLs used for both LLMs are defined in the `k8s-manifest.yaml` file:
+
+``` yaml
+- name: INSTRUCT_MODEL_URL
+  value: "http://meta-llama-3-2-1b-instruct.nim-service:8000/v1"
+- name: EMBEDDINGS_MODEL_URL
+  value: "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1"
+```
+
+The application then defines a prompt template that will be used in interactions
+with the LLM:
+
+``` python
+prompt = ChatPromptTemplate.from_messages([
+    ("system",
+     "You are a helpful and friendly AI!"
+     "Your responses should be concise and no longer than two sentences."
+     "Do not hallucinate. Say you don't know if you don't have this information."
+     "Answer the question using only the context"
+     "\n\nQuestion: {question}\n\nContext: {context}"
+     ),
+    ("user", "{question}")
+])
+```
+
+> Note how we're explicitly instructing the LLM to just say it doesn't know the answer if
+> it doesn't know, which helps minimize hallucinations. There's also a placeholder for
+> us to provide context that the LLM can use to answer the question.
+
+The application uses Flask, and defines a single endpoint named `/askquestion` to
+respond to questions from end users. To implement this endpoint, the application
+connects to the Weaviate vector database, and then invokes a chain (using LangChain)
+that takes the user's question, converts it to an embedding, and then looks up similar
+documents in the vector database. It then sends the user's question to the LLM, along
+with the related documents, and returns the LLM's response.
+
+``` python
+# connect with the vector store that was populated earlier
+vector_store = WeaviateVectorStore(
+    client=weaviate_client,
+    embedding=embeddings_model,
+    index_name="CustomDocs",
+    text_key="page_content"
+)
+
+chain = (
+    {
+        "context": vector_store.as_retriever(),
+        "question": RunnablePassthrough()
+    }
+    | prompt
+    | llm
+    | StrOutputParser()
+)
+
+response = chain.invoke(question)
+```
+
+## Instrument the Application with OpenTelemetry
+
+To capture metrics, traces, and logs from our application, we've instrumented it with OpenTelemetry.
+This required adding the following package to the `requirements.txt` file (which ultimately gets
+installed with `pip install`):
+
+````
+splunk-opentelemetry==2.7.0
+````
+
+We also added the following to the `Dockerfile` used to build the
+container image for this application, to install additional OpenTelemetry
+instrumentation packages:
+
+``` dockerfile
+# Add additional OpenTelemetry instrumentation packages
+RUN opentelemetry-bootstrap --action=install
+```
+
+Then we modified the `ENTRYPOINT` in the `Dockerfile` to call `opentelemetry-instrument`
+when running the application:
+
+``` dockerfile
+ENTRYPOINT ["opentelemetry-instrument", "flask", "run", "-p", "8080", "--host", "0.0.0.0"]
+```
+
+Finally, to enhance the traces and metrics collected with OpenTelemetry, we added a
+package named [OpenLIT](https://openlit.io/) to the `requirements.txt` file:
+
+````
+openlit==1.35.4
+````
+
+OpenLIT supports LangChain, and adds additional context to traces at instrumentation time,
+such as the number of tokens used to process the request, and what the prompt and
+response were.
+
+To initialize OpenLIT, we added the following to the application code:
+
+``` python
+import openlit
+...
+openlit.init(environment="llm-app")
+```
+
 ## Deploy the LLM Application
 
-Let's deploy an application to our OpenShift cluster that answers questions
-using the context that we loaded into the Weaviate vector database earlier.
+Use the following command to deploy this application to the OpenShift cluster:
 
 ``` bash
 oc apply -f ./llm-app/k8s-manifest.yaml
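
For the instrumentation to know where to send its data, the deployment typically also sets the standard OpenTelemetry environment variables. The variable names below are the standard ones; the values are illustrative placeholders rather than the actual contents of `k8s-manifest.yaml`:

``` yaml
- name: OTEL_SERVICE_NAME
  value: "llm-app"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  # placeholder; typically points at the OpenTelemetry Collector deployed earlier
  value: "http://splunk-otel-collector-agent:4318"
- name: OTEL_RESOURCE_ATTRIBUTES
  value: "deployment.environment=llm-app"
```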
@@ -90,18 +212,3 @@ Finally, we can see the response from the LLM, the time it took, and the number
 input and output tokens utilized:
 
 ![LLM Response](../images/LLMResponse.png)
-
-## Wrap-Up
-
-We hope you enjoyed this workshop, which provided hands-on experience deploying and working
-with several of the technologies that are used to monitor Cisco AI PODs with
-Splunk Observability Cloud. Specifically, you had the opportunity to:
-
-* Deploy a RedHat OpenShift cluster with GPU-based worker nodes.
-* Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
-* Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
-* Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
-* Add Prometheus receivers to the collector to ingest infrastructure metrics.
-* Deploy the Weaviate vector database to the cluster.
-* Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
-* Understand which details which OpenTelemetry captures in the trace from applications that interact with LLMs.

content/en/ninja-workshops/14-cisco-ai-pods/_index.md

Lines changed: 20 additions & 7 deletions
@@ -16,16 +16,29 @@ scalable, and efficient AI-ready infrastructure tailored to diverse needs.
 **Splunk Observability Cloud** provides comprehensive visibility into all of this infrastructure
 along with all the application components that are running on this stack.
 
+The steps to configure Splunk Observability Cloud for a Cisco AI POD environment are fully
+documented (see [here](https://github.com/signalfx/splunk-opentelemetry-examples/tree/main/collector/cisco-ai-ready-pods)
+for details).
+
+However, it's not always possible to get access to a Cisco AI POD environment to practice
+the installation steps.
+
 This workshop provides hands-on experience deploying and working with several of the technologies
-that are used to monitor Cisco AI PODs with Splunk Observability Cloud, including:
+that are used to monitor Cisco AI PODs with Splunk Observability Cloud, without requiring
+access to an actual Cisco AI POD. This includes:
 
-* Practice deploying an OpenTelemetry Collector in a Red Hat OpenShift cluster.
-* Practice configuring Prometheus receivers with the collector to ingest infrastructure metrics.
-* Practice instrumenting Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
+* Practice deploying a **Red Hat OpenShift** cluster with GPU-based worker nodes.
+* Practice deploying the **NVIDIA NIM Operator** and **NVIDIA GPU Operator**.
+* Practice deploying **Large Language Models (LLMs)** using NVIDIA NIM to the cluster.
+* Practice deploying the **OpenTelemetry Collector** in the Red Hat OpenShift cluster.
+* Practice adding **Prometheus** receivers to the collector to ingest infrastructure metrics.
+* Practice deploying the **Weaviate** vector database to the cluster.
+* Practice instrumenting Python services that interact with Large Language Models (LLMs) with **OpenTelemetry**.
+* Understand which details OpenTelemetry captures in the traces from applications that interact with LLMs.
 
-While access to an actual Cisco AI POD isn't required, the workshop **does** require access
-to an AWS account. We'll walk you through the steps of creating a Red Hat OpenShift
-cluster in AWS that we'll use for the rest of the workshop.
+> Please note: Red Hat OpenShift and NVIDIA AI Enterprise components
+> are typically pre-installed with an actual AI POD. However, because we’re using AWS for this workshop,
+> it’s necessary to perform these setup steps manually.
 
 {{% notice title="Tip" style="primary" icon="lightbulb" %}}
 The easiest way to navigate through this workshop is by using:
