weight: 9
time: 10 minutes
---

In the final step of the workshop, we'll deploy an application to our OpenShift cluster
that uses the instruct and embeddings models that we deployed earlier using the
NVIDIA NIM operator.

## Application Overview

Like most applications that interact with LLMs, our application is written in Python.
It also uses [LangChain](https://www.langchain.com/), which is an open-source orchestration
framework that simplifies the development of applications powered by LLMs.

Our application starts by connecting to the two models that we'll be using:

* `meta/llama-3.2-1b-instruct`: used for responding to user prompts
* `nvidia/llama-3.2-nv-embedqa-1b-v2`: used to calculate embeddings

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# connect to an LLM NIM at the specified endpoint, specifying a particular model
llm = ChatNVIDIA(base_url=INSTRUCT_MODEL_URL, model="meta/llama-3.2-1b-instruct")

# initialize and connect to a NeMo Retriever Text Embedding NIM (nvidia/llama-3.2-nv-embedqa-1b-v2)
embeddings_model = NVIDIAEmbeddings(model="nvidia/llama-3.2-nv-embedqa-1b-v2",
                                    base_url=EMBEDDINGS_MODEL_URL)
```
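
Before wiring the models into a chain, it can help to sanity-check the connection. Here's a minimal smoke test of our own (the test prompt is ours, not part of the workshop app):

```python
# ask the instruct NIM a one-off question and print the reply
reply = llm.invoke("In one sentence, what is a vector embedding?")
print(reply.content)
```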

The URLs used for both models are defined in the `k8s-manifest.yaml` file:

```yaml
- name: INSTRUCT_MODEL_URL
  value: "http://meta-llama-3-2-1b-instruct.nim-service:8000/v1"
- name: EMBEDDINGS_MODEL_URL
  value: "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1"
```
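
These environment variables are how the Python code gets its `base_url` values. As a minimal sketch, assuming the application reads them with `os.environ`:

```python
import os

# NIM service endpoints injected via k8s-manifest.yaml
INSTRUCT_MODEL_URL = os.environ["INSTRUCT_MODEL_URL"]
EMBEDDINGS_MODEL_URL = os.environ["EMBEDDINGS_MODEL_URL"]
```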

The application then defines a prompt template that will be used in interactions
with the LLM:

```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system",
        "You are a helpful and friendly AI! "
        "Your responses should be concise and no longer than two sentences. "
        "Do not hallucinate. Say you don't know if you don't have this information. "
        "Answer the question using only the context."
        "\n\nQuestion: {question}\n\nContext: {context}"
    ),
    ("user", "{question}")
])
```
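
To see exactly what gets sent to the model, you can render the template yourself; a quick sketch (the sample question and context strings are made up for illustration):

```python
# fill in the placeholders and inspect the resulting messages
rendered = prompt.invoke({
    "question": "Which models does the application use?",
    "context": "The app uses an instruct model and an embeddings model.",
})
for message in rendered.to_messages():
    print(f"[{message.type}] {message.content}")
```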

> Note how we're explicitly instructing the LLM to say it doesn't know the answer
> when it doesn't have the information, which helps minimize hallucinations. There's
> also a placeholder for us to provide context that the LLM can use to answer the question.

The application uses Flask and defines a single endpoint, `/askquestion`, to
respond to questions from end users. To implement this endpoint, the application
connects to the Weaviate vector database and then invokes a chain (using LangChain)
that takes the user's question, converts it to an embedding, and looks up similar
documents in the vector database. It then sends the user's question to the LLM, along
with the related documents, and returns the LLM's response.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_weaviate.vectorstores import WeaviateVectorStore

# connect to the vector store that was populated earlier
vector_store = WeaviateVectorStore(
    client=weaviate_client,
    embedding=embeddings_model,
    index_name="CustomDocs",
    text_key="page_content"
)

# build a RAG chain: retrieve similar documents as context, pass the
# question through, fill the prompt, call the LLM, and parse the output
chain = (
    {
        "context": vector_store.as_retriever(),
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)

response = chain.invoke(question)
```
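
This chain is what the `/askquestion` endpoint ultimately runs. As a rough sketch of the route itself (the handler name and request shape are our assumptions; only the endpoint path comes from the app):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/askquestion", methods=["POST"])
def ask_question():
    # run the RAG chain on the question from the request body
    question = request.json["question"]
    answer = chain.invoke(question)
    return {"answer": answer}
```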

## Instrument the Application with OpenTelemetry

To capture metrics, traces, and logs from our application, we've instrumented it with OpenTelemetry.
This required adding the following package to the `requirements.txt` file (which ultimately gets
installed with `pip install`):

```
splunk-opentelemetry==2.7.0
```
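
If you're running the application locally rather than in the container, the dependencies install the same way the container build does (standard `pip` usage):

```bash
pip install -r requirements.txt
```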

We also added the following to the `Dockerfile` used to build the
container image for this application, to install additional OpenTelemetry
instrumentation packages:

```dockerfile
# Add additional OpenTelemetry instrumentation packages
RUN opentelemetry-bootstrap --action=install
```
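
If you're curious which instrumentation packages the bootstrap step will pull in, you can ask it to list them instead of installing them (standard `opentelemetry-bootstrap` behavior):

```bash
# print the instrumentation packages that match the installed libraries
opentelemetry-bootstrap --action=requirements
```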

Then we modified the `ENTRYPOINT` in the `Dockerfile` to call `opentelemetry-instrument`
when running the application:

```dockerfile
ENTRYPOINT ["opentelemetry-instrument", "flask", "run", "-p", "8080", "--host", "0.0.0.0"]
```
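
For the instrumented application to have somewhere to send its telemetry, the deployment manifest also needs to point the SDK at the OpenTelemetry Collector. A sketch of what that might look like (the variable names are standard OpenTelemetry settings; the endpoint value is our assumption about this cluster):

```yaml
- name: OTEL_SERVICE_NAME
  value: "llm-app"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  # hypothetical address of the collector deployed earlier in the workshop
  value: "http://otel-collector.otel:4317"
```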

Finally, to enhance the traces and metrics collected with OpenTelemetry, we added a
package named [OpenLIT](https://openlit.io/) to the `requirements.txt` file:

```
openlit==1.35.4
```

OpenLIT supports LangChain and enriches traces at instrumentation time with additional
context, such as the number of tokens used to process the request and what the prompt
and response were.

To initialize OpenLIT, we added the following to the application code:

```python
import openlit
...
openlit.init(environment="llm-app")
```

## Deploy the LLM Application

Use the following command to deploy this application to the OpenShift cluster:

```bash
oc apply -f ./llm-app/k8s-manifest.yaml
```
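
After applying the manifest, it's worth confirming the pod came up before exercising the endpoint; for example (the label selector is an assumption about the manifest):

```bash
oc get pods -l app=llm-app
```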

Finally, we can see the response from the LLM, the time it took, and the number of
input and output tokens utilized: