
Commit e9ba10a

final draft of the AI PODs workshop
1 parent 2d62b17 commit e9ba10a

File tree

12 files changed: +102 −49 lines

content/en/ninja-workshops/14-cisco-ai-pods/9-deploy-llm-app.md

Lines changed: 38 additions & 4 deletions
@@ -15,8 +15,7 @@ Let's deploy an application to our OpenShift cluster that answers questions
 using the context that we loaded into the Weaviate vector database earlier.
 
 ``` bash
-cd workshop/cisco-ai-pods/llm-app
-oc apply -f k8s-manifest.yaml
+oc apply -f ./llm-app/k8s-manifest.yaml
 ```
 
 > Note: to build a Docker image for this Python application, we executed the following commands:
@@ -55,7 +54,7 @@ curl -X "POST" \
 {{% tab title="Example Output" %}}
 
 ``` bash
-The NVIDIA H200 graphics card has 5536 MB of GDDR6 memory.
+The NVIDIA H200 has 141GB of HBM3e memory, which is twice the capacity of the NVIDIA H100 Tensor Core GPU with 1.4X more memory bandwidth.
 ```
 
 {{% /tab %}}
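
The same question can also be posed from Python rather than curl; a minimal sketch using the `requests` library, assuming the llm-app service has been port-forwarded to localhost:8080 (the actual host and port depend on how the service is exposed in your cluster):

```python
import requests

# Hypothetical endpoint: adjust the host/port to match your route or port-forward.
URL = "http://localhost:8080/askquestion"

resp = requests.post(URL, json={"question": "How much memory does the NVIDIA H200 have?"})
resp.raise_for_status()  # surface HTTP errors early
print(resp.text)         # the answer, grounded in the Weaviate context
```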
@@ -67,7 +66,42 @@ In Splunk Observability Cloud, navigate to `APM` and then select `Service Map`.
 Ensure the `llm-app` environment is selected. You should see a service map
 that looks like the following:
 
+![Service Map](../images/ServiceMap.png)
+
 Click on `Traces` on the right-hand side menu. Then select one of the slower running
-traces.
+traces. It should look like the following example:
+
+![Trace](../images/Trace.png)
+
+The trace shows all the interactions that our application executed to return an answer
+to the user's question (i.e. "How much memory does the NVIDIA H200 have?").
+
+For example, we can see where our application performed a similarity search to look
+for documents related to the question at hand in the Weaviate vector database:
+
+![Document Retrieval](../images/DocumentRetrieval.png)
+
+We can also see how the application created a prompt to send to the LLM, including the
+context that was retrieved from the vector database:
+
+![Prompt Template](../images/PromptTemplate.png)
+
+Finally, we can see the response from the LLM, the time it took, and the number of
+input and output tokens utilized:
+
+![LLM Response](../images/LLMResponse.png)
+
+## Wrap-Up
 
+We hope you enjoyed this workshop, which provided hands-on experience deploying and working
+with several of the technologies that are used to monitor Cisco AI PODs with
+Splunk Observability Cloud. Specifically, you had the opportunity to:
 
+* Deploy a Red Hat OpenShift cluster with GPU-based worker nodes.
+* Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
+* Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
+* Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
+* Add Prometheus receivers to the collector to ingest infrastructure metrics.
+* Deploy the Weaviate vector database to the cluster.
+* Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
+* Understand which details OpenTelemetry captures in traces from applications that interact with LLMs.
6 binary files changed (555 KB, 487 KB, 612 KB, 301 KB, 453 KB, 49 Bytes); previews not shown.

workshop/cisco-ai-pods/llm-app/Dockerfile

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,10 @@ RUN pip install -r requirements.txt
 # Add additional OpenTelemetry instrumentation packages
 RUN opentelemetry-bootstrap --action=install
 
+# Remove unwanted instrumentation to ensure we get a clean trace
+RUN pip uninstall -y opentelemetry-instrumentation-httpx
+RUN pip uninstall -y opentelemetry-instrumentation-requests
+
 # Copy the application code
 COPY . /app
 
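Context for the uninstalls above: `opentelemetry-bootstrap` installs an instrumentation package for every supported library it detects, and the `httpx` and `requests` instrumentations would add spans for HTTP calls that the LLM and Weaviate client libraries make internally, cluttering the trace. A quick way to check what remains in the image, as a sketch (assumes Python 3.10+, and that the `opentelemetry_instrumentor` entry-point group is what the auto-instrumentation agent uses for discovery):

```python
# Sketch: list the OpenTelemetry auto-instrumentations installed in the image.
from importlib.metadata import entry_points

for ep in entry_points(group="opentelemetry_instrumentor"):
    print(ep.name)  # "httpx" and "requests" should be absent after the uninstalls
```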

workshop/cisco-ai-pods/llm-app/app.py

Lines changed: 54 additions & 45 deletions
@@ -11,9 +11,12 @@
 from langchain_core.output_parsers import StrOutputParser
 from langchain_weaviate import WeaviateVectorStore
 
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
 app = Flask(__name__)
 
-openlit.init()
+openlit.init(environment="llm-app")
 
 # Read environment variables
 INSTRUCT_MODEL_URL = os.getenv('INSTRUCT_MODEL_URL') # i.e. http://localhost:8000/v1
@@ -40,47 +43,53 @@
 @app.route("/askquestion", methods=['POST'])
 def ask_question():
 
-    data = request.json
-    question = data.get('question')
-
-    weaviate_client = weaviate.connect_to_custom(
-        # url is: http://weaviate.weaviate.svc.cluster.local:80
-        http_host=os.getenv('WEAVIATE_HTTP_HOST'),
-        http_port=os.getenv('WEAVIATE_HTTP_PORT'),
-        http_secure=False,
-        grpc_host=os.getenv('WEAVIATE_GRPC_HOST'),
-        grpc_port=os.getenv('WEAVIATE_GRPC_PORT'),
-        grpc_secure=False
-    )
-
-    # connect with the vector store that was populated earlier
-    vector_store = WeaviateVectorStore(
-        client=weaviate_client,
-        embedding=embeddings_model,
-        index_name="CustomDocs",
-        text_key="page_content"
-    )
-
-    chain = (
-        {
-            "context": vector_store.as_retriever(),
-            "question": RunnablePassthrough()
-        }
-        | prompt
-        | llm
-        | StrOutputParser()
-    )
-
-    # Get the schema which contains all collections
-    schema = weaviate_client.collections.list_all()
-
-    logger.info("Available collections in Weaviate:")
-    for collection_name, collection_config in schema.items():
-        print(f"- {collection_name}")
-
-    response = chain.invoke(question)
-    logger.info(response)
-
-    weaviate_client.close()
-
-    return response
+    logger.info("Responding to question")
+    try:
+        data = request.json
+        question = data.get('question')
+
+        weaviate_client = weaviate.connect_to_custom(
+            # url is: http://weaviate.weaviate.svc.cluster.local:80
+            http_host=os.getenv('WEAVIATE_HTTP_HOST'),
+            http_port=os.getenv('WEAVIATE_HTTP_PORT'),
+            http_secure=False,
+            grpc_host=os.getenv('WEAVIATE_GRPC_HOST'),
+            grpc_port=os.getenv('WEAVIATE_GRPC_PORT'),
+            grpc_secure=False
+        )
+
+        # connect with the vector store that was populated earlier
+        vector_store = WeaviateVectorStore(
+            client=weaviate_client,
+            embedding=embeddings_model,
+            index_name="CustomDocs",
+            text_key="page_content"
+        )
+
+        chain = (
+            {
+                "context": vector_store.as_retriever(),
+                "question": RunnablePassthrough()
+            }
+            | prompt
+            | llm
+            | StrOutputParser()
+        )
+
+        # Get the schema, which contains all collections
+        schema = weaviate_client.collections.list_all()
+
+        logger.info("Available collections in Weaviate:")
+        for collection_name, collection_config in schema.items():
+            logger.info(f"- {collection_name}")
+
+        response = chain.invoke(question)
+        logger.info(response)
+
+        weaviate_client.close()
+
+        return response
+
+    except Exception as e:
+        logger.error(f"Error responding to question: {e}")
+        return "An error occurred while responding to the question", 500  # Flask rejects a bare None response
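
For local experimentation, the route can be exercised without a cluster using Flask's built-in test client; a minimal sketch, assuming the `WEAVIATE_*` and model URL environment variables above point at reachable services:

```python
# Hypothetical smoke test for the /askquestion route; run from the llm-app directory.
from app import app

with app.test_client() as client:
    resp = client.post("/askquestion",
                       json={"question": "How much memory does the NVIDIA H200 have?"})
    print(resp.status_code, resp.get_data(as_text=True))
```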

workshop/cisco-ai-pods/llm-app/k8s-manifest.yaml

Lines changed: 2 additions & 0 deletions
@@ -39,6 +39,8 @@ spec:
         # filter out health check requests to the root URL
         - name: OTEL_PYTHON_EXCLUDED_URLS
           value: "^(https?://)?[^/]+(/)?$"
+        - name: OTEL_PYTHON_DISABLED_INSTRUMENTATIONS
+          value: "httpx,requests"
         - name: SPLUNK_PROFILER_ENABLED
           value: "true"
         - name: INSTRUCT_MODEL_URL
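
The new `OTEL_PYTHON_DISABLED_INSTRUMENTATIONS` entry complements the Dockerfile uninstalls above, disabling the same two instrumentations at the agent level. The `OTEL_PYTHON_EXCLUDED_URLS` pattern, meanwhile, keeps health-check requests to the root URL out of the trace data while leaving real endpoints instrumented; a quick sanity check of the regex in Python (the service host and port here are hypothetical):

```python
import re

# The pattern from OTEL_PYTHON_EXCLUDED_URLS: matches a bare host with an
# optional scheme and trailing slash, but nothing that has a path.
pattern = re.compile(r"^(https?://)?[^/]+(/)?$")

print(bool(pattern.match("http://llm-app:8080/")))             # True  -> excluded (health check)
print(bool(pattern.match("http://llm-app:8080/askquestion")))  # False -> still traced
```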
