Commit 0bb626a

Merge pull request #1552 from JoeStech/rag-axion-lp-edits
For the Axion RAG LP, change graviton references to axion and show how to set up network rules
2 parents 56cead0 + 90ebd54 commit 0bb626a

3 files changed: 18 additions and 4 deletions

content/learning-paths/servers-and-cloud-computing/rag/_index.md

Lines changed: 3 additions & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: Deploy a RAG-based Chatbot with llama-cpp-python using KleidiAI on Arm Servers
+title: Deploy a RAG-based Chatbot with llama-cpp-python using KleidiAI on Google Axion processors
 
 minutes_to_complete: 45
 
@@ -13,6 +13,7 @@ learning_objectives:
 - Monitor and analyze inference performance metrics.
 
 prerequisites:
+- A Google Cloud Axion (or other Arm) compute instance with at least 16 cores, 8GB of RAM, and 32GB disk space.
 - Basic understanding of Python and ML concepts.
 - Familiarity with REST APIs and web services.
 - Basic knowledge of vector databases.
@@ -34,6 +35,7 @@ operatingsystems:
 tools_software_languages:
 - Python
 - Streamlit
+- Google Axion
 
 ### FIXED, DO NOT MODIFY
 # ================================================================================

content/learning-paths/servers-and-cloud-computing/rag/chatbot.md

Lines changed: 14 additions & 2 deletions
@@ -7,16 +7,28 @@ layout: learningpathall
 
 ## Access the Web Application
 
-Open the web application in your browser using either the local URL or the external URL:
+Open the web application in your browser using the external URL:
 
 ```bash
-http://localhost:8501 or http://75.101.253.177:8501
+http://[your instance ip]:8501
 ```
 
 {{% notice Note %}}
 
 To access the links you may need to allow inbound TCP traffic in your instance's security rules. Always review these permissions with caution as they may introduce security vulnerabilities.
 
+For an Axion instance, this can be done as follows from the gcloud cli:
+
+gcloud compute firewall-rules create allow-my-ip \
+--direction=INGRESS \
+--network=default \
+--action=ALLOW \
+--rules=tcp:8501 \
+--source-ranges=[your IP]/32 \
+--target-tags=allow-my-ip
+
+For this to work, you must ensure that the allow-my-ip tag is present on your Axion instance.
+
 {{% /notice %}}
 ## Upload a PDF File and Create a New Index
 
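
The note added in this hunk says the `allow-my-ip` tag must be present on the instance, but the diff does not show how to attach it. A minimal sketch, assuming the tag is added with `gcloud compute instances add-tags`: the instance name `my-axion-instance` and zone `us-central1-a` are hypothetical placeholders, and the helper `add_tag_cmd` is illustrative only, not part of the Learning Path. The script only prints the command so it can be reviewed before anything changes.

```bash
# Placeholders (assumptions): substitute your own instance name and zone.
INSTANCE_NAME="${INSTANCE_NAME:-my-axion-instance}"
ZONE="${ZONE:-us-central1-a}"

# Build the gcloud command that attaches the allow-my-ip network tag
# targeted by the firewall rule in the hunk above.
# $1 = instance name, $2 = zone
add_tag_cmd() {
  echo "gcloud compute instances add-tags $1 --zone=$2 --tags=allow-my-ip"
}

# Print the command for review; paste it into a terminal to run it.
add_tag_cmd "$INSTANCE_NAME" "$ZONE"
```

Printing rather than executing keeps the sketch safe to run anywhere; once the tag is attached, the firewall rule applies only to instances that carry it.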

content/learning-paths/servers-and-cloud-computing/rag/rag_llm.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ layout: "learningpathall"
 
 ## Before you begin
 
-This learning path demonstrates how to build and deploy a Retrieval Augmented Generation (RAG) enabled chatbot using open-source Large Language Models (LLMs) optimized for Arm architecture. The chatbot processes documents, stores them in a vector database, and generates contextually-relevant responses by combining the LLM's capabilities with retrieved information. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 22.04 LTS. You need an Arm server instance with at least 16 cores and 8GB of RAM to run this example. Configure disk storage up to at least 32GB. The instructions have been tested on an AWS Graviton4 r8g.16xlarge instance.
+This learning path demonstrates how to build and deploy a Retrieval Augmented Generation (RAG) enabled chatbot using open-source Large Language Models (LLMs) optimized for Arm architecture. The chatbot processes documents, stores them in a vector database, and generates contextually-relevant responses by combining the LLM's capabilities with retrieved information. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 22.04 LTS. You need an Arm server instance with at least 16 cores, 8GB of RAM, and a 32GB disk to run this example. The instructions have been tested on a GCP c4a-standard-64 instance.
 
 ## Overview
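
The revised paragraph states minimums of 16 cores, 8GB of RAM, and a 32GB disk. A rough sketch of checking an instance against them, assuming a Linux host with GNU coreutils (as on Ubuntu 22.04): the helpers `meets_min` and `check` are hypothetical names introduced here, not part of the Learning Path.

```bash
# Minimums taken from the prerequisites stated in the hunk above.
MIN_CORES=16
MIN_RAM_GB=8
MIN_DISK_GB=32

# Succeeds when the first integer is at least the second.
meets_min() {
  [ "$1" -ge "$2" ]
}

# Print one status line. $1 = label, $2 = measured value, $3 = required value
check() {
  if meets_min "$2" "$3"; then
    echo "OK   $1: $2 (need >= $3)"
  else
    echo "LOW  $1: $2 (need >= $3)"
  fi
}

cores=$(nproc)
# MemTotal is in kB; round up, because usable memory reads slightly below the nominal size.
ram_gb=$(awk '/MemTotal/ {printf "%d", ($2 + 1048575) / 1048576}' /proc/meminfo)
# Root filesystem size in GB; this is usually a little smaller than the provisioned disk.
disk_gb=$(df -BG --output=size / | tail -n 1 | tr -dc '0-9')

check "CPU cores" "$cores" "$MIN_CORES"
check "RAM (GB)" "$ram_gb" "$MIN_RAM_GB"
check "Root disk (GB)" "$disk_gb" "$MIN_DISK_GB"
```

Because the filesystem reports less than the raw disk size, a `LOW` result on the disk line for a disk provisioned at exactly 32GB is not necessarily a problem.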
