
Commit 57fd5bf

Merge pull request #1776 from pareenaverma/content_review
Tech review of llama vision LP complete.
2 parents: f0c6657 + 1e93eb4

File tree: 7 files changed (+50 lines, -25 lines)


content/learning-paths/servers-and-cloud-computing/llama-vision/_index.md

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,10 @@
 ---
 title: Deploy a LLM based Vision Chatbot with PyTorch and Hugging Face Transformers on Google Axion processors

+draft: true
+cascade:
+    draft: true
+
 minutes_to_complete: 45

 who_is_this_for: This Learning Path is for software developers, ML engineers, and those who are interested to deploy production-ready vision chatbot for their application with optimized performance on Arm Architecture.
@@ -13,7 +17,7 @@ learning_objectives:
 - Monitor and analyze inference on Arm CPUs.

 prerequisites:
-- A Google Cloud Axion (or other Arm) compute instance with at least 32 cores.
+- A Google Cloud Axion compute instance or [any Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider with atleast 32 cores.
 - Basic understanding of Python and ML concepts.
 - Familiarity with REST APIs and web services.
 - Basic knowledge on Streamlit.

content/learning-paths/servers-and-cloud-computing/llama-vision/backend.md

Lines changed: 12 additions & 2 deletions
@@ -146,5 +146,15 @@ Use the following command in a terminal to start the backend server:
 LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python3 backend.py
 ```

-You should see output similar to the image below when the backend server starts successfully:
-![backend](backend_output.png)
+You should see output similar to:
+
+```output
+* Serving Flask app 'backend'
+* Debug mode: off
+WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+* Running on all addresses (0.0.0.0)
+* Running on http://127.0.0.1:5000
+* Running on http://10.0.0.10:5000
+Press CTRL+C to quit
+```
+The backend server has started successfully.
Binary file not shown.
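The diff above replaces a screenshot with the backend's console output, but backend.py itself is not part of this commit. For orientation only, here is a minimal sketch of what such a Flask backend might look like; the /generate endpoint, the request fields, and the model ID are assumptions made for illustration, not details taken from the Learning Path:

```python
# Hypothetical sketch of a Flask backend for the vision chatbot (not the
# Learning Path's actual backend.py). Endpoint name, request fields, and
# model ID are assumptions.
import torch
from flask import Flask, request, jsonify
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model

# Load the model and processor once at startup; bfloat16 keeps CPU memory usage down.
model = MllamaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(MODEL_ID)

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate():
    # Expect a multipart form with an image file and a text prompt.
    image = Image.open(request.files["image"].stream).convert("RGB")
    prompt = request.form.get("prompt", "Describe this image.")

    # Build a chat-style input containing one image and one text turn.
    messages = [{"role": "user",
                 "content": [{"type": "image"},
                             {"type": "text", "text": prompt}]}]
    text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(images=image, text=text, return_tensors="pt")

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=256)

    # Decode only the newly generated tokens, skipping the prompt.
    reply = processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                             skip_special_tokens=True)
    return jsonify({"response": reply})

if __name__ == "__main__":
    # Binding to 0.0.0.0 matches the multiple "Running on" addresses shown above.
    app.run(host="0.0.0.0", port=5000)
```

Note that LD_PRELOAD, TORCHINDUCTOR_CPP_WRAPPER, TORCHINDUCTOR_FREEZING, and OMP_NUM_THREADS in the launch command are environment variables read by tcmalloc, TorchInductor, and OpenMP at process startup, so they do not appear anywhere in the Python source.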

content/learning-paths/servers-and-cloud-computing/llama-vision/conclusion.md

Lines changed: 5 additions & 5 deletions
@@ -7,15 +7,15 @@ layout: learningpathall

 ## Access the Web Application

-Open the web application in your browser using the external URL:
+You can now open the web application in your browser using the external URL:

 ```bash
 http://[your instance ip]:8501
 ```

 {{% notice Note %}}

-To access the links you might need to allow inbound TCP traffic in your instance's security rules. Always review these permissions with caution as they might introduce security vulnerabilities.
+To access the application, you might need to allow inbound TCP traffic in your instance's security rules. Always review these permissions with caution as they might introduce security vulnerabilities.

 For an Axion instance, you can do this from the gcloud cli:

@@ -34,11 +34,11 @@ For this to work, you must ensure that the allow-my-ip tag is present on your Ax

 You can upload an image and enter the prompt in the UI to generate response.

-You should see LLM generating response based on the prompt considering image as the context as shown in the image below:
+You should see the LLM generating a response based on the prompt with the image as the context as shown below:
 ![browser_output](browser_output.png)

 ## Further Interaction and Custom Applications

-You can continue to query on different images with prompts and observe the response of Vision model on Arm Neoverse based CPUs.
+You can continue to experiment with different images and prompts and observe the response of Vision model on Arm Neoverse based CPUs.

-This setup demonstrates how you can create various applications and configure your vision based LLMs. This Learning Path serves as a guide and example to showcase the LLM inference of vision models on Arm CPUs, highlighting the optimized inference on CPUs.
+This setup demonstrates how you can create various applications and configure your vision based LLMs. This Learning Path serves as a guide and example to showcase the LLM inference of vision models on Arm CPUs, highlighting the optimized inference on CPUs.
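Beyond the browser UI, the "Further Interaction and Custom Applications" section invites you to build your own clients. As a hypothetical example (the /generate endpoint and the JSON response shape are carried over from the backend sketch earlier, not from this commit), a custom application could call the backend directly:

```python
# Hypothetical client that queries the backend directly instead of using the
# Streamlit UI. Endpoint path and response format are assumptions.
import requests

BACKEND_URL = "http://[your instance ip]:5000/generate"  # replace with your instance IP

with open("example.jpg", "rb") as f:
    resp = requests.post(
        BACKEND_URL,
        files={"image": ("example.jpg", f)},
        data={"prompt": "What is shown in this image?"},
    )

print(resp.json()["response"])
```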

content/learning-paths/servers-and-cloud-computing/llama-vision/frontend.md

Lines changed: 13 additions & 2 deletions
@@ -84,5 +84,16 @@ Use the following command in a new terminal to start the Streamlit frontend serv
 python3 -m streamlit run frontend.py
 ```

-You should see output similar to the image below when the frontend server starts successfully:
-![frontend](frontend_output.png)
+You should see output similar to what is shown below as the frontend server starts successfully:
+
+```output
+Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.
+
+
+You can now view your Streamlit app in your browser.
+
+Local URL: http://localhost:8501
+Network URL: http://10.0.0.10:8501
+External URL: http://35.223.133.103:8501
+```
+In the next section you will view your running application within your local browser.
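The commit swaps the frontend screenshot for the Streamlit console output, but frontend.py is likewise not included in the diff. A minimal sketch of what such a Streamlit frontend might contain is shown below; the backend URL and form fields are assumptions chosen to match the backend sketch earlier:

```python
# Hypothetical sketch of a Streamlit frontend (not the Learning Path's actual
# frontend.py). Backend URL and field names are assumptions.
import requests
import streamlit as st

BACKEND_URL = "http://localhost:5000/generate"  # assumed Flask backend endpoint

st.title("Llama 3.2 Vision Chatbot")

uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
prompt = st.text_input("Enter your prompt")

if st.button("Generate") and uploaded and prompt:
    st.image(uploaded, caption="Input image")
    with st.spinner("Generating response..."):
        # Forward the image bytes and the prompt to the Flask backend.
        files = {"image": (uploaded.name, uploaded.getvalue())}
        resp = requests.post(BACKEND_URL, files=files, data={"prompt": prompt})
    st.write(resp.json().get("response", "No response received."))
```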
Binary file not shown.

content/learning-paths/servers-and-cloud-computing/llama-vision/vision_chatbot.md

Lines changed: 15 additions & 15 deletions
@@ -10,12 +10,11 @@ layout: "learningpathall"

 ## Before you begin

-This Learning Path demonstrates how to build and deploy a vision chatbot using open-source Large Language Models (LLMs) optimized for Arm architecture. The vision chatbot is capable to take the input as images and text prompt, process both of them and generate the response as text by taking the image input as context. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You need an Arm server instance with at least 32 cores to run this example. The instructions have been tested on a GCP c4a-standard-64 instance.
+This Learning Path demonstrates how to build and deploy a vision chatbot using open-source Large Language Models (LLMs) optimized for Arm architecture. The vision chatbot can take both images and text prompts as input, process both and generate the response as text by taking the image input as context. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You will need an Arm server instance with at least 32 cores to run this example. The instructions have been tested on a GCP `c4a-standard-64` instance.

 ## Overview

-In this Learning Path, you will learn how to run a vision chatbot LLM inference using PyTorch and Hugging Face Transformers efficiently on Arm CPUs.
-The tutorial includes steps to set up the demo and perform LLM inference by feeding both text and image inputs, which are then processed to generate a text response.
+In this Learning Path, you will learn how to run a vision chatbot LLM inference using PyTorch and Hugging Face Transformers efficiently on Arm CPUs. You will learn how to perform LLM inference by feeding both text and image inputs, which are then processed to generate a text response.

 ## Install dependencies

@@ -26,13 +25,9 @@ sudo apt update
 sudo apt install python3-pip python3-venv -y
 ```

-## Create a requirements file
+## Create a file with your Python dependencies

-```bash
-vim requirements.txt
-```
-
-Add the following dependencies to your `requirements.txt` file:
+Using a file editor of your choice, add the following python dependencies to your `requirements.txt` file:

 ```python
 streamlit
@@ -46,19 +41,21 @@ huggingface_hub

 ## Install Python Dependencies

+You can now create a Python virtual environment and install the dependencies.
+
 Create a virtual environment:
 ```bash
-python3 -m venv llama-vision
+python3 -m venv llama-vision
 ```

 Activate the virtual environment:
 ```bash
-source llama-vision/bin/activate
+source llama-vision/bin/activate
 ```

 Install the required libraries using pip:
 ```bash
-pip install -r requirements.txt
+pip install -r requirements.txt
 ```

 ## Install PyTorch
@@ -72,6 +69,7 @@ pip install torch==2.7.0.dev20250307 --extra-index-url https://download.pytorch.
 {{% notice Note %}}

 If the specified PyTorch version fails to install, you can try installing any PyTorch nightly build from [PyTorch Nightly Builds](https://download.pytorch.org/whl/nightly/cpu/) released after version 2.7.0.dev20250307.
+{{% /notice %}}

 ## Install Torch AO

@@ -94,8 +92,10 @@ Install Torch AO:

 ## Hugging Face Cli Login

-Hugging Face authentication:
+To use the [Llama 3.2 11B Vision Model](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) from Hugging Face, you need to request access or accept the terms. You need to log in to Hugging Face using a token.
 ```bash
 huggingface-cli login
-Input_token # when prompted to enter
-```
+```
+Enter your Hugging Face token. You can generate a token from [Hugging Face Hub](https://huggingface.co/) by clicking your profile on the top right corner and selecting **Access Tokens**.
+
+You also need to visit the Hugging Face link printed in the login output and accept the terms by clicking the **Agree and access repository** button or filling out the request-for-access form, depending on the model.
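After the install and login steps in this file, a short sanity check can confirm the environment before you build the backend. The snippet below is illustrative and not part of the commit; it only verifies that the installed packages import cleanly and that the Hugging Face login succeeded:

```python
# Optional sanity check (illustrative, not part of the Learning Path content).
import torch
import torchao        # installed in the "Install Torch AO" step
import transformers
from huggingface_hub import whoami

print("PyTorch version:", torch.__version__)    # expect a 2.7.0.dev nightly build
print("Transformers version:", transformers.__version__)
print("PyTorch CPU threads:", torch.get_num_threads())

# whoami() raises an error if `huggingface-cli login` has not been completed.
print("Logged in to Hugging Face as:", whoami()["name"])
```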
