
Commit 57fd5bf

Merge pull request #1776 from pareenaverma/content_review
Tech review of llama vision LP complete.
2 parents: f0c6657 + 1e93eb4

File tree: 7 files changed (+50 lines, -25 lines)


content/learning-paths/servers-and-cloud-computing/llama-vision/_index.md

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,10 @@
 ---
 title: Deploy a LLM based Vision Chatbot with PyTorch and Hugging Face Transformers on Google Axion processors

+draft: true
+cascade:
+    draft: true
+
 minutes_to_complete: 45

 who_is_this_for: This Learning Path is for software developers, ML engineers, and those who are interested to deploy production-ready vision chatbot for their application with optimized performance on Arm Architecture.
@@ -13,7 +17,7 @@ learning_objectives:
 - Monitor and analyze inference on Arm CPUs.

 prerequisites:
-- A Google Cloud Axion (or other Arm) compute instance with at least 32 cores.
+- A Google Cloud Axion compute instance or [any Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider with atleast 32 cores.
 - Basic understanding of Python and ML concepts.
 - Familiarity with REST APIs and web services.
 - Basic knowledge on Streamlit.

content/learning-paths/servers-and-cloud-computing/llama-vision/backend.md

Lines changed: 12 additions & 2 deletions
@@ -146,5 +146,15 @@ Use the following command in a terminal to start the backend server:
 LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python3 backend.py
 ```

-You should see output similar to the image below when the backend server starts successfully:
-![backend](backend_output.png)
+You should see output similar to:
+
+```output
+* Serving Flask app 'backend'
+* Debug mode: off
+WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+* Running on all addresses (0.0.0.0)
+* Running on http://127.0.0.1:5000
+* Running on http://10.0.0.10:5000
+Press CTRL+C to quit
+```
+The backend server has started successfully.
Binary file not shown.
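The diff above replaces a screenshot with the backend's console output, but backend.py itself is not part of this commit. For orientation only, here is a minimal sketch of what such a Flask backend might look like; the /generate endpoint, the request fields, and the model ID are assumptions made for illustration, not details taken from the Learning Path:

```python
# Hypothetical sketch of a Flask backend for the vision chatbot (not the
# Learning Path's actual backend.py). Endpoint name, request fields, and
# model ID are assumptions.
import torch
from flask import Flask, request, jsonify
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model

# Load the model and processor once at startup; bfloat16 keeps CPU memory usage down.
model = MllamaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(MODEL_ID)

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate():
    # Expect a multipart form with an image file and a text prompt.
    image = Image.open(request.files["image"].stream).convert("RGB")
    prompt = request.form.get("prompt", "Describe this image.")

    # Build a chat-style input containing one image and one text turn.
    messages = [{"role": "user",
                 "content": [{"type": "image"},
                             {"type": "text", "text": prompt}]}]
    text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(images=image, text=text, return_tensors="pt")

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=256)

    # Decode only the newly generated tokens, skipping the prompt.
    reply = processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                             skip_special_tokens=True)
    return jsonify({"response": reply})

if __name__ == "__main__":
    # Binding to 0.0.0.0 matches the multiple "Running on" addresses shown above.
    app.run(host="0.0.0.0", port=5000)
```

Note that LD_PRELOAD, TORCHINDUCTOR_CPP_WRAPPER, TORCHINDUCTOR_FREEZING, and OMP_NUM_THREADS in the launch command are environment variables read by tcmalloc, TorchInductor, and OpenMP at process startup, so they do not appear anywhere in the Python source.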

content/learning-paths/servers-and-cloud-computing/llama-vision/conclusion.md

Lines changed: 5 additions & 5 deletions
@@ -7,15 +7,15 @@ layout: learningpathall

 ## Access the Web Application

-Open the web application in your browser using the external URL:
+You can now open the web application in your browser using the external URL:

 ```bash
 http://[your instance ip]:8501
 ```

 {{% notice Note %}}

-To access the links you might need to allow inbound TCP traffic in your instance's security rules. Always review these permissions with caution as they might introduce security vulnerabilities.
+To access the application, you might need to allow inbound TCP traffic in your instance's security rules. Always review these permissions with caution as they might introduce security vulnerabilities.

 For an Axion instance, you can do this from the gcloud cli:

@@ -34,11 +34,11 @@ For this to work, you must ensure that the allow-my-ip tag is present on your Ax

 You can upload an image and enter the prompt in the UI to generate response.

-You should see LLM generating response based on the prompt considering image as the context as shown in the image below:
+You should see the LLM generating a response based on the prompt with the image as the context as shown below:
 ![browser_output](browser_output.png)

 ## Further Interaction and Custom Applications

-You can continue to query on different images with prompts and observe the response of Vision model on Arm Neoverse based CPUs.
+You can continue to experiment with different images and prompts and observe the response of Vision model on Arm Neoverse based CPUs.

-This setup demonstrates how you can create various applications and configure your vision based LLMs. This Learning Path serves as a guide and example to showcase the LLM inference of vision models on Arm CPUs, highlighting the optimized inference on CPUs.
+This setup demonstrates how you can create various applications and configure your vision based LLMs. This Learning Path serves as a guide and example to showcase the LLM inference of vision models on Arm CPUs, highlighting the optimized inference on CPUs.
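Beyond the browser UI, the "Further Interaction and Custom Applications" section invites you to build your own clients. As a hypothetical example (the /generate endpoint and the JSON response shape are carried over from the backend sketch earlier, not from this commit), a custom application could call the backend directly:

```python
# Hypothetical client that queries the backend directly instead of using the
# Streamlit UI. Endpoint path and response format are assumptions.
import requests

BACKEND_URL = "http://[your instance ip]:5000/generate"  # replace with your instance IP

with open("example.jpg", "rb") as f:
    resp = requests.post(
        BACKEND_URL,
        files={"image": ("example.jpg", f)},
        data={"prompt": "What is shown in this image?"},
    )

print(resp.json()["response"])
```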

content/learning-paths/servers-and-cloud-computing/llama-vision/frontend.md

Lines changed: 13 additions & 2 deletions
@@ -84,5 +84,16 @@ Use the following command in a new terminal to start the Streamlit frontend serv
 python3 -m streamlit run frontend.py
 ```

-You should see output similar to the image below when the frontend server starts successfully:
-![frontend](frontend_output.png)
+You should see output similar to what is shown below as the frontend server starts successfully:
+
+```output
+Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.
+
+
+You can now view your Streamlit app in your browser.
+
+Local URL: http://localhost:8501
+Network URL: http://10.0.0.10:8501
+External URL: http://35.223.133.103:8501
+```
+In the next section you will view your running application within your local browser.
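The commit swaps the frontend screenshot for the Streamlit console output, but frontend.py is likewise not included in the diff. A minimal sketch of what such a Streamlit frontend might contain is shown below; the backend URL and form fields are assumptions chosen to match the backend sketch earlier:

```python
# Hypothetical sketch of a Streamlit frontend (not the Learning Path's actual
# frontend.py). Backend URL and field names are assumptions.
import requests
import streamlit as st

BACKEND_URL = "http://localhost:5000/generate"  # assumed Flask backend endpoint

st.title("Llama 3.2 Vision Chatbot")

uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
prompt = st.text_input("Enter your prompt")

if st.button("Generate") and uploaded and prompt:
    st.image(uploaded, caption="Input image")
    with st.spinner("Generating response..."):
        # Forward the image bytes and the prompt to the Flask backend.
        files = {"image": (uploaded.name, uploaded.getvalue())}
        resp = requests.post(BACKEND_URL, files=files, data={"prompt": prompt})
    st.write(resp.json().get("response", "No response received."))
```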
Binary file not shown.

content/learning-paths/servers-and-cloud-computing/llama-vision/vision_chatbot.md

Lines changed: 15 additions & 15 deletions
@@ -10,12 +10,11 @@ layout: "learningpathall"

 ## Before you begin

-This Learning Path demonstrates how to build and deploy a vision chatbot using open-source Large Language Models (LLMs) optimized for Arm architecture. The vision chatbot is capable to take the input as images and text prompt, process both of them and generate the response as text by taking the image input as context. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You need an Arm server instance with at least 32 cores to run this example. The instructions have been tested on a GCP c4a-standard-64 instance.
+This Learning Path demonstrates how to build and deploy a vision chatbot using open-source Large Language Models (LLMs) optimized for Arm architecture. The vision chatbot can take both images and text prompts as input, process both and generate the response as text by taking the image input as context. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You will need an Arm server instance with at least 32 cores to run this example. The instructions have been tested on a GCP `c4a-standard-64` instance.

 ## Overview

-In this Learning Path, you will learn how to run a vision chatbot LLM inference using PyTorch and Hugging Face Transformers efficiently on Arm CPUs.
-The tutorial includes steps to set up the demo and perform LLM inference by feeding both text and image inputs, which are then processed to generate a text response.
+In this Learning Path, you will learn how to run a vision chatbot LLM inference using PyTorch and Hugging Face Transformers efficiently on Arm CPUs. You will learn how to perform LLM inference by feeding both text and image inputs, which are then processed to generate a text response.

 ## Install dependencies

@@ -26,13 +25,9 @@ sudo apt update
 sudo apt install python3-pip python3-venv -y
 ```

-## Create a requirements file
+## Create a file with your Python dependencies

-```bash
-vim requirements.txt
-```
-
-Add the following dependencies to your `requirements.txt` file:
+Using a file editor of your choice, add the following python dependencies to your `requirements.txt` file:

 ```python
 streamlit
@@ -46,19 +41,21 @@ huggingface_hub

 ## Install Python Dependencies

+You can now create a Python virtual environment and install the dependencies.
+
 Create a virtual environment:
 ```bash
-python3 -m venv llama-vision
+python3 -m venv llama-vision
 ```

 Activate the virtual environment:
 ```bash
-source llama-vision/bin/activate
+source llama-vision/bin/activate
 ```

 Install the required libraries using pip:
 ```bash
-pip install -r requirements.txt
+pip install -r requirements.txt
 ```

 ## Install PyTorch
@@ -72,6 +69,7 @@ pip install torch==2.7.0.dev20250307 --extra-index-url https://download.pytorch.
 {{% notice Note %}}

 If the specified PyTorch version fails to install, you can try installing any PyTorch nightly build from [PyTorch Nightly Builds](https://download.pytorch.org/whl/nightly/cpu/) released after version 2.7.0.dev20250307.
+{{% /notice %}}

 ## Install Torch AO

@@ -94,8 +92,10 @@ Install Torch AO:

 ## Hugging Face Cli Login

-Hugging Face authentication:
+To use the [Llama 3.2 11B Vision Model](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) from Hugging Face, you need to request access or accept the terms. You need to log in to Hugging Face using a token.
 ```bash
 huggingface-cli login
-Input_token # when prompted to enter
-```
+```
+Enter your Hugging Face token. You can generate a token from [Hugging Face Hub](https://huggingface.co/) by clicking your profile on the top right corner and selecting **Access Tokens**.
+
+You also need to visit the Hugging Face link printed in the login output and accept the terms by clicking the **Agree and access repository** button or filling out the request-for-access form, depending on the model.
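After the install and login steps in this file, a short sanity check can confirm the environment before you build the backend. The snippet below is illustrative and not part of the commit; it only verifies that the installed packages import cleanly and that the Hugging Face login succeeded:

```python
# Optional sanity check (illustrative, not part of the Learning Path content).
import torch
import torchao        # installed in the "Install Torch AO" step
import transformers
from huggingface_hub import whoami

print("PyTorch version:", torch.__version__)    # expect a 2.7.0.dev nightly build
print("Transformers version:", transformers.__version__)
print("PyTorch CPU threads:", torch.get_num_threads())

# whoami() raises an error if `huggingface-cli login` has not been completed.
print("Logged in to Hugging Face as:", whoami()["name"])
```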
