content/learning-paths/servers-and-cloud-computing/llama-vision/_index.md (5 additions, 1 deletion)
@@ -1,6 +1,10 @@
 ---
 title: Deploy a LLM based Vision Chatbot with PyTorch and Hugging Face Transformers on Google Axion processors

+draft: true
+cascade:
+  draft: true
+
 minutes_to_complete: 45

 who_is_this_for: This Learning Path is for software developers, ML engineers, and anyone interested in deploying a production-ready vision chatbot for their application with optimized performance on the Arm architecture.
@@ -13,7 +17,7 @@ learning_objectives:
 - Monitor and analyze inference on Arm CPUs.

 prerequisites:
-- A Google Cloud Axion (or other Arm) compute instance with at least 32 cores.
+- A Google Cloud Axion compute instance or [any Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider with at least 32 cores.
content/learning-paths/servers-and-cloud-computing/llama-vision/conclusion.md (5 additions, 5 deletions)
@@ -7,15 +7,15 @@ layout: learningpathall

 ## Access the Web Application

-Open the web application in your browser using the external URL:
+You can now open the web application in your browser using the external URL:

 ```bash
 http://[your instance ip]:8501
 ```

 {{% notice Note %}}

-To access the links you might need to allow inbound TCP traffic in your instance's security rules. Always review these permissions with caution as they might introduce security vulnerabilities.
+To access the application, you might need to allow inbound TCP traffic in your instance's security rules. Always review these permissions with caution as they might introduce security vulnerabilities.

 For an Axion instance, you can do this from the gcloud cli:

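The gcloud command referenced above sits in unchanged lines that this diff view collapses. As a rough sketch only, using the `allow-my-ip` tag mentioned in the next hunk and placeholder names for the rule, network, instance, and zone, a rule that opens port 8501 to your own IP could look like this:

```bash
# Sketch only: rule, network, instance, and zone names are placeholders, not the Learning Path's exact values.
MY_IP=$(curl -s https://checkip.amazonaws.com)

# Allow inbound TCP on Streamlit's default port (8501) from your IP only.
gcloud compute firewall-rules create allow-my-ip-8501 \
  --network=default \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:8501 \
  --source-ranges="${MY_IP}/32" \
  --target-tags=allow-my-ip

# Tag the instance so the rule applies to it, then confirm the port responds.
gcloud compute instances add-tags my-axion-instance --tags=allow-my-ip --zone=us-central1-a
curl -sI http://YOUR_INSTANCE_IP:8501 | head -n 1
```

Restricting `--source-ranges` to a single /32 keeps the Streamlit port from being exposed to the whole internet.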
@@ -34,11 +34,11 @@ For this to work, you must ensure that the allow-my-ip tag is present on your Ax

 You can upload an image and enter the prompt in the UI to generate a response.

-You should see LLM generating response based on the prompt considering image as the context as shown in the image below:
+You should see the LLM generating a response based on the prompt, with the image as the context, as shown below:
 

 ## Further Interaction and Custom Applications

-You can continue to query on different images with prompts and observe the response of Vision model on Arm Neoverse based CPUs.
+You can continue to experiment with different images and prompts and observe the response of the Vision model on Arm Neoverse-based CPUs.

-This setup demonstrates how you can create various applications and configure your vision based LLMs. This Learning Path serves as a guide and example to showcase the LLM inference of vision models on Arm CPUs, highlighting the optimized inference on CPUs.
+This setup demonstrates how you can create various applications and configure your vision-based LLMs. This Learning Path serves as a guide and example showcasing LLM inference of vision models on Arm CPUs, highlighting the optimized inference achievable on CPUs.
content/learning-paths/servers-and-cloud-computing/llama-vision/vision_chatbot.md (15 additions, 15 deletions)
@@ -10,12 +10,11 @@ layout: "learningpathall"

 ## Before you begin

-This Learning Path demonstrates how to build and deploy a vision chatbot using open-source Large Language Models (LLMs) optimized for Arm architecture. The vision chatbot is capable to take the input as images and text prompt, process both of them and generate the response as text by taking the image input as context. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You need an Arm server instance with at least 32 cores to run this example. The instructions have been tested on a GCP c4a-standard-64 instance.
+This Learning Path demonstrates how to build and deploy a vision chatbot using open-source Large Language Models (LLMs) optimized for the Arm architecture. The vision chatbot takes both images and text prompts as input, processes them together, and generates a text response that uses the image as context. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You will need an Arm server instance with at least 32 cores to run this example. The instructions have been tested on a GCP `c4a-standard-64` instance.

 ## Overview

-In this Learning Path, you will learn how to run a vision chatbot LLM inference using PyTorch and Hugging Face Transformers efficiently on Arm CPUs.
-The tutorial includes steps to set up the demo and perform LLM inference by feeding both text and image inputs, which are then processed to generate a text response.
+In this Learning Path, you will learn how to run vision chatbot LLM inference efficiently on Arm CPUs using PyTorch and Hugging Face Transformers. You will set up the demo and perform LLM inference by feeding in both text and image inputs, which are processed together to generate a text response.

 ## Install dependencies

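As a quick, optional check that your instance meets the 32-core Arm requirement described above (an addition for illustration, not part of the original instructions):

```bash
# Optional sanity check: confirm the machine is 64-bit Arm and has at least 32 cores.
uname -m    # expect: aarch64
nproc       # expect: 32 or more
```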
@@ -26,13 +25,9 @@ sudo apt update
 sudo apt install python3-pip python3-venv -y
 ```

-## Create a requirements file
+## Create a file with your Python dependencies

-```bash
-vim requirements.txt
-```
-
-Add the following dependencies to your `requirements.txt` file:
+Using a file editor of your choice, add the following Python dependencies to your `requirements.txt` file:

 ```python
 streamlit
@@ -46,19 +41,21 @@ huggingface_hub

 ## Install Python Dependencies

+You can now create a Python virtual environment and install the dependencies.
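The commands for this step sit in lines that this diff view collapses. A minimal sketch, assuming a virtual environment named `venv` and the `requirements.txt` file created earlier:

```bash
# Sketch only: the environment name is an assumption, not necessarily the Learning Path's exact choice.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```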
 If the specified PyTorch version fails to install, you can try installing any PyTorch nightly build from [PyTorch Nightly Builds](https://download.pytorch.org/whl/nightly/cpu/) released after version 2.7.0.dev20250307.
+{{% /notice %}}
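If you need the fallback described in this notice, one common way to pull a CPU-only nightly build (any build newer than 2.7.0.dev20250307 should do) is:

```bash
# Install the latest CPU-only PyTorch nightly build from the index linked above.
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
```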

 ## Install Torch AO

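The Torch AO install command itself is collapsed in this view. torchao is published on PyPI, so a plain install looks like the sketch below; the Learning Path may instead pin a specific or nightly build to match the PyTorch version used above:

```bash
# Assumption: the released torchao wheel from PyPI is sufficient; adjust if a specific build is required.
pip install torchao
```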
@@ -94,8 +92,10 @@ Install Torch AO:

 ## Hugging Face Cli Login

-Hugging Face authentication:
+To use the [Llama 3.2 11B Vision Model](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) from Hugging Face, you must first request access or accept the model's terms, and you need to log in to Hugging Face with an access token.
 ```bash
 huggingface-cli login
-Input_token # when prompted to enter
-```
+```
+Enter your Hugging Face token when prompted. You can generate a token from the [Hugging Face Hub](https://huggingface.co/) by clicking your profile in the top-right corner and selecting **Access Tokens**.
+
+You also need to visit the Hugging Face link printed in the login output and accept the terms by clicking the **Agree and access repository** button or filling out the request-for-access form, depending on the model.
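If an interactive prompt is inconvenient, for example in a provisioning script, recent versions of the Hugging Face CLI also accept the token as a flag; check `huggingface-cli login --help` for the options available in your version. A sketch, assuming your token is stored in the `HF_TOKEN` environment variable:

```bash
# Assumption: HF_TOKEN holds a token you generated under Access Tokens on huggingface.co.
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx   # placeholder value
huggingface-cli login --token "$HF_TOKEN"
```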