content/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama.md (12 additions, 21 deletions)
@@ -7,7 +7,7 @@ layout: learningpathall
---

## Before you begin
-The instructions in this Learning Path are for any Arm server running Ubuntu 22.04 LTS. You need an Arm server instance with at least 16 cores and 64GB of RAM to run this example. Configure disk storage up to at least 50 GB. The instructions have been tested on an AWS Graviton4 r8g.4xlarge instance.
+The instructions in this Learning Path are for any Arm server running Ubuntu 24.04 LTS. You need an Arm server instance with at least 16 cores and 64GB of RAM to run this example. Configure disk storage up to at least 50 GB. The instructions have been tested on an AWS Graviton4 r8g.4xlarge instance.

## Overview
Arm CPUs are widely used in traditional ML and AI use cases. In this Learning Path, you learn how to run generative AI inference-based use cases like a LLM chatbot using PyTorch on Arm-based CPUs. PyTorch is a popular deep learning framework for AI applications.
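The hardware and OS requirements in the hunk above are easy to verify before starting. A minimal sanity check, assuming a bash shell on the target Ubuntu 24.04 instance (this snippet is illustrative and not part of the patched Learning Path):

```sh
# Confirm the instance meets the stated requirements
nproc            # expect at least 16 cores
free -g          # expect at least 64 GB of RAM
df -h /          # expect at least 50 GB of disk
lsb_release -ds  # expect Ubuntu 24.04 LTS
```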
@@ -31,16 +31,17 @@ source torch_env/bin/activate
```

### Install PyTorch and optimized libraries
-Torchchat is a library developed by the PyTorch team that facilitates running large language models (LLMs) seamlessly on a variety of devices. TorchAO (Torch Architecture Optimization) is a PyTorch library designed for enhancing the performance of ML models through different quantization and sparsity methods.
+Torchchat is a library developed by the PyTorch team that facilitates running large language models (LLMs) seamlessly on a variety of devices. TorchAO (Torch Architecture Optimization) is a PyTorch library designed for enhancing the performance of ML models through different quantization and sparsity methods.

-Start by cloning the torchao and torchchat repositories and then applying the Arm specific patches:
+Start by installing pytorch and cloning the torchao and torchchat repositories:
sed -i 's/"groupsize": 0/"groupsize": 32/' config/data/aarch64_cpu_channelwise.json
pip install -r requirements.txt
```
-{{% notice Note %}} You will need Python version 3.10 to apply these patches. This is the default version of Python installed on an Ubuntu 22.04 Linux machine. {{% /notice %}}
-
-You will now override the installed PyTorch version with a specific version of PyTorch required to take advantage of Arm KleidiAI optimizations:
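The new wording replaces the patch-based setup with a plain install-and-clone flow. A minimal sketch of that flow, assuming the upstream pytorch/torchchat and pytorch/ao repositories and a standard CPU wheel of PyTorch (the exact pinned versions are not shown in this excerpt):

```sh
# Install PyTorch, then fetch the repositories used later in the Learning Path
pip3 install torch
git clone https://github.com/pytorch/ao.git
git clone https://github.com/pytorch/torchchat.git
cd torchchat
pip3 install -r requirements.txt
```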
[Generate an Access Token](https://huggingface.co/settings/tokens) to authenticate your identity with Hugging Face Hub. A token with read-only access is sufficient.
-Log in to the Hugging Face repository and enter your Access Token key from Hugging face.
+Log in to the Hugging Face repository and enter your Access Token key from Hugging face.

```sh
huggingface-cli login
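If an interactive prompt is inconvenient, the login above can also be driven by a token; the `HF_TOKEN` variable and the `--token` flag below follow the upstream huggingface_hub CLI and are an assumption rather than text from this patch:

```sh
# Non-interactive Hugging Face Hub login using a read-only access token
export HF_TOKEN=<your-read-only-access-token>
huggingface-cli login --token "$HF_TOKEN"
```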
@@ -86,8 +78,7 @@ In this step, you will download the [Meta Llama3.1 8B Instruct model](https://hu
*** This first iteration will include cold start effects for dynamic import, hardware caches. ***
```

-You have successfully run the Llama3.1 8B Instruct Model on your Arm-based server. In the next section, you will walk through the steps to run the same chatbot in your browser.
+You have successfully run the Llama3.1 8B Instruct Model on your Arm-based server. In the next section, you will walk through the steps to run the same chatbot in your browser.
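Once the download finishes, a run that produces output like the cold-start message above might look like the sketch below; the `llama3.1` model alias and the `--prompt` flag follow the upstream torchchat README and are assumptions, not part of this patch:

```sh
# Generate a single response with the downloaded Llama 3.1 8B Instruct model
python3 torchchat.py generate llama3.1 --prompt "Explain what makes Arm CPUs efficient for LLM inference."
```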