Commit 294b617
Merge pull request ArmDeveloperEcosystem#2080 from juliensimon/arcee-foundation-model-on-aws
New learning path: Deploy Arcee AFM-4.5B on AWS Graviton4
2 parents 1e59791 + ec8cc77 commit 294b617

File tree

10 files changed

+879
-0
lines changed

Lines changed: 170 additions & 0 deletions
---
title: Launching a Graviton4 instance
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## System Requirements

- An AWS account
- Quota for c8g instances in your preferred region
- A Linux or macOS host
- A c8g instance (4xlarge or larger)
- At least 128 GB of storage

## AWS Console Steps

Follow these steps to launch your EC2 instance using the AWS Management Console:

### Step 1: Create an SSH Key Pair

1. **Navigate to EC2 Console**

   - Go to the [AWS Management Console](https://console.aws.amazon.com)
   - Search for "EC2" and click on the "EC2" service

2. **Create Key Pair**

   - In the left navigation pane, click "Key Pairs" under "Network & Security"
   - Click "Create key pair"
   - Enter name: `arcee-graviton4-key`
   - Select "RSA" as the key pair type
   - Select ".pem" as the private key file format
   - Click "Create key pair"
   - The private key file will automatically download to your computer

3. **Secure the Key File**

   - Move the downloaded `.pem` file to the SSH configuration directory:

   ```bash
   mkdir -p ~/.ssh
   mv arcee-graviton4-key.pem ~/.ssh
   ```

   - Set proper permissions (on macOS/Linux):

   ```bash
   chmod 400 ~/.ssh/arcee-graviton4-key.pem
   ```
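If you'd like to double-check the permissions before connecting, here is an optional helper. This is a sketch of ours, not part of the learning path: the `check_key_perms` function name is hypothetical.

```bash
# Hypothetical helper: refuse keys that are group/world-readable.
# SSH itself rejects keys with loose permissions, so this just fails
# earlier, with a clearer message.
check_key_perms() {
  key="$1"
  if [ ! -f "$key" ]; then
    echo "missing: $key"
    return 1
  fi
  # GNU stat (Linux) first, BSD stat (macOS) as a fallback
  mode=$(stat -c %a "$key" 2>/dev/null || stat -f %Lp "$key")
  if [ "$mode" = "400" ] || [ "$mode" = "600" ]; then
    echo "ok: $key has mode $mode"
  else
    echo "insecure: $key has mode $mode (run: chmod 400 $key)"
    return 1
  fi
}
```

For example, `check_key_perms ~/.ssh/arcee-graviton4-key.pem` should print an `ok:` line after the `chmod` above.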
### Step 2: Launch EC2 Instance

1. **Start Instance Launch**

   - In the left navigation pane, click "Instances" under "Instances"
   - Click the "Launch instances" button

2. **Configure Instance Details**

   - **Name and tags**: Enter `Arcee-Graviton4-Instance` as the instance name
   - **Application and OS Images**:
     - Click the "Quick Start" tab
     - Select "Ubuntu"
     - Choose "Ubuntu Server 24.04 LTS (HVM), SSD Volume Type"
     - **Important**: Ensure the architecture shows "64-bit (ARM)" for Graviton compatibility
   - **Instance type**:
     - Click on "Select instance type"
     - Select `c8g.4xlarge` or larger

3. **Configure Key Pair**

   In "Key pair name", select the SSH key pair you created earlier (`arcee-graviton4-key`)

4. **Configure Network Settings**

   - **Network**: Select a VPC with at least one public subnet
   - **Subnet**: Select a public subnet in the VPC
   - **Auto-assign Public IP**: Enable
   - **Firewall (security groups)**
     - Click on "Create security group"
     - Click on "Allow SSH traffic from"
     - In the dropdown list, select "My IP"

   Note 1: you will only be able to connect to the instance from your current host, which is the safest setting. We don't recommend selecting "Anywhere", which would allow anyone on the Internet to attempt to connect. Use at your own risk.

   Note 2: although this demonstration only requires SSH access, feel free to use one of your existing security groups as long as it allows SSH traffic.

5. **Configure Storage**

   - **Root volume**:
     - Size: `128` GB
     - Volume type: `gp3`

6. **Review and Launch**

   - Review all settings in the "Summary" section
   - Click "Launch instance"
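If you prefer scripting, the console steps above can be approximated with the AWS CLI. This is a hedged sketch, not a definitive recipe: `AMI_ID`, `SUBNET_ID`, and `SG_ID` are placeholders you must replace with real IDs from your own account and region.

```bash
# Sketch: launch the same instance with the AWS CLI (v2 assumed).
# All three IDs below are placeholders, not real resources.
AMI_ID="ami-xxxxxxxxxxxxxxxxx"   # an arm64 Ubuntu 24.04 LTS AMI for your region
SUBNET_ID="subnet-xxxxxxxx"      # a public subnet in your VPC
SG_ID="sg-xxxxxxxx"              # a security group allowing SSH from your IP

if command -v aws >/dev/null 2>&1; then
  aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type c8g.4xlarge \
    --key-name arcee-graviton4-key \
    --subnet-id "$SUBNET_ID" \
    --security-group-ids "$SG_ID" \
    --associate-public-ip-address \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=128,VolumeType=gp3}' \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=Arcee-Graviton4-Instance}]' \
    || echo "run-instances failed: check credentials and placeholder values"
else
  echo "AWS CLI not installed; use the console steps above"
fi
```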
### Step 3: Monitor Instance Launch

1. **View Launch Status**

   After a few seconds, you should see a message similar to this one:

   `Successfully initiated launch of instance (i-<unique instance ID>)`

   If the instance launch fails, review your settings and try again.

2. **Get Connection Information**

   - Click on the instance ID, or look for the instance in the Instances list in the EC2 console.
   - In the "Details" tab of the instance, note the "Public DNS" host name
   - This is the host name you'll use to connect via SSH, referred to below as `PUBLIC_DNS_HOSTNAME`
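The same information is available from the AWS CLI if you have it configured. A sketch (the `Name` tag value matches the instance name used above):

```bash
# Sketch: look up the Public DNS name of the running instance by its Name tag.
if command -v aws >/dev/null 2>&1; then
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=Arcee-Graviton4-Instance" \
              "Name=instance-state-name,Values=running" \
    --query "Reservations[].Instances[].PublicDnsName" \
    --output text \
    || echo "describe-instances failed: check credentials and region"
else
  echo "AWS CLI not installed; read the Public DNS from the console instead"
fi
```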
### Step 4: Connect to Your Instance

1. **Open a Terminal**

2. **Connect via SSH**

   ```bash
   ssh -i ~/.ssh/arcee-graviton4-key.pem ubuntu@<PUBLIC_DNS_HOSTNAME>
   ```

3. **Accept the Security Warning**

   - When prompted about the authenticity of the host, type `yes`
   - You should now be connected to your Ubuntu instance
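To avoid retyping the key path on every connection, you can optionally add an entry to `~/.ssh/config`. The `arcee-graviton4` host alias below is our own choice, and `<PUBLIC_DNS_HOSTNAME>` is the value you noted in Step 3:

```
Host arcee-graviton4
    HostName <PUBLIC_DNS_HOSTNAME>
    User ubuntu
    IdentityFile ~/.ssh/arcee-graviton4-key.pem
```

With this in place, `ssh arcee-graviton4` is equivalent to the full command above.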
### Important Notes

- **Region Selection**: Ensure you're in your preferred AWS region before launching
- **AMI Selection**: The Ubuntu 24.04 LTS AMI must be ARM64-compatible for Graviton processors
- **Security**: Please think twice about allowing SSH from anywhere (0.0.0.0/0). We strongly recommend restricting access to your IP address
- **Storage**: The 128 GB EBS volume is sufficient for the Arcee model and dependencies
- **Backup**: Consider creating AMIs or snapshots for backup purposes
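As a starting point for backups, an AMI can be created from the running instance with the AWS CLI. This is a sketch, not a definitive recipe: `INSTANCE_ID` is a placeholder for the `i-...` ID from the launch step, and `--no-reboot` trades filesystem consistency for zero downtime.

```bash
# Sketch: create an AMI of the instance as a backup (AWS CLI v2 assumed).
INSTANCE_ID="i-xxxxxxxxxxxxxxxxx"   # placeholder: your instance ID

if command -v aws >/dev/null 2>&1; then
  aws ec2 create-image \
    --instance-id "$INSTANCE_ID" \
    --name "arcee-graviton4-backup-$(date +%Y%m%d)" \
    --no-reboot \
    || echo "create-image failed: check credentials and instance ID"
else
  echo "AWS CLI not installed; create an AMI from the EC2 console instead"
fi
```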
Lines changed: 51 additions & 0 deletions
---
title: Setting up the instance
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll set up the Graviton4 instance with all the necessary tools and dependencies required to build and run the Arcee Foundation Model. This includes installing the build tools and the Python environment.

## Step 1: Update Package List

```bash
sudo apt-get update
```

This command updates the local package index from the repositories:

- Downloads the latest package lists from all configured APT repositories
- Ensures you have the most recent information about available packages and their versions
- This is a best practice before installing new packages, to avoid potential conflicts
- The package index contains metadata about available packages, their dependencies, and version information

## Step 2: Install System Dependencies

```bash
sudo apt-get install cmake gcc g++ git python3 python3-pip python3-virtualenv libcurl4-openssl-dev unzip -y
```

This command installs all the essential development tools and dependencies:

- **cmake**: Cross-platform build system generator that we'll use to compile Llama.cpp
- **gcc & g++**: GNU C and C++ compilers for building native code
- **git**: Version control system for cloning repositories
- **python3**: Python interpreter for running Python-based tools and scripts
- **python3-pip**: Python package installer for managing Python dependencies
- **python3-virtualenv**: Tool for creating isolated Python environments
- **libcurl4-openssl-dev**: Development files for libcurl, the client-side URL transfer library
- **unzip**: Utility for extracting ZIP archives

The `-y` flag automatically answers "yes" to prompts, making the installation non-interactive.
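To confirm everything installed correctly, you can run a quick check. This helper is a sketch of ours (the `check_tools` name is hypothetical, not part of the learning path):

```bash
# Hypothetical helper: report where each required tool resolved, or MISSING.
check_tools() {
  status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $(command -v "$tool")"
    else
      echo "$tool: MISSING"
      status=1
    fi
  done
  return "$status"
}

check_tools cmake gcc g++ git python3 pip3 virtualenv \
  || echo "Some tools are missing; re-run the apt-get command above"
```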

## What's Ready Now

After completing these steps, your Graviton4 instance will have:

- A complete C/C++ development environment for building Llama.cpp
- Python 3 with pip for managing Python packages
- Git for cloning repositories
- All necessary build tools for compiling optimized ARM64 binaries

The system is now prepared for the next steps: building Llama.cpp and downloading the Arcee Foundation Model.
Lines changed: 82 additions & 0 deletions
---
title: Building Llama.cpp
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model that's optimized for inference on various hardware platforms, including Arm-based processors like Graviton4.

Even though AFM-4.5B has a custom model architecture, we're able to use the vanilla version of llama.cpp, as the Arcee AI team has contributed the appropriate modeling code.

Here are all the steps.

## Step 1: Clone the Repository

```bash
git clone https://github.com/ggerganov/llama.cpp
```

This command clones the Llama.cpp repository from GitHub to your local machine. The repository contains the source code, build scripts, and documentation needed to compile the inference engine.

## Step 2: Navigate to the Project Directory

```bash
cd llama.cpp
```

Change into the llama.cpp directory, where we'll perform the build process. This directory contains the CMakeLists.txt file and the source code structure.

## Step 3: Configure the Build with CMake

```bash
cmake -B .
```

This command uses CMake to configure the build system:

- `-B .` specifies that the build files should be generated in the current directory
- CMake will detect your system's compiler, libraries, and hardware capabilities
- It will generate the appropriate build files (Makefiles on Linux) based on your system configuration

Note: the cmake output should include the information below, indicating that the build process will leverage the Neoverse V2 architecture's specialized instruction sets designed for AI/ML workloads. These optimizations are crucial for achieving optimal performance on Graviton4:

```bash
-- ARM feature DOTPROD enabled
-- ARM feature SVE enabled
-- ARM feature MATMUL_INT8 enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+dotprod+i8mm+sve
```

- **DOTPROD: Dot Product** - Hardware-accelerated dot product operations for neural network computations
- **SVE: Scalable Vector Extension** - Advanced vector processing capabilities that can handle variable-length vectors up to 2048 bits, providing significant performance improvements for matrix operations
- **MATMUL_INT8: Matrix multiplication units** - Dedicated hardware for efficient matrix operations common in transformer models, accelerating the core computations of large language models
- **FMA: Fused Multiply-Add** - Optimized floating-point operations that combine multiplication and addition in a single instruction
- **FP16 Vector Arithmetic** - Hardware support for 16-bit floating-point vector operations, reducing memory usage while maintaining good numerical precision
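You can cross-check these features against what the kernel reports. On the Graviton4 instance, the `Features` line of `/proc/cpuinfo` should include flags such as `asimddp` (dot product), `sve`, `sve2`, and `i8mm`; this optional check is a sketch and is harmless on any Linux host:

```bash
# Print the CPU feature flags the kernel reports
# (Arm kernels use a "Features" line, x86 kernels use "flags").
if [ -r /proc/cpuinfo ]; then
  grep -m1 -iE '^(Features|flags)' /proc/cpuinfo \
    || echo "no feature line found in /proc/cpuinfo"
else
  echo "/proc/cpuinfo not available on this host"
fi
```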

## Step 4: Compile the Project

```bash
cmake --build . --config Release -j16
```

This command compiles the Llama.cpp project:

- `--build .` tells CMake to build the project using the files in the current directory
- `--config Release` specifies a Release build configuration, which enables optimizations and removes debug symbols
- `-j16` runs the build with 16 parallel jobs, which speeds up compilation on multi-core systems like Graviton4

The build process will compile the C++ source code into executable binaries optimized for your ARM64 architecture. This should only take a minute.

## What Gets Built

After successful compilation, you'll have several key command-line executables in the `bin` directory:

- `llama-cli` - The main inference executable for running LLaMA models
- `llama-server` - A web server for serving model inference over HTTP
- `llama-quantize` - A tool for model quantization to reduce memory usage
- Various utility programs for model conversion and optimization
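As a quick smoke test (assuming you are still in the llama.cpp directory), you can ask the freshly built binary for its version, which should print build and compiler information:

```bash
# Verify the build produced a runnable binary before moving on.
if [ -x bin/llama-cli ]; then
  bin/llama-cli --version
else
  echo "bin/llama-cli not found; re-run the build step above"
fi
```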

You can find more information in the llama.cpp [GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools).

These binaries are specifically optimized for the ARM64 architecture and will provide excellent performance on your Graviton4 instance.
Lines changed: 68 additions & 0 deletions
---
title: Installing Python dependencies for llama.cpp
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll set up a Python virtual environment and install the required dependencies for working with Llama.cpp. This ensures we have a clean, isolated Python environment with all the necessary packages for model optimization.

Here are all the steps.

## Step 1: Create a Python Virtual Environment

```bash
virtualenv env-llama-cpp
```

This command creates a new Python virtual environment named `env-llama-cpp`:

- Virtual environments provide isolated Python environments that prevent conflicts between different projects
- The `env-llama-cpp` directory will contain its own Python interpreter and package installation space
- This isolation ensures that the Llama.cpp dependencies won't interfere with other Python projects on your system
- Virtual environments are essential for reproducible development environments

## Step 2: Activate the Virtual Environment

```bash
source env-llama-cpp/bin/activate
```

This command activates the virtual environment:

- The `source` command executes the activation script, which modifies your current shell environment
- Depending on your shell, your command prompt may change to show `(env-llama-cpp)` at the beginning, indicating the active environment. We will reflect this in the following commands.
- All subsequent `pip` commands will install packages into this isolated environment
- The `PATH` environment variable is updated to prioritize the virtual environment's Python interpreter

## Step 3: Upgrade pip to the Latest Version

```bash
(env-llama-cpp) pip install --upgrade pip
```

This command ensures you have the latest version of pip:

- Upgrading pip helps avoid compatibility issues with newer packages
- The `--upgrade` flag tells pip to install the newest available version
- This is a best practice before installing project dependencies
- Newer pip versions often include security fixes and improved package resolution

## Step 4: Install Project Dependencies

```bash
(env-llama-cpp) pip install -r requirements.txt
```

This command installs all the Python packages specified in the requirements.txt file:

- The `-r` flag tells pip to read the package list from the specified file
- `requirements.txt` contains a list of Python packages and their version specifications
- This ensures everyone working on the project uses the same package versions
- The installation will include packages needed for model loading, inference, and any Python bindings for Llama.cpp
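To confirm the environment is usable, you can try importing the key packages. A small sketch (run it with the virtual environment active; `numpy` and `requests` are among the packages pulled in by requirements.txt):

```bash
python3 - <<'EOF'
# Report the version of each expected package, or flag it as missing.
import importlib

for mod in ("numpy", "requests"):
    try:
        m = importlib.import_module(mod)
        print(mod, getattr(m, "__version__", "unknown"))
    except ImportError:
        print(mod, "NOT INSTALLED")
EOF
```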

## What Gets Installed

After successful installation, your virtual environment will contain:

- **NumPy**: For numerical computations and array operations
- **Requests**: For HTTP operations and API calls
- **Other dependencies**: Specific packages needed for Llama.cpp Python integration

The virtual environment is now ready for running Python scripts that interact with the compiled Llama.cpp binaries. Remember to always activate the virtual environment (`source env-llama-cpp/bin/activate`) before running any Python code related to this project.
