Commit f0eecf9: Merge pull request #2199 from madeline-underwood/Arcee_final
Arcee final edits
2 parents: 4751a7f + 3391580

4 files changed: +11 −9 lines

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md (1 addition, 1 deletion)

````diff
@@ -6,7 +6,7 @@ weight: 4
 layout: learningpathall
 ---
 
-In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the Arcee Foundation Model. This includes installing system packages and a Python environment.
+In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the AFM-4.5B model. This includes installing system packages and a Python environment.
 
 ## Update the package list
 
````
content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md (1 addition, 1 deletion)

````diff
@@ -7,7 +7,7 @@ layout: learningpathall
 ---
 ## Build the Llama.cpp inference engine
 
-In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms,including Arm-based processors like AWS Graviton4.
+In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms, including Arm-based processors like AWS Graviton4.
 
 Even though AFM-4.5B uses a custom model architecture, you can still use the standard Llama.cpp repository - Arcee AI has contributed the necessary modeling code upstream.
 
````

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md (3 additions, 2 deletions)

````diff
@@ -32,7 +32,7 @@ This command does the following:
 
 - Runs the activation script, which modifies your shell environment
 - Updates your shell prompt to show `env-llama-cpp`, indicating the environment is active
-- Updates `PATH` to use so the environment’s Python interpreter
+- Updates `PATH` to use the environment’s Python interpreter
 - Ensures all `pip` commands install packages into the isolated environment
 
 ## Upgrade pip to the latest version
@@ -72,7 +72,8 @@ After the installation completes, your virtual environment includes:
 - **NumPy**: for numerical computations and array operations
 - **Requests**: for HTTP operations and API calls
 - **Other dependencies**: additional packages required by llama.cpp's Python bindings and utilities
-Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries
+
+Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries.
 
 {{< notice Tip >}}
 Before running any Python commands, make sure your virtual environment is activated. {{< /notice >}}
````
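The hunk above documents what activating the virtual environment does. As a quick illustration, that workflow can be sketched as follows; the environment name `env-llama-cpp` matches the learning path, while placing it under `$HOME` is an assumption for illustration:

```shell
# Sketch of the venv workflow described in the hunk above.
# Assumption: the environment lives under $HOME; adjust the path as needed.
python3 -m venv "$HOME/env-llama-cpp"

# Activation rewrites PATH so `python` and `pip` resolve inside the
# environment, and prefixes the shell prompt with (env-llama-cpp).
source "$HOME/env-llama-cpp/bin/activate"

# `command -v python` now prints a path inside $HOME/env-llama-cpp/bin
command -v python
```

Because `pip` also resolves inside the environment after activation, every subsequent `pip install` stays isolated from the system Python.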

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md (6 additions, 5 deletions)

````diff
@@ -8,7 +8,8 @@ layout: learningpathall
 
 In this step, you’ll download the [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) model from Hugging Face, convert it to the GGUF format for compatibility with `llama.cpp`, and generate quantized versions to optimize memory usage and improve inference speed.
 
-**Note: if you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available.**
+{{% notice Note %}}
+If you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available. {{% /notice %}}
 
 Make sure to activate your virtual environment before running any commands. The instructions below walk you through downloading and preparing the model for efficient use on AWS Graviton4.
 
@@ -28,11 +29,11 @@ pip install huggingface_hub hf_xet
 This command installs:
 
 - `huggingface_hub`: Python client for downloading models and datasets
-- `hf_xet`: Git extension for fetching large model files stored on Hugging Face
+- `hf_xet`: Git extension for fetching large model files hosted on Hugging Face
 
 These tools include the `hf` command-line interface you'll use next.
 
-## Login to the Hugging Face Hub
+## Log in to the Hugging Face Hub
 
 ```bash
 hf auth login
@@ -86,7 +87,7 @@ This command creates a 4-bit quantized version of the model:
 - `llama-quantize` is the quantization tool from Llama.cpp.
 - `afm-4-5B-F16.gguf` is the input GGUF model file in 16-bit precision.
 - `Q4_0` applies zero-point 4-bit quantization.
-- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
+- This reduces the model size by approximately ~70% (from ~15GB to ~4.4GB).
 - The quantized model will use less memory and run faster, though with a small reduction in accuracy.
 - The output file will be `afm-4-5B-Q4_0.gguf`.
 
@@ -104,7 +105,7 @@ bin/llama-quantize models/afm-4-5b/afm-4-5B-F16.gguf models/afm-4-5b/afm-4-5B-Q8
 
 This command creates an 8-bit quantized version of the model:
 - `Q8_0` specifies 8-bit quantization with zero-point compression.
-- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
+- This reduces the model size by approximately ~45% (from ~15GB to ~8GB).
 - The 8-bit version provides a better balance between memory usage and accuracy than 4-bit quantization.
 - The output file is named `afm-4-5B-Q8_0.gguf`.
 - Commonly used in production scenarios where memory resources are available.
````
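The size-reduction bullets this commit swaps back into the right order can be sanity-checked from GGUF's per-block storage costs: Q4_0 packs 32 weights plus a scale into 18 bytes (about 4.5 effective bits per weight) and Q8_0 into 34 bytes (about 8.5 bits per weight), versus 16 bits per weight for F16. A rough check, treating those bits-per-weight figures as approximations:

```shell
# Approximate size reduction vs. F16 for each quantization format,
# using effective bits per weight (Q4_0 ~ 4.5, Q8_0 ~ 8.5, F16 = 16).
for fmt in "Q4_0 4.5" "Q8_0 8.5"; do
  set -- $fmt
  awk -v name="$1" -v bits="$2" 'BEGIN {
    printf "%s: ~%.0f%% smaller than F16\n", name, (1 - bits / 16) * 100
  }'
done
```

This prints roughly 72% for Q4_0 and 47% for Q8_0, consistent with the ~70% and ~45% figures the corrected bullets quote (on-disk GGUF files also carry embeddings and metadata, so the published sizes differ slightly).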
