Commit f0eecf9: Merge pull request #2199 from madeline-underwood/Arcee_final
Arcee final edits
2 parents: 4751a7f + 3391580

4 files changed: +11 −9 lines

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md (1 addition, 1 deletion)

````diff
@@ -6,7 +6,7 @@ weight: 4
 layout: learningpathall
 ---
 
-In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the Arcee Foundation Model. This includes installing system packages and a Python environment.
+In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the AFM-4.5B model. This includes installing system packages and a Python environment.
 
 ## Update the package list
 
````
content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md (1 addition, 1 deletion)

````diff
@@ -7,7 +7,7 @@ layout: learningpathall
 ---
 ## Build the Llama.cpp inference engine
 
-In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms,including Arm-based processors like AWS Graviton4.
+In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms, including Arm-based processors like AWS Graviton4.
 
 Even though AFM-4.5B uses a custom model architecture, you can still use the standard Llama.cpp repository - Arcee AI has contributed the necessary modeling code upstream.
 
````

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md (3 additions, 2 deletions)

````diff
@@ -32,7 +32,7 @@ This command does the following:
 
 - Runs the activation script, which modifies your shell environment
 - Updates your shell prompt to show `env-llama-cpp`, indicating the environment is active
-- Updates `PATH` to use so the environment’s Python interpreter
+- Updates `PATH` to use the environment’s Python interpreter
 - Ensures all `pip` commands install packages into the isolated environment
 
 ## Upgrade pip to the latest version
@@ -72,7 +72,8 @@ After the installation completes, your virtual environment includes:
 - **NumPy**: for numerical computations and array operations
 - **Requests**: for HTTP operations and API calls
 - **Other dependencies**: additional packages required by llama.cpp's Python bindings and utilities
-Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries
+
+Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries.
 
 {{< notice Tip >}}
 Before running any Python commands, make sure your virtual environment is activated. {{< /notice >}}
````
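The hunk above documents what activating the virtual environment does. As a quick illustration, that workflow can be sketched as follows; the environment name `env-llama-cpp` matches the learning path, while placing it under `$HOME` is an assumption for illustration:

```shell
# Sketch of the venv workflow described in the hunk above.
# Assumption: the environment lives under $HOME; adjust the path as needed.
python3 -m venv "$HOME/env-llama-cpp"

# Activation rewrites PATH so `python` and `pip` resolve inside the
# environment, and prefixes the shell prompt with (env-llama-cpp).
source "$HOME/env-llama-cpp/bin/activate"

# `command -v python` now prints a path inside $HOME/env-llama-cpp/bin
command -v python
```

Because `pip` also resolves inside the environment after activation, every subsequent `pip install` stays isolated from the system Python.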

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md (6 additions, 5 deletions)

````diff
@@ -8,7 +8,8 @@ layout: learningpathall
 
 In this step, you’ll download the [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) model from Hugging Face, convert it to the GGUF format for compatibility with `llama.cpp`, and generate quantized versions to optimize memory usage and improve inference speed.
 
-**Note: if you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available.**
+{{% notice Note %}}
+If you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available. {{% /notice %}}
 
 Make sure to activate your virtual environment before running any commands. The instructions below walk you through downloading and preparing the model for efficient use on AWS Graviton4.
 
@@ -28,11 +29,11 @@ pip install huggingface_hub hf_xet
 This command installs:
 
 - `huggingface_hub`: Python client for downloading models and datasets
-- `hf_xet`: Git extension for fetching large model files stored on Hugging Face
+- `hf_xet`: Git extension for fetching large model files hosted on Hugging Face
 
 These tools include the `hf` command-line interface you'll use next.
 
-## Login to the Hugging Face Hub
+## Log in to the Hugging Face Hub
 
 ```bash
 hf auth login
@@ -86,7 +87,7 @@ This command creates a 4-bit quantized version of the model:
 - `llama-quantize` is the quantization tool from Llama.cpp.
 - `afm-4-5B-F16.gguf` is the input GGUF model file in 16-bit precision.
 - `Q4_0` applies zero-point 4-bit quantization.
-- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
+- This reduces the model size by approximately ~70% (from ~15GB to ~4.4GB).
 - The quantized model will use less memory and run faster, though with a small reduction in accuracy.
 - The output file will be `afm-4-5B-Q4_0.gguf`.
 
@@ -104,7 +105,7 @@ bin/llama-quantize models/afm-4-5b/afm-4-5B-F16.gguf models/afm-4-5b/afm-4-5B-Q8
 
 This command creates an 8-bit quantized version of the model:
 - `Q8_0` specifies 8-bit quantization with zero-point compression.
-- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
+- This reduces the model size by approximately ~45% (from ~15GB to ~8GB).
 - The 8-bit version provides a better balance between memory usage and accuracy than 4-bit quantization.
 - The output file is named `afm-4-5B-Q8_0.gguf`.
 - Commonly used in production scenarios where memory resources are available.
````
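The size-reduction bullets this commit swaps back into the right order can be sanity-checked from GGUF's per-block storage costs: Q4_0 packs 32 weights plus a scale into 18 bytes (about 4.5 effective bits per weight) and Q8_0 into 34 bytes (about 8.5 bits per weight), versus 16 bits per weight for F16. A rough check, treating those bits-per-weight figures as approximations:

```shell
# Approximate size reduction vs. F16 for each quantization format,
# using effective bits per weight (Q4_0 ~ 4.5, Q8_0 ~ 8.5, F16 = 16).
for fmt in "Q4_0 4.5" "Q8_0 8.5"; do
  set -- $fmt
  awk -v name="$1" -v bits="$2" 'BEGIN {
    printf "%s: ~%.0f%% smaller than F16\n", name, (1 - bits / 16) * 100
  }'
done
```

This prints roughly 72% for Q4_0 and 47% for Q8_0, consistent with the ~70% and ~45% figures the corrected bullets quote (on-disk GGUF files also carry embeddings and metadata, so the published sizes differ slightly).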
