
Commit ec8cc77

committed
- Clean up
- Remove GCP learning for now
1 parent 3e341ba commit ec8cc77

File tree

10 files changed (+31, -398 lines)


content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/01_launching_a_graviton4_instance.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -10,9 +10,11 @@ layout: learningpathall
 
 - An AWS account
 
+- Quota for c8g instances in your preferred region
+
 - A Linux or MacOS host
 
-- A c8g or r8g instance (4xlarge or larger)
+- A c8g instance (4xlarge or larger)
 
 - At least 128GB of storage
 
```
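The updated prerequisites above can also be met from the AWS CLI. A minimal sketch of launching a c8g.4xlarge with 128 GB of gp3 storage (the AMI ID and key name are hypothetical placeholders, not from this commit; the command is echoed so the sketch is safe to run as-is):

```shell
#!/bin/bash
# Hypothetical sketch: ami-EXAMPLE and my-key-pair are placeholders to replace.
# The instance type and 128 GB volume size follow the prerequisites above.
INSTANCE_TYPE="c8g.4xlarge"
VOLUME_SIZE_GB=128

# Echoed so the sketch is inert; remove 'echo' to actually launch.
echo aws ec2 run-instances \
  --image-id ami-EXAMPLE \
  --instance-type "$INSTANCE_TYPE" \
  --key-name my-key-pair \
  --block-device-mappings "DeviceName=/dev/sda1,Ebs={VolumeSize=$VOLUME_SIZE_GB,VolumeType=gp3}"
```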

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md

Lines changed: 1 addition & 2 deletions
```diff
@@ -38,7 +38,6 @@ This command uses CMake to configure the build system:
 - `-B .` specifies that the build files should be generated in the current directory
 - CMake will detect your system's compiler, libraries, and hardware capabilities
 - It will generate the appropriate build files (Makefiles on Linux) based on your system configuration
-- This step also enables optimizations for ARM processors like Graviton4
 
 Note: The cmake output should include the information below, indicating that the build process will leverage the Neoverse V2 architecture's specialized instruction sets designed for AI/ML workloads. These optimizations are crucial for achieving optimal performance on Graviton4:
 
@@ -80,4 +79,4 @@ After successful compilation, you'll have several key command-line executables i
 
 You can find more information in the llama.cpp [GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools).
 
-These binaries are specifically optimized for ARM64 architecture and will provide excellent performance on your Graviton4 instance.
+These binaries are specifically optimized for ARM64 architecture and will provide excellent performance on your Graviton4 instance.
```
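Since the page stresses that the binaries are ARM64-optimized, a quick sanity check before building is to confirm the host architecture (illustrative, not part of the commit):

```shell
#!/bin/bash
# On a Graviton4 (Neoverse V2) instance this reports aarch64;
# on an x86 host it reports x86_64, and the Arm optimizations will not apply.
ARCH="$(uname -m)"
echo "Host architecture: $ARCH"
```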

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/07_evaluating_the_quantized_models.md

Lines changed: 22 additions & 5 deletions
```diff
@@ -61,7 +61,7 @@ The results should look like this:
 | llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 16 | pp512 | 190.18 ± 0.03 |
 | llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 16 | tg128 | 40.99 ± 0.36 |
 
-It's pretty amazing to see that with only 4 threads, the 4-bit model can still generate at the very comfortable speed of 15 tokens per second.
+It's pretty amazing to see that with only 4 threads, the 4-bit model can still generate at the very comfortable speed of 15 tokens per second. We could definitely run several copies of the model on the same instance to serve concurrent users or applications.
 
 You could also try [`llama-batched-bench`](https://github.com/ggml-org/llama.cpp/tree/master/tools/batched-bench) to benchmark performance on batch sizes larger than 1.
 
```
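The added sentence about serving concurrent users can be sanity-checked with simple arithmetic (an estimate assuming linear scaling, which real workloads will not match exactly):

```shell
#!/bin/bash
# Back-of-the-envelope check: with 16 vCPUs and the 4-bit model using
# 4 threads per copy, four copies could run side by side, for a rough
# aggregate of 4 x 15 = 60 tokens/second across concurrent users.
VCPUS=16
THREADS_PER_COPY=4
TOKENS_PER_SEC_PER_COPY=15
COPIES=$((VCPUS / THREADS_PER_COPY))
AGGREGATE=$((COPIES * TOKENS_PER_SEC_PER_COPY))
echo "$COPIES copies, ~$AGGREGATE tokens/s aggregate"
```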

````diff
@@ -89,16 +89,33 @@ bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q8_0.gguf -f wikitext-2-raw/wik
 bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q4_0.gguf -f wikitext-2-raw/wiki.test.raw
 ```
 
-These commands will run for about 4 hours. You should run them in a shell script to avoid SSH timeouts. For example:
+If you want to speed things up, you can add the `--chunks` option to use a fraction of 564 chunks contained in the test dataset.
+
+On the full dataset, these three commands will take about 5 hours. You should run them in a shell script to avoid SSH timeouts.
+
+For example:
+```bash
+#!/bin/bash
+# ppl.sh
+bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-F16.gguf -f wikitext-2-raw/wiki.test.raw
+bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q8_0.gguf -f wikitext-2-raw/wiki.test.raw
+bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q4_0.gguf -f wikitext-2-raw/wiki.test.raw
+```
 ```bash
 nohup sh ppl.sh >& ppl.sh.log &
 tail -f ppl.sh.log
 ```
 
-If you want to speed things up, you can add the `--chunks` option to use a fraction of 564 chunks contained in the test dataset.
 
-Here are the full results:
+Here are the full results.
+
+
+| Model | Generation Speed (tokens/s, 16 vCPUs) | Memory Usage | Perplexity (Wikitext-2) |
+|:-------:|:----------------------:|:------------:|:----------:|
+| F16 | ~15–16 | ~15 GB | TODO |
+| Q8_0 | ~25 | ~8 GB | TODO |
+| Q4_0 | ~40 | ~4.4 GB | TODO |
 
-TODO
 
+*Please remember to terminate the instance in the AWS console when you're done testing*
 
````
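The `--chunks` trade-off mentioned in the hunk can be estimated with quick arithmetic (a rough sketch assuming runtime scales linearly with chunk count, using the ~5-hour figure for all three models on the full dataset):

```shell
#!/bin/bash
# Illustrative estimate: the test set contains 564 chunks, and running a
# subset with --chunks trades perplexity accuracy for runtime. With a ~5 hour
# full run, a 100-chunk subset should finish in roughly proportional time.
TOTAL_CHUNKS=564
SUBSET=100
FULL_RUN_MINUTES=$((5 * 60))
EST_MINUTES=$((FULL_RUN_MINUTES * SUBSET / TOTAL_CHUNKS))
echo "Estimated runtime for --chunks $SUBSET: ~$EST_MINUTES minutes"
```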

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/08_conclusion.md

Lines changed: 5 additions & 1 deletion
```diff
@@ -9,7 +9,11 @@ layout: learningpathall
 
 ## Conclusion
 
-Congratulations! You have successfully completed the journey of deploying the Arcee AFM-4.5B foundation model on AWS Graviton4. Let's recap what we accomplished:
+Congratulations! You have successfully completed the journey of deploying the Arcee AFM-4.5B foundation model on AWS Graviton4.
+
+*Please remember to terminate the instance in the AWS console when you're done testing*
+
+Let's recap what we accomplished.
 
 ### What We Built
 
```
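The reminder added above points at the AWS console; the same cleanup can be done from the CLI. A sketch (the instance ID is a hypothetical placeholder; the command is echoed so it is safe to run as-is):

```shell
#!/bin/bash
# Hypothetical placeholder: substitute the ID of the instance you launched.
INSTANCE_ID="i-0123456789abcdef0"
# Echoed so the sketch is inert; remove 'echo' to actually terminate.
echo aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
```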

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/01_launching_an axion_instance.md

Lines changed: 0 additions & 102 deletions
This file was deleted.

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/02_setting_up_the_instance.md

Lines changed: 0 additions & 51 deletions
This file was deleted.

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/03_building_llama_cpp.md

Lines changed: 0 additions & 82 deletions
This file was deleted.

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/04_install_python_dependencies_for_llama_cpp.md

Lines changed: 0 additions & 68 deletions
This file was deleted.

0 commit comments
