
Commit ec8cc77

committed
- Clean up
- Remove GCP learning for now
1 parent 3e341ba commit ec8cc77

File tree

10 files changed (+31, -398 lines)


content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/01_launching_a_graviton4_instance.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -10,9 +10,11 @@ layout: learningpathall
 
 - An AWS account
 
+- Quota for c8g instances in your preferred region
+
 - A Linux or MacOS host
 
-- A c8g or r8g instance (4xlarge or larger)
+- A c8g instance (4xlarge or larger)
 
 - At least 128GB of storage
 
```
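The updated prerequisites above can also be met from the AWS CLI. A minimal sketch of launching a c8g.4xlarge with 128 GB of gp3 storage (the AMI ID and key name are hypothetical placeholders, not from this commit; the command is echoed so the sketch is safe to run as-is):

```shell
#!/bin/bash
# Hypothetical sketch: ami-EXAMPLE and my-key-pair are placeholders to replace.
# The instance type and 128 GB volume size follow the prerequisites above.
INSTANCE_TYPE="c8g.4xlarge"
VOLUME_SIZE_GB=128

# Echoed so the sketch is inert; remove 'echo' to actually launch.
echo aws ec2 run-instances \
  --image-id ami-EXAMPLE \
  --instance-type "$INSTANCE_TYPE" \
  --key-name my-key-pair \
  --block-device-mappings "DeviceName=/dev/sda1,Ebs={VolumeSize=$VOLUME_SIZE_GB,VolumeType=gp3}"
```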

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md

Lines changed: 1 addition & 2 deletions
```diff
@@ -38,7 +38,6 @@ This command uses CMake to configure the build system:
 - `-B .` specifies that the build files should be generated in the current directory
 - CMake will detect your system's compiler, libraries, and hardware capabilities
 - It will generate the appropriate build files (Makefiles on Linux) based on your system configuration
-- This step also enables optimizations for ARM processors like Graviton4
 
 Note: The cmake output should include the information below, indicating that the build process will leverage the Neoverse V2 architecture's specialized instruction sets designed for AI/ML workloads. These optimizations are crucial for achieving optimal performance on Graviton4:
 
@@ -80,4 +79,4 @@ After successful compilation, you'll have several key command-line executables i
 
 You can find more information in the llama.cpp [GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools).
 
-These binaries are specifically optimized for ARM64 architecture and will provide excellent performance on your Graviton4 instance.
+These binaries are specifically optimized for ARM64 architecture and will provide excellent performance on your Graviton4 instance.
```
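Since the page stresses that the binaries are ARM64-optimized, a quick sanity check before building is to confirm the host architecture (illustrative, not part of the commit):

```shell
#!/bin/bash
# On a Graviton4 (Neoverse V2) instance this reports aarch64;
# on an x86 host it reports x86_64, and the Arm optimizations will not apply.
ARCH="$(uname -m)"
echo "Host architecture: $ARCH"
```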

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/07_evaluating_the_quantized_models.md

Lines changed: 22 additions & 5 deletions
```diff
@@ -61,7 +61,7 @@ The results should look like this:
 | llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 16 | pp512 | 190.18 ± 0.03 |
 | llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 16 | tg128 | 40.99 ± 0.36 |
 
-It's pretty amazing to see that with only 4 threads, the 4-bit model can still generate at the very comfortable speed of 15 tokens per second.
+It's pretty amazing to see that with only 4 threads, the 4-bit model can still generate at the very comfortable speed of 15 tokens per second. We could definitely run several copies of the model on the same instance to serve concurrent users or applications.
 
 You could also try [`llama-batched-bench`](https://github.com/ggml-org/llama.cpp/tree/master/tools/batched-bench) to benchmark performance on batch sizes larger than 1.
 
```
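The added sentence about serving concurrent users can be sanity-checked with simple arithmetic (an estimate assuming linear scaling, which real workloads will not match exactly):

```shell
#!/bin/bash
# Back-of-the-envelope check: with 16 vCPUs and the 4-bit model using
# 4 threads per copy, four copies could run side by side, for a rough
# aggregate of 4 x 15 = 60 tokens/second across concurrent users.
VCPUS=16
THREADS_PER_COPY=4
TOKENS_PER_SEC_PER_COPY=15
COPIES=$((VCPUS / THREADS_PER_COPY))
AGGREGATE=$((COPIES * TOKENS_PER_SEC_PER_COPY))
echo "$COPIES copies, ~$AGGREGATE tokens/s aggregate"
```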

````diff
@@ -89,16 +89,33 @@ bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q8_0.gguf -f wikitext-2-raw/wik
 bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q4_0.gguf -f wikitext-2-raw/wiki.test.raw
 ```
 
-These commands will run for about 4 hours. You should run them in a shell script to avoid SSH timeouts. For example:
+If you want to speed things up, you can add the `--chunks` option to use a fraction of 564 chunks contained in the test dataset.
+
+On the full dataset, these three commands will take about 5 hours. You should run them in a shell script to avoid SSH timeouts.
+
+For example:
+```bash
+#!/bin/bash
+# ppl.sh
+bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-F16.gguf -f wikitext-2-raw/wiki.test.raw
+bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q8_0.gguf -f wikitext-2-raw/wiki.test.raw
+bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q4_0.gguf -f wikitext-2-raw/wiki.test.raw
+```
 ```bash
 nohup sh ppl.sh >& ppl.sh.log &
 tail -f ppl.sh.log
 ```
 
-If you want to speed things up, you can add the `--chunks` option to use a fraction of 564 chunks contained in the test dataset.
 
-Here are the full results:
+Here are the full results.
+
+
+| Model | Generation Speed (tokens/s, 16 vCPUs) | Memory Usage | Perplexity (Wikitext-2) |
+|:-------:|:----------------------:|:------------:|:----------:|
+| F16 | ~15–16 | ~15 GB | TODO |
+| Q8_0 | ~25 | ~8 GB | TODO |
+| Q4_0 | ~40 | ~4.4 GB | TODO |
 
-TODO
 
+*Please remember to terminate the instance in the AWS console when you're done testing*
 
````
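The `--chunks` trade-off mentioned in the hunk can be estimated with quick arithmetic (a rough sketch assuming runtime scales linearly with chunk count, using the ~5-hour figure for all three models on the full dataset):

```shell
#!/bin/bash
# Illustrative estimate: the test set contains 564 chunks, and running a
# subset with --chunks trades perplexity accuracy for runtime. With a ~5 hour
# full run, a 100-chunk subset should finish in roughly proportional time.
TOTAL_CHUNKS=564
SUBSET=100
FULL_RUN_MINUTES=$((5 * 60))
EST_MINUTES=$((FULL_RUN_MINUTES * SUBSET / TOTAL_CHUNKS))
echo "Estimated runtime for --chunks $SUBSET: ~$EST_MINUTES minutes"
```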

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/08_conclusion.md

Lines changed: 5 additions & 1 deletion
```diff
@@ -9,7 +9,11 @@ layout: learningpathall
 
 ## Conclusion
 
-Congratulations! You have successfully completed the journey of deploying the Arcee AFM-4.5B foundation model on AWS Graviton4. Let's recap what we accomplished:
+Congratulations! You have successfully completed the journey of deploying the Arcee AFM-4.5B foundation model on AWS Graviton4.
+
+*Please remember to terminate the instance in the AWS console when you're done testing*
+
+Let's recap what we accomplished.
 
 ### What We Built
 
```
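The reminder added above points at the AWS console; the same cleanup can be done from the CLI. A sketch (the instance ID is a hypothetical placeholder; the command is echoed so it is safe to run as-is):

```shell
#!/bin/bash
# Hypothetical placeholder: substitute the ID of the instance you launched.
INSTANCE_ID="i-0123456789abcdef0"
# Echoed so the sketch is inert; remove 'echo' to actually terminate.
echo aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
```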

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/01_launching_an axion_instance.md

Lines changed: 0 additions & 102 deletions
This file was deleted.

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/02_setting_up_the_instance.md

Lines changed: 0 additions & 51 deletions
This file was deleted.

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/03_building_llama_cpp.md

Lines changed: 0 additions & 82 deletions
This file was deleted.

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/04_install_python_dependencies_for_llama_cpp.md

Lines changed: 0 additions & 68 deletions
This file was deleted.

0 commit comments
