
Commit 9233f5c

charlotte edits
Signed-off-by: simon-mo <[email protected]>
1 parent 3ca20af commit 9233f5c

File tree

2 files changed: +3 additions, −3 deletions


_posts/2025-04-05-llama4.md

Lines changed: 3 additions & 3 deletions
@@ -60,7 +60,7 @@ vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 \
 
 **Performance:**
 
-With the configurations above, we observe the following output tokens/s. Note that Scout is smaller but running with bfloat16 while Maverick is running with fp8.
+With the configurations above, we observe the following output tokens/s for Scout-BF16 and Maverick-FP8:
 
 ![](/assets/figures/llama4/perf.png)
 
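For context, the `vllm serve` invocation truncated in the hunk header above is the serving command whose throughput the changed sentence describes. A minimal sketch of that invocation follows; the flag values are assumptions for an 8-GPU H100 node, not figures taken from this commit:

```bash
# Sketch of the serving setup referenced above. --tensor-parallel-size and
# --max-model-len are standard vLLM flags; the values here are illustrative,
# not the post's exact configuration.
vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 \
  --tensor-parallel-size 8 \
  --max-model-len 430000
```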

@@ -75,7 +75,7 @@ While more performance enhancements are on the way, we believe the Llama 4 model
 **Other Hardware Support & Quantizations:**
 
 * A100: We have verified that the bf16 versions of the models work well on A100 GPUs.
-* INT4: An INT4-quantized version of the Scout model checkpoint is currently a work in progress. Stay tuned for updates.
+* INT4: An INT4-quantized version of the Scout model checkpoint that fits on a single H100 GPU is currently a work in progress. Stay tuned for updates.
 * AMD MI300X: You can run Llama 4 on AMD MI300X GPUs by building [vLLM from source](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=rocm) and using the same commands as above.
 
 **Inference Accuracy Validation:**
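
The MI300X bullet in the hunk above points to a from-source build. A hedged sketch of what that typically looks like is below; the ROCm-specific prerequisites live in the linked installation guide and are assumed rather than reproduced here:

```bash
# Hedged sketch: build vLLM from source on an MI300X host, then reuse the
# same serve commands as above. Assumes ROCm and a ROCm build of PyTorch
# are already installed per the linked guide.
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .
```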
@@ -85,7 +85,7 @@ We validated inference accuracy against the official Meta report using lm-eval-h
 |----------|---------|---------|
 | Reported | 80.5 | 90 |
 | H100 FP8 | 80.4 | 89.4 |
-| AMD BF16 | 80.4 | 89.4 |
+| AMD MI300x BF16 | 80.4 | 89.4 |
 | H200 BF16 | 80.2 | 89.3 |
 
 ## Efficient Architecture and Cluster Scale Serving
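
Per the hunk header's context line, the table was produced with lm-eval-harness. A sketch of such a run against a vLLM backend is below; the task name is a placeholder, since the header row naming the benchmark columns sits outside this hunk:

```bash
# Hedged sketch of an lm-eval-harness run using its vLLM backend.
# mmlu_pro is illustrative only; the actual benchmarks are named in the
# table header, which this diff hunk does not show.
lm_eval --model vllm \
  --model_args pretrained=meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8,tensor_parallel_size=8 \
  --tasks mmlu_pro \
  --batch_size auto
```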

assets/figures/llama4/perf.png

38.2 KB
