Skip to content

Commit 21d315a

Browse files
authored
fix (#174)
1 parent 33bebb2 commit 21d315a

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

blog/2025-07-25-spec-forge.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -57,7 +57,7 @@ Using SpecForge, we trained the Llama 4 Scout and Maverick models on a 320K-samp
5757

5858
We evaluated various draft token lengths for Scout and Maverick.
5959

60-
In all the tests shown in the figure below, the x-axis represents steps, corresponding to `speculative-num-steps` in SGLang. Meanwhile, we fixed SGLang's `speculative-eagle-topk` to 8 and `speculative-num-draft-tokens` to 10 to ensure that `tree attention` can be enabled. To find the optimal speculative decoding parameters, we can use the `[bench_speculative](https://github.com/sgl-project/sglang/blob/main/scripts/playground/bench_speculative.py)` script in the SGLang repository. It runs throughput benchmarks across different configurations and helps us tune for the best performance on the hardware.
60+
In all the tests shown in the figure below, the x-axis represents steps, corresponding to `speculative-num-steps` in SGLang. Meanwhile, we fixed SGLang's `speculative-eagle-topk` to 8 and `speculative-num-draft-tokens` to 10 to ensure that `tree attention` can be enabled. To find the optimal speculative decoding parameters, we can use the **[bench_speculative](https://github.com/sgl-project/sglang/blob/main/scripts/playground/bench_speculative.py)** script in the SGLang repository. It runs throughput benchmarks across different configurations and helps us tune for the best performance on the hardware.
6161

6262
![scout.svg](/images/blog/spec_forge/Llama4_Scout_performance_final.svg)
6363

0 commit comments

Comments
 (0)