
Commit 8176c35

small fix
1 parent 8443489

2 files changed: +2 -2 lines changed

tools/benchmarks/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Benchmarks
 
 * inference - a folder contains benchmark scripts that apply a throughput analysis for Llama models inference on various backends including on-prem, cloud and on-device.
-* llm_eval_harness - a folder that introduces `lm-evaluation-harness`, a tool to evaluate Llama models including quantized models focusing on quality. We also included a recipe that reproduces Meta 3.1 evaluation metrics Using `lm-evaluation-harness` and instructions that reproduce HuggingFace Open LLM Leaderboard v2 metrics.
+* llm_eval_harness - a folder that introduces `lm-evaluation-harness`, a tool to evaluate Llama models including quantized models focusing on quality. We also included a recipe that calculates Llama 3.1 evaluation metrics Using `lm-evaluation-harness` and instructions that calculate HuggingFace Open LLM Leaderboard v2 metrics.
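
For context, the `lm-evaluation-harness` tool referenced in this README exposes a Python entry point alongside its CLI. A minimal sketch of invoking it (assuming a recent `lm_eval` release; the checkpoint name, task list, and hyperparameters below are illustrative placeholders, not part of this commit):

```python
# Hedged sketch: evaluating a Hugging Face checkpoint with lm-evaluation-harness.
# Requires `pip install lm-eval`; the model and task names are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=meta-llama/Meta-Llama-3.1-8B-Instruct",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=8,
)

# Aggregate per-task metrics are returned under the "results" key.
print(results["results"])
```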

tools/benchmarks/llm_eval_harness/meta_eval/README.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ As Llama models gain popularity, evaluating these models has become increasingly
 ## Disclaimer
 
 
-1. **This recipe is not the official implementation** of Llama evaluation. Since our internal eval repo isn't public, we want to provide this recipe as an aid for anyone who want to use the datasets we released. It is based on public third-party libraries, as this implementation is not mirroring Llama evaluation, therefore this may lead to minor differences in the produced numbers.
+1. **This recipe is not the official implementation** of Llama evaluation. Since our internal eval repo isn't public, we want to provide this recipe as an aid for anyone who wants to use the datasets we released. It is based on public third-party libraries, as this implementation is not mirroring Llama evaluation, therefore this may lead to minor differences in the produced numbers.
 2. **Model Compatibility**: This tutorial is specifically for Llama 3 based models, as our prompts include Llama 3 special tokens, e.g. `<|start_header_id|>user<|end_header_id|>`. It will not work with models that are not based on Llama 3.
 
 ## Insights from Our Evaluation Process
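
The disclaimer's second point hinges on Llama 3 special tokens. For illustration, a single-turn prompt for Llama 3 Instruct models is typically assembled as sketched below (the message strings are placeholders; this reflects the published Llama 3 chat format, not code from this repo):

```python
# Illustrative sketch of the Llama 3 Instruct prompt format mentioned in the
# disclaimer; the system and user strings are placeholder text.
system_msg = "You are a helpful assistant."
user_msg = "Summarize the disclaimer above in one sentence."

prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n" + system_msg + "<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n" + user_msg + "<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

# Tokenizers for non-Llama-3 models do not treat these markers as special
# tokens, which is why the recipe only supports Llama 3 based models.
print(prompt)
```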
