README.md (7 additions, 5 deletions)

@@ -1,17 +1,18 @@
# Llama Recipes: Examples to get started using the Llama models from Meta
<!-- markdown-link-check-disable -->
- The 'llama-recipes' repository is a companion to the [Meta Llama 2](https://github.com/meta-llama/llama) and [Meta Llama 3](https://github.com/meta-llama/llama3) models. The goal of this repository is to provide a scalable library for fine-tuning Meta Llama models, along with some example scripts and notebooks to quickly get started with using the models in a variety of use-cases, including fine-tuning for domain adaptation and building LLM-based applications with Meta Llama and other tools in the LLM ecosystem. The examples here showcase how to run Meta Llama locally, in the cloud, and on-prem.
+ The 'llama-recipes' repository is a companion to the [Meta Llama 3](https://github.com/meta-llama/llama3) models. The goal of this repository is to provide a scalable library for fine-tuning Meta Llama models, along with some example scripts and notebooks to quickly get started with using the models in a variety of use-cases, including fine-tuning for domain adaptation and building LLM-based applications with Meta Llama and other tools in the LLM ecosystem. The examples here showcase how to run Meta Llama locally, in the cloud, and on-prem. [Meta Llama 2](https://github.com/meta-llama/llama) is also supported in this repository. We highly recommend using [Meta Llama 3](https://github.com/meta-llama/llama3) due to its enhanced capabilities.
+
<!-- markdown-link-check-enable -->
> [!IMPORTANT]
- > Llama 3 has a new prompt template and special tokens (based on the tiktoken tokenizer).
+ > Meta Llama 3 has a new prompt template and special tokens (based on the tiktoken tokenizer).
> | Token | Description |
> |---|---|
> `<\|begin_of_text\|>` | This is equivalent to the BOS token. |
> `<\|end_of_text\|>` | This is equivalent to the EOS token. For multiturn-conversations it's usually unused; instead, every message is terminated with `<\|eot_id\|>`. |
> `<\|eot_id\|>` | This token signifies the end of the message in a turn, i.e. the end of a single message by a system, user or assistant role as shown below. |
> `<\|start_header_id\|>{role}<\|end_header_id\|>` | These tokens enclose the role for a particular message. The possible roles can be: system, user, assistant. |
>
- > A multiturn-conversation with Llama 3 follows this prompt template:
+ > A multiturn-conversation with Meta Llama 3 follows this prompt template:
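For illustration, a multiturn exchange under this template looks roughly like the sketch below. The `{{ ... }}` placeholders stand for the system prompt and the alternating user and assistant messages; consult the Meta Llama 3 model card for the authoritative template.

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```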
@@ -133,7 +134,7 @@ Contains examples are organized in folders by topic:
[quickstart](./recipes/quickstart) | The "Hello World" of using Llama, start here if you are new to using Llama.
[finetuning](./recipes/finetuning)|Scripts to finetune Llama on single-GPU and multi-GPU setups
[inference](./recipes/inference)|Scripts to deploy Llama for inference locally and using model servers
- [use_cases](./recipes/use_cases)|Scripts showing common applications of Llama2
+ [use_cases](./recipes/use_cases)|Scripts showing common applications of Meta Llama3
[responsible_ai](./recipes/responsible_ai)|Scripts to use PurpleLlama for safeguarding model outputs
[llama_api_providers](./recipes/llama_api_providers)|Scripts to run inference on Llama via hosted endpoints
[benchmarks](./recipes/benchmarks)|Scripts to benchmark Llama models inference on various backends
@@ -159,7 +160,8 @@ Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduc
## License
<!-- markdown-link-check-disable -->
- See the License file for Meta Llama 2 [here](https://llama.meta.com/llama2/license/) and Acceptable Use Policy [here](https://llama.meta.com/llama2/use-policy/)
See the License file for Meta Llama 3 [here](https://llama.meta.com/llama3/license/) and Acceptable Use Policy [here](https://llama.meta.com/llama3/use-policy/)
+
+ See the License file for Meta Llama 2 [here](https://llama.meta.com/llama2/license/) and Acceptable Use Policy [here](https://llama.meta.com/llama2/use-policy/)
recipes/evaluation/README.md (13 additions, 13 deletions)

@@ -1,6 +1,6 @@
# Llama Model Evaluation

- Llama-Recipe make use of `lm-evaluation-harness` for evaluating our fine-tuned Llama2 model. It also can serve as a tool to evaluate quantized model to ensure the quality in lower precision or other optimization applied to the model that might need evaluation.
+ Llama-Recipes makes use of `lm-evaluation-harness` for evaluating our fine-tuned Meta Llama3 (or Llama2) models. It can also serve as a tool to evaluate quantized models, to ensure that quality is preserved at lower precision or under any other optimization applied to the model.

`lm-evaluation-harness` provides a wide range of [features](https://github.com/EleutherAI/lm-evaluation-harness?tab=readme-ov-file#overview):
@@ -12,7 +12,7 @@ Llama-Recipe make use of `lm-evaluation-harness` for evaluating our fine-tuned L
- Support for evaluation on adapters (e.g. LoRA) supported in Hugging Face's PEFT library.
- Support for local models and benchmarks.

The Language Model Evaluation Harness is also the backend for 🤗 [Hugging Face's (HF) popular Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

## Setup
@@ -36,35 +36,35 @@ pip install -e .
### Quick Test

- To run evaluation for Hugging Face `Llama2 7B` model on a single GPU please run the following,
+ To run evaluation for the Hugging Face `Llama3 8B` model on a single GPU, please run the following:
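A minimal sketch of such a single-GPU run is shown below; the model ID, task and batch size are placeholders, and this assumes `eval.py` forwards standard `lm-evaluation-harness` arguments. The flags actually supported by this repository's `eval.py` are authoritative.

```bash
# Hypothetical single-GPU quick test with the Hugging Face backend.
python eval.py \
    --model hf \
    --model_args "pretrained=meta-llama/Meta-Llama-3-8B-Instruct" \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8
```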
There is a study from [IBM on efficient benchmarking of LLMs](https://arxiv.org/pdf/2308.11696.pdf); its main takeaway is that, to identify whether a model is performing poorly, benchmarking on a wider range of tasks matters more than the number of examples in each task. This means you can run the evaluation harness with fewer examples per task to get an initial indication of whether performance has regressed from the baseline. The number of examples can be limited with the `--limit` flag, set to the desired count, as sketched below. For a full assessment you would still need to run the complete evaluation. Please read more in the paper linked above.
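As a rough illustration of that workflow, the same hypothetical command can cover more tasks while capping each one (task names and the cap are placeholders):

```bash
# Hypothetical smoke test: several tasks, but only 100 examples per task,
# to get an early signal before committing to a full evaluation run.
python eval.py \
    --model hf \
    --model_args "pretrained=meta-llama/Meta-Llama-3-8B-Instruct" \
    --tasks "hellaswag,arc_challenge,winogrande" \
    --limit 100 \
    --device cuda:0
```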
Here, we provide a list of tasks from the `Open-LLM-Leaderboard` which can be used by passing `--open-llm-leaderboard-tasks` instead of `--tasks` to `eval.py`, for example:
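A sketch only: the `--open-llm-leaderboard-tasks` flag comes from the text above, while the model ID and output path are placeholders.

```bash
# Hypothetical leaderboard-style evaluation using the predefined task list.
python eval.py \
    --model hf \
    --model_args "pretrained=meta-llama/Meta-Llama-3-8B-Instruct" \
    --open-llm-leaderboard-tasks \
    --output_path ./eval_results
```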
**NOTE** Make sure to run the bash script below, which sets the `include paths` in the [config files](./open_llm_leaderboard/). The script will prompt you to enter the path to the cloned lm-evaluation-harness repo. **You would need this step only the first time.**

In the HF leaderboard, the [LLMs are evaluated on 7 benchmarks](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) from the Language Model Evaluation Harness, as described below:
@@ -107,7 +107,7 @@ To perform *data-parallel evaluation* (where each GPU loads a **separate full co
@@ -138,7 +138,7 @@ These two options (`accelerate launch` and `parallelize=True`) are mutually excl
Also `lm-evaluation-harness` supports vLLM for faster inference on [supported model types](https://docs.vllm.ai/en/latest/models/supported_models.html), especially faster when splitting a model across multiple GPUs. For single-GPU or multi-GPU — tensor parallel, data parallel, or a combination of both — inference, for example:
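A sketch of such a run using the harness's own CLI; whether you invoke `lm_eval` directly or go through this repository's `eval.py` wrapper, as well as the model ID and task chosen here, are assumptions.

```bash
# Hypothetical vLLM-backed run: tensor_parallel_size shards one copy of the model
# across GPUs, while data_parallel_size would load additional full replicas.
lm_eval --model vllm \
    --model_args "pretrained=meta-llama/Meta-Llama-3-8B-Instruct,tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.8,data_parallel_size=1" \
    --tasks hellaswag \
    --batch_size auto
```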
For a full list of supported vLLM configurations, please refer to [here](https://github.com/EleutherAI/lm-evaluation-harness/blob/076372ee9ee81e25c4e2061256400570354a8d1a/lm_eval/models/vllm_causallms.py#L44-L62).