Commit 5739231

rebased to main, and changed readme
1 parent 81e8a13 commit 5739231

3 files changed: +30 -28 lines changed

recipes/quickstart/finetuning/datasets/raft_dataset.py

Lines changed: 0 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -6,7 +6,6 @@
66
from datasets import load_dataset
77
import itertools
88

9-
B_INST, E_INST = "[INST]", "[/INST]"
109
# check system prompt token seq or user prompt token seq is in the current token list
1110
def check_header(targets,seq):
1211
for i in range(len(seq)-3):
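The hunk ends at the loop header, so the body of `check_header` is not visible here. A minimal sketch of how such a check might continue, assuming each header is a three-token sequence (suggested by the `len(seq)-3` bound) and that the function returns a boolean. The loop body and the token ids below are assumptions for illustration, not the repository's code:

```python
def check_header(targets, seq):
    """Return True if any 3-token window of `seq` equals one of `targets`."""
    # Slide a window of length 3 across the token ids. The loop bound mirrors
    # the diff (len(seq) - 3), so the very last window is not examined.
    for i in range(len(seq) - 3):
        if seq[i:i + 3] in targets:  # assumed body: compare window to headers
            return True
    return False


# Hypothetical token ids, chosen only for illustration.
system_header = [101, 102, 103]
user_header = [101, 104, 103]
tokens = [7, 101, 104, 103, 9, 9]
print(check_header([system_header, user_header], tokens))
```

A check like this is typically used to find prompt segments in a tokenized conversation so their labels can be masked during fine-tuning.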

recipes/use_cases/end2end-recipes/raft/README.md

Lines changed: 29 additions & 27 deletions
@@ -1,31 +1,16 @@
11

2-
## Introduction:
3-
As the popularity of our Meta Llama 3 models grows, we've seen a surge in demand to adapt them to specific domains, enabling businesses to better serve their customers. For instance, a company might have a vast collection of plain text documents related to their custom domain and want to create a chatbot that can answer client questions.
2+
## Chatbot Recipe:
3+
As the popularity of our Meta Llama 3 models grows, we've seen a surge in demand to adapt them to specific domains, enabling businesses to better serve their customers. For example, a company might have a vast collection of plain text documents related to their custom domain and want to create a chatbot that can answer client questions.
44

5-
In response to this demand, we're exploring the possibility of building a Llama chatbot that can answer Llama related questions using our Meta Llama 3 models. In this tutorial, we'll demonstrate how to do just that. While our Meta Llama 3 70B Instruct model is an excellent candidate, as it already has a excellent reasoning capabilities and knowledge, its production costs are relatively high. To reduce these costs, we'll focus on creating a Llama chatbot based on the Meta Llama 8B Instruct model, aiming to achieve similar accuracy to the Meta Llama 3 70B Instruct model while minimizing inference costs.
5+
In response to this demand, we're exploring the possibility of building a Llama chatbot that can answer Llama-related questions using our Meta Llama 3 models. In this tutorial, we'll demonstrate how to do just that. While our Meta Llama 3 70B Instruct model is an excellent candidate, its production costs are relatively high. To reduce these costs, we'll focus on creating a Llama chatbot based on the Meta Llama 3 8B Instruct model, aiming to achieve similar accuracy while minimizing inference costs.
66

7-
## Collecting Text Data for the Llama Bot
7+
One common ML approach to producing a model based on new domain data is **fine-tuning**. The idea is to start from a pre-trained model that already has some knowledge of language from its pre-training and adapt it to a new domain. However, a [recent paper](https://arxiv.org/pdf/2405.05904) highlights the risk of using supervised fine-tuning to update LLMs' knowledge: it presents empirical evidence that acquiring new knowledge through fine-tuning is correlated with hallucinations with respect to preexisting knowledge. Fine-tuning can also be costly if the domain knowledge has to be updated frequently.
88

9-
To build a Llama bot, we need to collect relevant text data. Ideally, we would include a vast range of Llama-related web documents, but for demo purposes, we'll focus on official documents. For example, we can use the raw text from official web pages listed in [Getting started with Meta Llama](https://llama.meta.com/get-started/), excluding the FAQ page since some evaluation questions will come from there.
10-
11-
We have two options to obtain the text data: using a local folder or web crawling. For the local folder option, we can download the desired documents in PDF, Text, or Markdown format to the "data" folder specified in the [raft.yaml](./raft.yaml) file. Langchain DirectoryLoader will load files in that folder, but it may also ask us to install more package dependency if the files formats are not supported natively.
12-
13-
Alternatively, we can create a sitemap XML file, similar to the example below, and put the file path in the [raft.yaml](./raft.yaml) file, so eventually a Langchain SitemapLoader can retrieve all the text from the web pages.
14-
15-
```xml
16-
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
17-
<url>
18-
<loc>http://llama.meta.com/responsible-use-guide/</loc>
19-
</url>
20-
<!-- more URLs -->
21-
</urlset>
22-
```
9+
Another solution is to use **RAG (Retrieval-Augmented Generation)**, which combines the strengths of traditional information retrieval systems (such as databases) with the capabilities of generative large language models (LLMs). RAG operates by first retrieving relevant information from a database using a query generated by the LLM. The retrieved information is then integrated into the LLM's input, enabling it to generate more accurate and contextually relevant text. This helps reduce LLM hallucination, since the related documents are provided to the LLM, and it lowers the cost of updating the domain knowledge.
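The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions: word overlap stands in for a real retriever, the question is used directly as the query instead of an LLM-generated one, and `tokenize`, `retrieve`, `build_prompt`, and the sample documents are hypothetical names and data, not part of this recipe:

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question (toy retriever)."""
    query_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Place the retrieved documents in front of the question."""
    context = "\n".join(retrieve(question, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Illustrative documents only.
docs = [
    "Llama 3 is released in 8B and 70B parameter sizes.",
    "The responsible use guide covers safety best practices.",
]
prompt = build_prompt("Which parameter sizes does Llama 3 come in?", docs)
print(prompt)
```

The resulting prompt would then be sent to the LLM. Updating the domain knowledge only requires changing `docs`, which is the lower update cost mentioned above.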
2310

24-
## Retrieval Augmented Fine Tuning (RAFT) Concepts
11+
In this tutorial, we'll use **Retrieval Augmented Fine Tuning (RAFT)**, a technique that combines fine-tuning with RAG. RAFT is a general recipe for fine-tuning a pre-trained Large Language Model (LLM) to a domain-specific RAG setting. It helps the LLM make better use of custom domain text data by ignoring documents that don't help in answering the question. This approach can create a more factual model and reduce LLM hallucinations during inference.
2512

26-
In this tutorial, we'll introduce Retrieval Augmented Fine Tuning (RAFT), a technique that combines fine-tuning with RAG to better utilize custom domain text data.
27-
28-
RAFT is a general recipe for fine-tuning a pre-trained Large Language Model (LLM) to a domain-specific RAG setting. The process involves preparing training data with each data point containing:
13+
The process involves preparing training data with each data point containing:
2914

3015
* A question (Q)
3116
* A set of documents (D)
@@ -41,6 +26,23 @@ The following graph illustrates the RAFT main concepts:
4126

4227
For more information on RAFT, please refer to their [blog post](https://gorilla.cs.berkeley.edu/blogs/9_raft.html).
4328

29+
## Fine-tuning Llama
30+
31+
To build a Llama bot, we need to collect relevant text data. Ideally, we would include a vast range of Llama-related web documents, but for demo purposes, we'll focus on official documents. For example, we can use the raw text from official web pages listed in [Getting started with Meta Llama](https://llama.meta.com/get-started/), excluding the FAQ page since some evaluation questions will come from there.
32+
33+
We have two options to obtain the text data: using a local folder or web crawling. For the local folder option, we can download the desired documents in PDF, Text, or Markdown format to the "data" folder specified in the [raft.yaml](./raft.yaml) file. Langchain DirectoryLoader will load the files in that folder, but it may also ask us to install additional package dependencies if the file formats are not supported natively.
34+
35+
Alternatively, we can create a sitemap XML file, similar to the example below, and put the file path in the [raft.yaml](./raft.yaml) file, so eventually a Langchain SitemapLoader can retrieve all the text from the web pages.
36+
37+
```xml
38+
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
39+
<url>
40+
<loc>http://llama.meta.com/responsible-use-guide/</loc>
41+
</url>
42+
<!-- more URLs -->
43+
</urlset>
44+
```
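To check which pages a sitemap like the one above would cover before handing it to a loader, the URLs can be listed with the Python standard library. This is a minimal sketch and a stand-in for illustration only. It does not use or reproduce Langchain's SitemapLoader, and `sitemap_urls` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Namespace declared by the sitemap protocol; needed to match <url>/<loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the text of every <loc> element nested under a <url> element."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://llama.meta.com/responsible-use-guide/</loc>
</url>
</urlset>"""
print(sitemap_urls(example))
```

Because the `urlset` element declares a default namespace, the lookup has to be namespace-qualified; a plain `findall("url/loc")` would match nothing.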
45+
4446
## Create RAFT Dataset
4547

4648
To create a RAFT dataset from the prepared documents, we can use the Meta Llama 3 70B Instruct model either through APIs from LLM cloud providers or by hosting a local VLLM server.
@@ -119,18 +121,18 @@ Once the RAFT dataset is ready in JSON format, we can start fine-tuning. Unfortu
119121
```bash
120122
export PATH_TO_ROOT_FOLDER=./raft-8b
121123
export PATH_TO_RAFT_JSON=recipes/use_cases/end2end-recipes/raft/output/raft.jsonl
122-
torchrun --nnodes 1 --nproc_per_node 4 recipes/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --context_length 8192 --num_epochs 1 --batch_size_training 1 --model_name meta-Llama/Meta-Llama-3-8B-Instruct --dist_checkpoint_root_folder $PATH_TO_ROOT_FOLDER --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/finetuning/datasets/raft_dataset.py" --use-wandb --run_validation True --custom_dataset.data_path $PATH_TO_RAFT_JSON
124+
torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --context_length 8192 --num_epochs 1 --batch_size_training 1 --model_name meta-Llama/Meta-Llama-3-8B-Instruct --dist_checkpoint_root_folder $PATH_TO_ROOT_FOLDER --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/raft_dataset.py" --use-wandb --run_validation True --custom_dataset.data_path $PATH_TO_RAFT_JSON
123125
```
124126

125-
For more details on multi-GPU fine-tuning, please refer to the [multigpu_finetuning.md](../../../finetuning/multigpu_finetuning.md) in the finetuning recipe.
127+
For more details on multi-GPU fine-tuning, please refer to the [multigpu_finetuning.md](../../../quickstart/finetuning/multigpu_finetuning.md) in the finetuning recipe.
126128

127129
Next, we need to convert the FSDP checkpoint to a HuggingFace checkpoint using the following command:
128130

129131
```bash
130-
python src/Llama_recipes/inference/checkpoint_converter_fsdp_hf.py --fsdp_checkpoint_path "$PATH_TO_ROOT_FOLDER/fine-tuned-meta-Llama/Meta-Llama-3-8B-Instruct" --consolidated_model_path "$PATH_TO_ROOT_FOLDER"
132+
python src/llama_recipes/inference/checkpoint_converter_fsdp_hf.py --fsdp_checkpoint_path "$PATH_TO_ROOT_FOLDER/fine-tuned-meta-Llama/Meta-Llama-3-8B-Instruct" --consolidated_model_path "$PATH_TO_ROOT_FOLDER"
131133
```
132134

133-
For more details on FSDP to HuggingFace checkpoint conversion, please refer to the [readme](../../../inference/local_inference/README.md) in the inference/local_inference recipe.
135+
For more details on FSDP to HuggingFace checkpoint conversion, please refer to the [readme](../../../quickstart/inference/local_inference/README.md) in the inference/local_inference recipe.
134136

135137
## Evaluation Steps
136138
Once we have the RAFT model, we need to evaluate its performance. In this tutorial, we'll not only use traditional evaluation methods (e.g., calculating exact match rate or ROUGE score) but also use LLM as a judge to score model-generated answers.
@@ -234,7 +236,7 @@ Once we evaluated and refined our RAFT model, we can deploy it locally to intera
234236
python recipes/inference/local_inference/inference.py --model_name raft-8b
235237
```
236238

237-
For more details,please check [local_inference recipe](../../../inference/local_inference/README.md)
239+
For more details, please check the [local_inference recipe](../../../quickstart/inference/local_inference/README.md).
238240

239241
## Acknowledgements
240242

recipes/use_cases/end2end-recipes/raft/format.py

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
1+
# file copied from https://github.com/ShishirPatil/gorilla/blob/main/raft/format.py
12
from abc import ABC, abstractmethod
23
import argparse
34
from datasets import Dataset, load_dataset
