rebased to main, and changed readme

wukaixingxp · wukaixingxp · commit 5739231b14de · 2024-07-08T15:45:24.000-07:00
diff --git a/recipes/quickstart/finetuning/datasets/raft_dataset.py b/recipes/quickstart/finetuning/datasets/raft_dataset.py
@@ -6,7 +6,6 @@
 from datasets import load_dataset
 import itertools
 
-B_INST, E_INST = "[INST]", "[/INST]"
 # check system prompt token seq or user prompt token seq is in the current token list
 def check_header(targets,seq):
     for i in range(len(seq)-3):
diff --git a/recipes/use_cases/end2end-recipes/raft/README.md b/recipes/use_cases/end2end-recipes/raft/README.md
@@ -1,31 +1,16 @@
 
-## Introduction:
-As the popularity of our Meta Llama 3 models grows, we've seen a surge in demand to adapt them to specific domains, enabling businesses to better serve their customers. For instance, a company might have a vast collection of plain text documents related to their custom domain and want to create a chatbot that can answer client questions.
+## Chatbot Recipe:
+As the popularity of our Meta Llama 3 models grows, we've seen a surge in demand to adapt them to specific domains, enabling businesses to better serve their customers. For example, a company might have a vast collection of plain text documents related to their custom domain and want to create a chatbot that can answer client questions.
 
-In response to this demand, we're exploring the possibility of building a Llama chatbot that can answer Llama related questions using our Meta Llama 3 models. In this tutorial, we'll demonstrate how to do just that. While our Meta Llama 3 70B Instruct model is an excellent candidate, as it already has a excellent reasoning capabilities and knowledge, its production costs are relatively high. To reduce these costs, we'll focus on creating a Llama chatbot based on the Meta Llama 8B Instruct model, aiming to achieve similar accuracy to the Meta Llama 3 70B Instruct model while minimizing inference costs.
+In response to this demand, we're exploring the possibility of building a Llama chatbot that can answer Llama-related questions using our Meta Llama 3 models. In this tutorial, we'll demonstrate how to do just that. While our Meta Llama 3 70B Instruct model is an excellent candidate, its production costs are relatively high. To reduce these costs, we'll focus on creating a Llama chatbot based on the Meta Llama 8B Instruct model, aiming to achieve similar accuracy while minimizing inference costs.
 
-## Collecting Text Data for the Llama Bot
+One common ML approach to produce a model based on new domain data is **fine-tuning**. The idea is to start from a pre-trained model that already has some knowledge of language from its pre-training and adapt it to a new domain. However, [recent paper](https://arxiv.org/pdf/2405.05904) highlights the risk of using supervised fine-tuning to update LLMs' knowledge, as it presents empirical evidence that acquiring new knowledge through fine-tuning is correlated with hallucinations w.r.t. preexisting knowledge. Fine-tuning can also be costly if the domain knowledge has to be updated frequently.
 
-To build a Llama bot, we need to collect relevant text data. Ideally, we would include a vast range of Llama-related web documents, but for demo purposes, we'll focus on official documents. For example, we can use the raw text from official web pages listed in [Getting started with Meta Llama](https://llama.meta.com/get-started/), excluding the FAQ page since some evaluation questions will come from there.
-
-We have two options to obtain the text data: using a local folder or web crawling. For the local folder option, we can download the desired documents in PDF, Text, or Markdown format to the "data" folder specified in the [raft.yaml](./raft.yaml) file. Langchain DirectoryLoader will load files in that folder, but it may also ask us to install more package dependency if the files formats are not supported natively.
-
-Alternatively, we can create a sitemap XML file, similar to the example below, and put the file path in the [raft.yaml](./raft.yaml) file, so eventually a Langchain SitemapLoader can retrieve all the text from the web pages.
-
-```xml
-<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
-  <url>
-    <loc>http://llama.meta.com/responsible-use-guide/</loc>
-  </url>
-  <!-- more URLs -->
-</urlset>
-```
+Another solution is to use **RAG (Retrieval-Augmented Generation)**, which combines the strengths of traditional information retrieval systems (such as databases) with the capabilities of generative large language models (LLMs). RAG operates by first retrieving relevant information from a database using a query generated by the LLM. This retrieved information is then integrated into the LLM's query input, enabling it to generate more accurate and contextually relevant text. This helps to reduce LLM hallucination as the related documents are provided to LLM and has a lower cost to update the domain knowledge.
 
-## Retrieval Augmented Fine Tuning (RAFT) Concepts
+In this tutorial, we'll use **Retrieval Augmented Fine Tuning (RAFT)**, a technique that combines fine-tuning with RAG to better utilize custom domain text data. RAFT is a general recipe for fine-tuning a pre-trained Large Language Model (LLM) to a domain-specific RAG setting. It helps LLM to better utilize custom domain text data, by ignoring those documents that don’t help in answering the question. This approach can create a more factual model and reduce LLM hallucinations during inference.
 
-In this tutorial, we'll introduce Retrieval Augmented Fine Tuning (RAFT), a technique that combines fine-tuning with RAG to better utilize custom domain text data.
-
-RAFT is a general recipe for fine-tuning a pre-trained Large Language Model (LLM) to a domain-specific RAG setting. The process involves preparing training data with each data point containing:
+The process involves preparing training data with each data point containing:
 
 * A question (Q)
 * A set of documents (D)
@@ -41,6 +26,23 @@ The following graph illustrates the RAFT main concepts:
 
 For more information on RAFT, please refer to their [blog post](https://gorilla.cs.berkeley.edu/blogs/9_raft.html).
 
+## Fine-tuning Llama
+
+To build a Llama bot, we need to collect relevant text data. Ideally, we would include a vast range of Llama-related web documents, but for demo purposes, we'll focus on official documents. For example, we can use the raw text from official web pages listed in [Getting started with Meta Llama](https://llama.meta.com/get-started/), excluding the FAQ page since some evaluation questions will come from there.
+
+We have two options to obtain the text data: using a local folder or web crawling. For the local folder option, we can download the desired documents in PDF, Text, or Markdown format to the "data" folder specified in the [raft.yaml](./raft.yaml) file. Langchain DirectoryLoader will load files in that folder, but it may also ask us to install more package dependency if the files formats are not supported natively.
+
+Alternatively, we can create a sitemap XML file, similar to the example below, and put the file path in the [raft.yaml](./raft.yaml) file, so eventually a Langchain SitemapLoader can retrieve all the text from the web pages.
+
+```xml
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+  <url>
+    <loc>http://llama.meta.com/responsible-use-guide/</loc>
+  </url>
+  <!-- more URLs -->
+</urlset>
+```
+
 ## Create RAFT Dataset
 
 To create a RAFT dataset from the prepared documents, we can use the Meta Llama 3 70B Instruct model either through APIs from LLM cloud providers or by hosting a local VLLM server.
@@ -119,18 +121,18 @@ Once the RAFT dataset is ready in JSON format, we can start fine-tuning. Unfortu
 ```bash
 export PATH_TO_ROOT_FOLDER=./raft-8b
 export PATH_TO_RAFT_JSON=recipes/use_cases/end2end-recipes/raft/output/raft.jsonl
-torchrun --nnodes 1 --nproc_per_node 4  recipes/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --context_length 8192 --num_epochs 1 --batch_size_training 1 --model_name meta-Llama/Meta-Llama-3-8B-Instruct --dist_checkpoint_root_folder $PATH_TO_ROOT_FOLDER --dist_checkpoint_folder fine-tuned  --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/finetuning/datasets/raft_dataset.py" --use-wandb  --run_validation True  --custom_dataset.data_path $PATH_TO_RAFT_JSON
+torchrun --nnodes 1 --nproc_per_node 4  recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --context_length 8192 --num_epochs 1 --batch_size_training 1 --model_name meta-Llama/Meta-Llama-3-8B-Instruct --dist_checkpoint_root_folder $PATH_TO_ROOT_FOLDER --dist_checkpoint_folder fine-tuned  --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/finetuning/datasets/raft_dataset.py" --use-wandb  --run_validation True  --custom_dataset.data_path $PATH_TO_RAFT_JSON
 ```
 
-For more details on multi-GPU fine-tuning, please refer to the [multigpu_finetuning.md](../../../finetuning/multigpu_finetuning.md) in the finetuning recipe.
+For more details on multi-GPU fine-tuning, please refer to the [multigpu_finetuning.md](../../../quickstart/finetuning/multigpu_finetuning.md) in the finetuning recipe.
 
 Next, we need to convert the FSDP checkpoint to a HuggingFace checkpoint using the following command:
 
 ```bash
-python src/Llama_recipes/inference/checkpoint_converter_fsdp_hf.py --fsdp_checkpoint_path  "$PATH_TO_ROOT_FOLDER/fine-tuned-meta-Llama/Meta-Llama-3-8B-Instruct" --consolidated_model_path "$PATH_TO_ROOT_FOLDER"
+python src/llama_recipes/inference/checkpoint_converter_fsdp_hf.py --fsdp_checkpoint_path  "$PATH_TO_ROOT_FOLDER/fine-tuned-meta-Llama/Meta-Llama-3-8B-Instruct" --consolidated_model_path "$PATH_TO_ROOT_FOLDER"
 ```
 
-For more details on FSDP to HuggingFace checkpoint conversion, please refer to the [readme](../../../inference/local_inference/README.md) in the inference/local_inference recipe.
+For more details on FSDP to HuggingFace checkpoint conversion, please refer to the [readme](../../../quickstart/inference/local_inference/README.md) in the inference/local_inference recipe.
 
 ## Evaluation Steps
 Once we have the RAFT model, we need to evaluate its performance. In this tutorial, we'll not only use traditional evaluation methods (e.g., calculating exact match rate or ROUGE score) but also use LLM as a judge to score model-generated answers.
@@ -234,7 +236,7 @@ Once we evaluated and refined our RAFT model, we can deploy it locally to intera
 python recipes/inference/local_inference/inference.py --model_name raft-8b
 ```
 
-For more details,please check [local_inference recipe](../../../inference/local_inference/README.md)
+For more details,please check [local_inference recipe](../../../quickstart/inference/local_inference/README.md)
 
 ## Acknowledgements
 
diff --git a/recipes/use_cases/end2end-recipes/raft/format.py b/recipes/use_cases/end2end-recipes/raft/format.py
@@ -1,3 +1,4 @@
+# file copied from https://github.com/ShishirPatil/gorilla/blob/main/raft/format.py
 from abc import ABC, abstractmethod
 import argparse
 from datasets import Dataset, load_dataset

Original file line number	Diff line number	Diff line change
`@@ -1,3 +1,4 @@`
	`1`	`+# file copied from https://github.com/ShishirPatil/gorilla/blob/main/raft/format.py`
`1`	`2`	`from abc import ABC, abstractmethod`
`2`	`3`	`import argparse`
`3`	`4`	`from datasets import Dataset, load_dataset`