Commit 913c920

[Colossal-LLaMA] Fix sft issue for llama2 (#5719)
* fix minor issue
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 43995ee commit 913c920

1 file changed: +2 -1 lines changed

applications/Colossal-LLaMA/prepare_sft_dataset.py

Lines changed: 2 additions & 1 deletion
@@ -10,7 +10,7 @@
 import os
 from multiprocessing import cpu_count
 
-from colossal_llama.dataset.conversation import default_conversation
+from colossal_llama.dataset.conversation import LLaMA2_Conv
 from colossal_llama.dataset.spliced_and_tokenized_dataset import supervised_tokenize_sft
 from datasets import dataset_dict, load_dataset
 from transformers import AddedToken, AutoTokenizer
@@ -78,6 +78,7 @@ def main():
     # Fix </s> split issue: https://github.com/huggingface/transformers/issues/23833
     if args.llama_version == 2:
         tokenizer.add_tokens(AddedToken("</s>", normalized=False, special=True), special_tokens=True)
+        default_conversation = LLaMA2_Conv
 
     tokenizer.add_bos_token = False
     tokenizer.add_eos_token = False
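
Review note: in this hunk, `default_conversation` is bound only inside the `args.llama_version == 2` branch, and the module-level import of that name is removed. Whether another part of `main()` assigns it for other versions is not visible here. If nothing does, a later read of the name would fail for non-2 versions. The sketch below illustrates one common way to guard against that: bind a default before the branch. It is a minimal, self-contained illustration, not the repository's code; `ConversationTemplate`, `LLAMA2_TEMPLATE`, `FALLBACK_TEMPLATE`, and `select_template` are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationTemplate:
    """Hypothetical stand-in for a conversation template object."""

    name: str


# Hypothetical templates for illustration only; the real LLaMA2_Conv
# is defined in colossal_llama.dataset.conversation.
LLAMA2_TEMPLATE = ConversationTemplate(name="llama2")
FALLBACK_TEMPLATE = ConversationTemplate(name="fallback")


def select_template(llama_version: int) -> ConversationTemplate:
    """Pick a template by version, binding a default before branching."""
    template = FALLBACK_TEMPLATE  # always bound, whatever the version
    if llama_version == 2:
        template = LLAMA2_TEMPLATE  # version-specific override
    return template


if __name__ == "__main__":
    for version in (2, 3):
        print(version, select_template(version).name)
```

With this shape, every code path leaves the template name bound, so a later use does not depend on which branch ran.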
