Commit 7554cbc

Fix dataset example
1 parent 5495e4c commit 7554cbc

1 file changed: +1 -1 lines changed

example_dataset.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
 tokenizer.pad_token = tokenizer.eos_token
 
-ds = load_dataset("mgoin/ultrachat_2k", split="train_sft").select(512)
+ds = load_dataset("mgoin/ultrachat_2k", split="train_sft").select(range(512))
 examples = [tokenizer.apply_chat_template(batch["messages"], tokenize=False) for batch in ds]
 examples = tokenizer(examples, padding=True, truncation=True, return_tensors="pt").to("cuda")
 
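
The change above passes an iterable of row indices to select instead of a bare integer. As a rough illustration of why that matters, the sketch below uses a hypothetical pure-Python helper, select_rows, as a stand-in for index-based row selection. It is not the datasets library's implementation, and the row values are made up; it only assumes that selection iterates over the indices it is given.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def select_rows(rows: Sequence[T], indices: Iterable[int]) -> List[T]:
    """Hypothetical stand-in for index-based row selection (not library code)."""
    # Iterating over `indices` is the step that fails when a bare int is passed.
    return [rows[i] for i in indices]


# Made-up rows standing in for dataset records.
rows = [f"conversation-{i}" for i in range(2000)]

# Correct: range(512) yields the indices 0..511, so the first 512 rows are kept.
subset = select_rows(rows, range(512))
print(len(subset))

# Incorrect: 512 is a single int, not an iterable of indices.
try:
    select_rows(rows, 512)
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")
```

The same reasoning applies to the fixed line: range(512) names rows 0 through 511 explicitly, which is what an index-based selection expects.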
