Commit 148c274

Add special tokens for models
We should download the relevant files from the Hugging Face Hub (HF). I don't think we can avoid implementing the Jinja2 templates for each model family, though. Keying the table by model family would mean matching model names with regular expressions instead of full names, which might be slow.
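The regular-expression idea from the message above could look roughly like the sketch below. The names `FAMILY_PATTERNS` and `sequence_tokens` and the patterns themselves are hypothetical and not part of this commit; only the token strings come from the diff. Caching the lookup is one possible way to address the speed concern, since each model name is then matched at most once.

```python
import re
from functools import lru_cache
from typing import List, Optional, Pattern, Tuple

# Hypothetical table: (compiled pattern, (begin, end) sequence tokens).
# The patterns are illustrative assumptions; the tokens are from the diff.
FAMILY_PATTERNS: List[Tuple[Pattern[str], Tuple[str, str]]] = [
    (re.compile(r"google/gemma-"), ("<bos>", "<eos>")),
    (re.compile(r"openai-community/gpt2"), ("", "<|endoftext|>")),
    (re.compile(r"mistralai/Mistral-"), ("<s>", "</s>")),
]


@lru_cache(maxsize=None)
def sequence_tokens(model_name: Optional[str]) -> Tuple[str, str]:
    """Return the (begin, end) tokens of the first family matching model_name."""
    if model_name is None:
        return ("", "")
    for pattern, tokens in FAMILY_PATTERNS:
        # Pattern.match anchors at the start of the string.
        if pattern.match(model_name):
            return tokens
    # Unknown models fall back to empty tokens, like the None entry in the diff.
    return ("", "")
```

With a handful of precompiled patterns and a cache, the per-call cost after the first lookup is a dictionary hit, so the linear scan only matters if the pattern list grows large.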
1 parent 45b186d commit 148c274

1 file changed, 29 additions and 0 deletions

prompts/tokens.py

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+from dataclasses import dataclass, field
+from typing import Dict, Optional
+
+
+@dataclass
+class Limits:
+    begin: str = ""  # token that opens the span
+    end: str = ""  # token that closes the span
+
+
+@dataclass
+class Special:
+    sequence: Limits = field(default_factory=Limits)
+    user: Limits = field(default_factory=Limits)
+    assistant: Limits = field(default_factory=Limits)
+    system: Limits = field(default_factory=Limits)
+
+
+SPECIAL_TOKENS: Dict[Optional[str], Special] = {
+    None: Special(),
+    "google/gemma-2-9b": Special(Limits("<bos>", "<eos>")),
+    "openai-community/gpt2": Special(Limits("", "<|endoftext|>")),
+    "mistralai/Mistral-7B-v0.1": Special(Limits("<s>", "</s>")),
+    "mistralai/Mistral-7B-Instruct-v0.1": Special(
+        Limits("<s>", "</s>"),  # sequence
+        Limits("[INST]", "[/INST]"),  # user
+        Limits("", "</s>"),  # assistant
+    ),
+}
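A minimal usage sketch for the table added in this diff, assuming a caller wants to wrap a single user message. The helpers `wrap` and `render_user_turn` are hypothetical and do not appear in the commit, and the table is trimmed to two entries. The dataclasses use `field(default_factory=Limits)` so that the definitions also load on Python 3.11+, where dataclasses reject unhashable default values. The sketch does not try to reproduce any model's real chat template, including its whitespace.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Limits:
    begin: str = ""
    end: str = ""


@dataclass
class Special:
    sequence: Limits = field(default_factory=Limits)
    user: Limits = field(default_factory=Limits)
    assistant: Limits = field(default_factory=Limits)
    system: Limits = field(default_factory=Limits)


# Trimmed copy of the table from the diff.
SPECIAL_TOKENS: Dict[Optional[str], Special] = {
    None: Special(),
    "mistralai/Mistral-7B-Instruct-v0.1": Special(
        Limits("<s>", "</s>"),
        Limits("[INST]", "[/INST]"),
        Limits("", "</s>"),
    ),
}


def wrap(text: str, limits: Limits) -> str:
    """Surround text with the begin and end tokens of limits (hypothetical helper)."""
    return f"{limits.begin}{text}{limits.end}"


def render_user_turn(model_name: Optional[str], message: str) -> str:
    """Open the sequence and wrap one user message (hypothetical helper)."""
    # Unknown model names fall back to the empty tokens stored under None.
    special = SPECIAL_TOKENS.get(model_name, SPECIAL_TOKENS[None])
    return special.sequence.begin + wrap(message, special.user)


print(render_user_turn("mistralai/Mistral-7B-Instruct-v0.1", "Hi"))
```

Falling back to the `None` entry keeps the lookup total, so callers never have to handle a missing key.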

0 commit comments