@@ -64,24 +64,24 @@ Here is an example format of the concatenated string:
"""
<s>system
System instruction
-<s>user
+<s>human
User 1st round input
-<s>assistant
+<s>bot
Assistant 1st round output{EOS_TOKEN}
-<s>user
+<s>human
User 2nd round input
-<s>assistant
+<s>bot
Assistant 2nd round output{EOS_TOKEN}
...
...
...
-<s>user
+<s>human
User nth round input
-<s>assistant
+<s>bot
{Assistant output to be generated}{EOS_TOKEN}
"""
```
-When applying inference, always make your input string end with ```<s>assistant\n``` to prompt the model to generate answers.
+When applying inference, always make your input string end with ```<s>bot\n``` to prompt the model to generate answers.

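For concreteness, here is an illustrative Python sketch (not code from this repository; the helper name, the EOS choice, and the exact newline placement are assumptions based on the format above) showing how such an inference prompt can be assembled so that it ends with ```<s>bot\n```:

```python
# Hypothetical sketch: assemble an inference prompt in the concatenated chat
# format shown above, ending with "<s>bot\n" so generation starts at the answer.
EOS_TOKEN = "</s>"  # assumption: the real eos_token depends on the base model

def build_prompt(system, turns):
    """turns: list of (human_msg, bot_msg) pairs; use bot_msg=None for the turn to be generated."""
    parts = [f"<s>system\n{system}\n"]
    for human_msg, bot_msg in turns:
        parts.append(f"<s>human\n{human_msg}\n")
        if bot_msg is None:
            parts.append("<s>bot\n")  # leave the bot slot open so the model completes it
        else:
            parts.append(f"<s>bot\n{bot_msg}{EOS_TOKEN}\n")
    return "".join(parts)

prompt = build_prompt(
    "You are a helpful coding assistant.",
    [
        ("Write hello world in Python.", "print('hello world')"),
        ("Now write it in C.", None),
    ],
)
print(prompt)
```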
@@ -107,14 +107,14 @@ cd mftcoder_accelerate/src
```

### 3.1 Tokenization
-During training, we concatenate multi-turn dialogues into the following format (also known as the inference data format mentioned earlier) and then tokenize it. In this format, ```<s>user\n``` starts the user's input (i.e., the prompt) and ```<s>assistant\n``` starts the assistant's output (i.e., the response).
+During training, we concatenate multi-turn dialogues into the following format (also known as the inference data format mentioned earlier) and then tokenize it. In this format, ```<s>human\n``` starts the user's input (i.e., the prompt) and ```<s>bot\n``` starts the assistant's output (i.e., the response).

```{EOS_TOKEN}``` represents the proper eos_token.
We define different eos_tokens in ```src/pefts/model_mapping.py```, which fit different base models.

Here is a viewable example of the training data after formatting:

During the calculation of the loss, we use a ```loss mask``` to ensure that the loss from the input part does not contribute to parameter updates. Only the loss from the ```target{EOS_TOKEN}``` part is used to update parameters.
This approach takes full advantage of the benefits of model parallelism, making training more efficient. It also leverages the characteristic of decoder-only models with left-to-right attention.
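To make the loss-mask idea concrete, here is a minimal sketch (not MFTCoder's actual implementation; the function names and data layout are assumptions): prompt tokens get weight 0 in the token-level loss, while ```target{EOS_TOKEN}``` tokens keep weight 1.

```python
# Illustrative sketch only: build a loss mask over one concatenated multi-turn
# sample so that prompt tokens are ignored and only target (+EOS) tokens are trained on.
import torch

def build_loss_mask(segments):
    """segments: list of (token_ids, is_target) pairs for one concatenated sample."""
    input_ids, loss_mask = [], []
    for token_ids, is_target in segments:
        input_ids.extend(token_ids)
        loss_mask.extend([1 if is_target else 0] * len(token_ids))
    return torch.tensor(input_ids), torch.tensor(loss_mask)

def masked_lm_loss(logits, labels, loss_mask):
    # Per-token cross entropy, then zero out the prompt positions via the mask,
    # so only the target{EOS_TOKEN} part contributes to parameter updates.
    losses = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), reduction="none"
    )
    mask = loss_mask.view(-1).float()
    return (losses * mask).sum() / mask.sum().clamp(min=1.0)
```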
@@ -149,6 +149,8 @@ Frequently used arguments are provided in ```configs/***_train_config``` and exp
-**model_type**: Type of the model to train, e.g., "mixtral | llama | starcoder | chatglm2 | qwen | gpt_neox".

+
+-**attn_implementation**: "flash_attention_2", "eager", or "sdpa"; takes effect when the model is officially supported by transformers.
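For reference, this is a minimal sketch of how such a flag is typically passed to the Hugging Face transformers API; MFTCoder's own config plumbing may differ, and the model path below is only a placeholder.

```python
# Minimal sketch: selecting the attention implementation when loading a model
# with transformers (>= 4.36). "flash_attention_2" also requires the flash-attn
# package to be installed; otherwise fall back to "sdpa" or "eager".
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/base-model",                     # placeholder path
    attn_implementation="flash_attention_2",  # or "eager" / "sdpa"
    torch_dtype=torch.bfloat16,
)
```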