- `--k-shift N`: Shift the first token selection by cutting out the top N tokens once (default: 0).
K-Shift is a sampling method that steers models away from the most obvious output, eliciting reasoning and analysis. It cuts out the k top tokens once at the beginning of inference, ensuring that the dialog starts from a less obvious path without otherwise guiding the model. The method was mentioned in the paper [Chain-of-Thought Reasoning without Prompting](https://arxiv.org/pdf/2402.10200) as a simple trick for guiding a model towards reasoning. In practice, K-Shift can improve the quality of reasoning, help bypass bias or censorship in certain cases, and may also be used as a diagnostic tool. K-Shift is intended to be used with greedy sampling (`--k-shift 10 --top-k 1`), but can help with creative writing too, albeit not as much as XTC. The default value is 0.
Example usage: `--k-shift 10`
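
To make the mechanism concrete, here is a minimal sketch of the idea in C++. This is not the implementation from this PR; the `candidate` struct and function name are illustrative stand-ins for llama.cpp's internal types:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative candidate: llama.cpp keeps token id, logit and probability
// together in a similar structure.
struct candidate {
    int   id;
    float logit;
};

// Apply K-Shift: on the very first sampling step only, cut out the k most
// probable tokens so generation starts from a less obvious path. Every later
// step leaves the candidates untouched.
void k_shift_apply(std::vector<candidate> & cands, size_t k, bool is_first_step) {
    if (!is_first_step || k == 0 || k >= cands.size()) {
        return; // nothing to cut, or cutting would empty the pool
    }
    // sort descending by logit so the most probable tokens come first
    std::sort(cands.begin(), cands.end(),
              [](const candidate & a, const candidate & b) { return a.logit > b.logit; });
    // drop the top k once; greedy sampling then picks the (k+1)-th token
    cands.erase(cands.begin(), cands.begin() + k);
}
```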
### Top-K Sampling
- `--top-k N`: Limit the next token selection to the K most probable tokens (default: 40).
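
Where K-Shift removes the head of the distribution once, Top-K keeps only the head at every step. The same sketch style applies (again illustrative, not the library's internal code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct candidate { int id; float logit; }; // same illustrative struct as above

// Keep only the K most probable tokens before sampling. k = 1 degenerates to
// greedy sampling; k = 0 is treated as "consider the whole vocabulary".
void top_k_apply(std::vector<candidate> & cands, size_t k) {
    if (k == 0 || k >= cands.size()) {
        return;
    }
    std::partial_sort(cands.begin(), cands.begin() + k, cands.end(),
                      [](const candidate & a, const candidate & b) { return a.logit > b.logit; });
    cands.resize(k); // discard everything outside the top k
}
```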
${IntField({label: "K-Shift",title: "Cuts out first k tokens once at the start of sampling. Intended to use with greedy sampling.",max: 100,min: 0,step: 1,name: "k_shift",value: params.value.k_shift})}
${IntField({label: "Top-K",title: "Limits the selection of the next token to the K most probable tokens. 1 means no randomness = greedy sampling. If set to 0, it means the entire vocabulary size is considered.",max: 100,min: 0,step: 1,name: "top_k",value: params.value.top_k})}
${IntField({label: "Penalize Last N",title: "The last n tokens that are taken into account to penalise repetitions. A value of 0 means that this function is deactivated and -1 means that the entire size of the context is taken into account.",max: 2048,min: 0,step: 16,name: "repeat_last_n",value: params.value.repeat_last_n})}
${FloatField({label: "Presence Penalty",title: "A penalty that is applied if certain tokens appear repeatedly in the generated text. A higher value leads to fewer repetitions.",max: 1.0,min: 0.0,name: "presence_penalty",step: 0.01,value: params.value.presence_penalty})}
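
The two penalty fields above work together: `repeat_last_n` selects the window of recent tokens, and the presence penalty is subtracted from every candidate already present in that window. A rough sketch of that interaction, assuming the flat once-per-token penalty that llama.cpp applies (names illustrative):

```cpp
#include <unordered_set>
#include <vector>

struct candidate { int id; float logit; }; // same illustrative struct as above

// Subtract `presence_penalty` from the logit of every candidate that occurs
// anywhere in the last `repeat_last_n` tokens. The penalty is flat: appearing
// once in the window is enough, and repeated appearances add nothing more.
void presence_penalty_apply(std::vector<candidate> & cands,
                            const std::vector<int> & last_tokens, // last repeat_last_n ids
                            float presence_penalty) {
    const std::unordered_set<int> seen(last_tokens.begin(), last_tokens.end());
    for (auto & c : cands) {
        if (seen.count(c.id)) {
            c.logit -= presence_penalty;
        }
    }
}
```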
/// @details Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
/// @param candidates A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
/// @param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
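
The `tau` parameter is the setpoint of a feedback loop: after each token, Mirostat measures the surprise that was actually produced and nudges its dynamic threshold `mu` toward the target. A minimal sketch of that update, following the paper rather than llama.cpp's exact code:

```cpp
#include <cmath>

// Mirostat 1.0 feedback step (https://arxiv.org/abs/2007.14966). `mu` is the
// dynamic surprise threshold, typically initialised to 2 * tau; `eta` is the
// learning rate of the correction.
void mirostat_update_mu(float & mu, float sampled_prob, float tau, float eta) {
    const float observed_surprise = -std::log2(sampled_prob); // surprise in bits
    const float error = observed_surprise - tau;              // deviation from target
    mu -= eta * error;                                        // feedback correction
}
```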