colossalai/inference/config.py (7 additions, 3 deletions)

@@ -99,7 +99,9 @@ class InferenceConfig:
         early_stopping (Optional[bool]): Whether to stop the generation when all beam hypotheses have finished, defaults to False.
         top_k (Optional[int]): The number of highest-probability vocabulary tokens to keep for top-k filtering, defaults to None.
         top_p (Optional[float]): The cumulative probability threshold for retaining tokens with a total probability above it, defaults to None.
-        min_p (Optional[float]): The minimum probability to keep for top-p filtering, defaults to None.
+        temperature (Optional[float]): Sampling temperature used to control the randomness of generation, defaults to 1.0.
+        repetition_penalty (Optional[float]): Controls how the model treats tokens that already appear in the prompt or the generated text. Values greater than 1 discourage repetition, whereas values less than 1 encourage it, defaults to 1.0.
+        no_repeat_ngram_size (Optional[int]): If no_repeat_ngram_size > 0, an n-gram of that size can only appear once in the generated text.
         n_spec_tokens (int): The maximum number of speculating tokens, defaults to None.
         glimpse_large_kv (bool): Whether to use large KV in drafter model, defaults to False.
         block_size (int): The number of blocks in a logical block, defaults to 16.
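The two new sampling knobs added to the docstring can be sketched in plain Python. The helper below is purely illustrative (its name and signature are assumptions, not ColossalAI's API); it only shows the conventional effect of temperature and repetition_penalty on raw logits before sampling.

```python
def adjust_logits(logits, generated_ids, temperature=1.0, repetition_penalty=1.0):
    """Illustrative sketch (not the ColossalAI API) of how temperature and
    repetition_penalty act on a single sequence's raw logits."""
    out = list(logits)
    # Repetition penalty: tokens already generated are pushed down when
    # penalty > 1 (positive logits are divided, negative ones multiplied).
    for tok in set(generated_ids):
        out[tok] = out[tok] / repetition_penalty if out[tok] > 0 else out[tok] * repetition_penalty
    # Temperature: scale logits before softmax; values > 1 flatten the
    # distribution (more random), values < 1 sharpen it.
    return [x / temperature for x in out]
```

With `temperature=2.0` and `repetition_penalty=2.0`, a previously generated token's positive logit is quartered while unseen tokens are only halved, which is exactly the "discourage repetition" behavior the docstring describes.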
colossalai/inference/logit_processors.py (66 additions, 6 deletions)

@@ -1,6 +1,10 @@
+# This code is adapted from huggingface transformers: https://github.com/huggingface/transformers/blob/v4.36.2/src/transformers/generation/logits_process.py

Further down (around line 65 of the new file):

         raise ValueError(f"'penalty={penalty}' has to be a strictly positive float and greater than 0.")
+
+    logit_list = []
+
+    # TODO(yuehuayingxueluo) This is only a temporary implementation. Later, we will implement presence_penalties, frequency_penalties, and repetition_penalties using CUDA kernels.
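As a Python-level sketch of the kind of temporary implementation the TODO describes (before the planned CUDA kernels), a batched repetition-penalty processor might look like the following. Everything here besides the validation message and the `logit_list` name from the diff is an assumption for illustration.

```python
def apply_repetition_penalty(batch_logits, batch_token_ids, penalty):
    """Hypothetical batched repetition-penalty processor; not the actual
    ColossalAI implementation."""
    # Validation mirrored from the diff: penalty must be strictly positive.
    if not penalty > 0:
        raise ValueError(f"'penalty={penalty}' has to be a strictly positive float and greater than 0.")

    logit_list = []
    for logits, token_ids in zip(batch_logits, batch_token_ids):
        row = list(logits)
        # Penalize every token this sequence has already produced.
        for tok in set(token_ids):
            row[tok] = row[tok] / penalty if row[tok] > 0 else row[tok] * penalty
        logit_list.append(row)
    return logit_list
```

A pure-Python loop like this is slow for large vocabularies, which is presumably why the TODO plans to move presence, frequency, and repetition penalties into CUDA kernels.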