"""Generate with budget forcing using the completions APIs. This relies on raw autocompletion and assumes the model's output is structured as follows: '<think> ... </think> summary answer'
The budget forcing method is proposed in the paper: https://arxiv.org/abs/2501.19393
This implementation follows the key ideas of the paper while aiming for stable, fail-safe operation.
Generation is performed in multiple steps: the model is called repeatedly until the requirements are met, so the response is assembled conditionally.

Args:
    think_max_tokens: Token budget allocated to the think block.
    answer_max_tokens: Token budget allocated to the summary and answer block. None means generate until EoS.
    start_think_token: String marking the start of the think block. Defaults to <think>.
    end_think_token: String marking the end of the think block. Defaults to </think>.
    begin_response_token: String marking the start of the response block, used by certain models, e.g. "<response>". Defaults to None.
    end_response_token: String marking the end of the response block, used by certain models, e.g. "</response>". Defaults to None.
    think_wait_suffix: String appended to force continued thinking, e.g. "\nWait". If None, no additional thinking is forced. Use None for the upper-bound budget case.
    answer_suffix: String appended to force a final answer.
    answer_token: Token indicating that an answer has been generated.

Assumptions:
    - The chat template has already been applied to the prompt, with think mode enabled.
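
Example (an illustrative sketch of the multi-step loop described above, not the actual code behind this docstring). It assumes a hypothetical `complete(prompt, max_tokens, stop)` callable that returns the generated text without the stop string, and it counts tokens with a crude whitespace split instead of the model tokenizer. The names `complete`, `count_tokens` and `max_rounds`, and the default suffix values, are placeholders chosen for illustration. The response-block tokens and `answer_token` are left out to keep the sketch short.

```python
from typing import Callable, Optional

# Hypothetical completion backend: (prompt, max_tokens, stop) -> generated text.
# Assumption: the stop string itself is NOT included in the returned text.
CompleteFn = Callable[[str, Optional[int], Optional[str]], str]


def count_tokens(text: str) -> int:
    # Crude whitespace proxy; a real implementation would use the model tokenizer.
    return len(text.split())


def budget_forced_generate(
    complete: CompleteFn,
    prompt: str,
    think_max_tokens: int,
    answer_max_tokens: Optional[int] = None,
    start_think_token: str = "<think>",
    end_think_token: str = "</think>",
    think_wait_suffix: Optional[str] = "Wait",
    answer_suffix: str = "Final Answer:",  # placeholder default
    max_rounds: int = 8,  # fail-safe cap on the number of think calls
) -> str:
    text = prompt + start_think_token
    used = 0
    for _ in range(max_rounds):
        remaining = think_max_tokens - used
        if remaining <= 0:
            break
        # Generate until the model closes the think block or the budget runs out.
        chunk = complete(text, remaining, end_think_token)
        text += chunk
        used += count_tokens(chunk)
        if think_wait_suffix is None:
            break  # upper-bound mode: never force extra thinking
        if used + count_tokens(think_wait_suffix) >= think_max_tokens:
            break  # no room left for the suffix plus more thinking
        # The model stopped early: suppress the end token and nudge it onward.
        text += " " + think_wait_suffix
        used += count_tokens(think_wait_suffix)
    # Close the think block ourselves and force the final answer.
    text += end_think_token + " " + answer_suffix
    text += complete(text, answer_max_tokens, None)
    return text[len(prompt):]
```

With `think_wait_suffix=None` the loop makes a single think call, so `think_max_tokens` acts purely as an upper bound. With a suffix set, the model is pushed to keep thinking until the budget is nearly spent.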
- python=3.12 # note: at the time of writing, xformers (< vllm) has a broken wheel for 3.13. https://github.com/facebookresearch/xformers/issues/740#issuecomment-2753869337