Hey, I don't want to clutter the Issues page with suggestions like this, but I would very much appreciate some feedback on whether this is doable/within scope @LostRuins 🙇🏻‍♂️
The thinking process in DeepSeek-derived models can often be longer than the final answer, but the user has no way to anticipate this, which often results in an incomplete thinking process with no final answer at all.

It would be very nice to have an option where the model can "think" as much as needed (or within a separately defined limit) without those tokens counting toward the final `Max Output`. This would let the reasoning phase operate more independently, since it isn't necessarily intended for the user to see or use directly.

From my attempts to work around this, I would also like to know: is there currently a way to define the initial `Max Output` size via arguments? Or the effective range of the UI slider? I know you can manually type values above 512, but a way to set it the way you can with `--contextsize` would be welcome as well.
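
For reference, this is roughly the kind of client-side workaround I was attempting (a minimal, untested sketch, not a real solution: it assumes the stock KoboldAI-compatible `/api/v1/generate` endpoint on the default port, a model that wraps its reasoning in `<think>...</think>` tags, and its token accounting is only approximate since each round trip may return fewer tokens than requested):

```python
import requests

API_URL = "http://localhost:5001/api/v1/generate"  # KoboldCpp's default port

THINK_BUDGET = 4096   # tokens allowed for the reasoning phase (made-up number, tune freely)
ANSWER_BUDGET = 512   # tokens reserved for the visible answer
STEP = 512            # tokens requested per round trip


def generate(prompt: str, max_length: int) -> str:
    """One call to the KoboldAI-compatible completion endpoint."""
    payload = {"prompt": prompt, "max_length": max_length}
    r = requests.post(API_URL, json=payload, timeout=600)
    r.raise_for_status()
    return r.json()["results"][0]["text"]


def generate_with_think_budget(prompt: str) -> str:
    """Let the model think for up to THINK_BUDGET tokens, then grant the
    final answer its own ANSWER_BUDGET, by chaining completion calls."""
    text = ""
    spent = 0
    # Phase 1: keep extending while the model is still inside its <think>
    # block. STEP is only an upper bound on tokens actually produced per
    # call, so "spent" overestimates the real usage.
    while spent < THINK_BUDGET and "</think>" not in text:
        chunk = generate(prompt + text, STEP)
        if not chunk:  # model stopped on its own (EOS)
            return text
        text += chunk
        spent += STEP
    # Phase 2: the answer gets its full, undiminished budget.
    return text + generate(prompt + text, ANSWER_BUDGET)


print(generate_with_think_budget("A prompt formatted with your model's chat template"))
```

Chaining requests like this loses the single-stream generation experience and re-sends the growing prompt on every call, which is exactly why a native option that separates the thinking budget from `Max Output` would be so much nicer.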