I've been using GPT-OSS 20B and 120B a lot recently, and I'm wondering whether Optillm helps with models that are already trained for reasoning (R1, Qwen3-Thinking, GPT-OSS, o3, etc.). These models are already trained with their own internal chain of thought and verification, so I suspect inference-time search strategies won't help much with filtering and pruning search paths. But maybe they would help a little, since they let the request think for longer and spend more tokens? What are your thoughts? And is there any research or are there benchmarks to validate this?
Yes, inference-time techniques are still useful for reasoning/thinking models. We have a number of them in the repo that demonstrate this; for instance, see autothink and deepthink. Even with reasoning models, both sequential inference-time scaling (generating more tokens) and parallel inference-time scaling (combining multiple parallel responses) improve accuracy. This is similar to how Grok-Heavy and Gemini-DeepThink work. In addition, there is work on making the reasoning more efficient, as we did in AutoThink: https://huggingface.co/blog/codelion/autothink
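To make the parallel-scaling idea concrete, here is a minimal sketch of self-consistency-style majority voting: sample several independent responses and keep the most common final answer. The `sample_answer` stub is hypothetical; a real setup would replace it with calls to the model (e.g. an OpenAI-compatible endpoint with temperature > 0), and the 70% accuracy it simulates is just an illustrative assumption.

```python
import random
from collections import Counter

def sample_answer(prompt: str, rng: random.Random) -> str:
    # Hypothetical stand-in for one model call; a real implementation
    # would query the model with temperature > 0 to get diverse samples.
    # Here we simulate a model that answers correctly ~70% of the time.
    return "42" if rng.random() < 0.7 else "41"

def majority_vote(prompt: str, n: int = 16, seed: int = 0) -> str:
    """Parallel inference-time scaling: draw n independent samples
    and return the most common final answer (majority voting)."""
    rng = random.Random(seed)
    answers = [sample_answer(prompt, rng) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))  # the majority answer wins
```

Even though any single sample is wrong ~30% of the time here, the aggregated answer is wrong far less often, which is the core reason parallel scaling still pays off on top of a model's own internal reasoning.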