
Conversation

aldehir
Owner

@aldehir aldehir commented Aug 10, 2025

gpt-oss requires reasoning content from previous interactions to perform well.

This implements a reasoning cache, controlled by --reasoning-cache, that stores the reasoning content from tool call messages. On subsequent requests, the cached reasoning content is injected back into the corresponding message. Since regular messages don't have an associated id, no reasoning content is added to them. This aligns with the recommendations from OpenAI.

To use, pass in --reasoning-cache 128.
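
A rough sketch of the idea (not the actual code in this PR): an LRU map from tool call id to reasoning content, with a capacity corresponding to the --reasoning-cache value. The reasoning_cache name and the put/get interface are assumptions made here purely for illustration.

```cpp
// Sketch only: LRU cache mapping tool-call id -> reasoning content.
// Names and interface are hypothetical, not taken from this PR.
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

struct reasoning_cache {
    explicit reasoning_cache(size_t capacity) : capacity_(capacity) {}

    // Store the reasoning produced alongside a tool call, keyed by its call id.
    void put(const std::string & tool_call_id, std::string reasoning) {
        auto it = index_.find(tool_call_id);
        if (it != index_.end()) {
            it->second->second = std::move(reasoning);
            lru_.splice(lru_.begin(), lru_, it->second); // move to front
            return;
        }
        lru_.emplace_front(tool_call_id, std::move(reasoning));
        index_[tool_call_id] = lru_.begin();
        if (index_.size() > capacity_) {                 // evict least recently used
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
    }

    // On a follow-up request, look up reasoning for an assistant message that
    // carries this tool-call id. Regular messages have no id, so they never match.
    std::optional<std::string> get(const std::string & tool_call_id) {
        auto it = index_.find(tool_call_id);
        if (it == index_.end()) {
            return std::nullopt;
        }
        lru_.splice(lru_.begin(), lru_, it->second);
        return it->second->second;
    }

private:
    size_t capacity_;
    std::list<std::pair<std::string, std::string>> lru_;           // front = most recent
    std::unordered_map<std::string, decltype(lru_.begin())> index_;
};
```

On a new request, any assistant tool call message whose id is found in the cache gets its reasoning injected before the prompt is rebuilt; messages without an id are left untouched, as described above.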

Unfortunately, it won't work well when placed behind llama-swap with a TTL, since llama-swap will kill the process and take the cache with it.

@aldehir aldehir force-pushed the gpt-oss-inject-reasoning branch 3 times, most recently from d95c39f to b1c1dcc on August 11, 2025 04:42
@aldehir aldehir force-pushed the feature/harmony-parser branch from 6343a7f to 04e1626 on August 12, 2025 02:02
@aldehir aldehir force-pushed the gpt-oss-inject-reasoning branch from b1c1dcc to d46c87e on August 12, 2025 02:05
@aldehir aldehir force-pushed the feature/harmony-parser branch from b0b16e2 to 1e595d2 on August 14, 2025 08:51
@victorb

victorb commented Aug 16, 2025

When I tried this last week, I think it helped a lot with preventing tool hallucinations. After all the work that eventually went into ggml-org#15181, might something like this reasoning cache still be needed for better tool usage by GPT-OSS? There are currently a bunch of conflicts I don't know how to resolve, so unfortunately I can't check myself whether it improves things or not.

@aldehir aldehir force-pushed the gpt-oss-inject-reasoning branch from d46c87e to 9d1d245 on August 16, 2025 19:33
@aldehir aldehir changed the base branch from feature/harmony-parser to master August 16, 2025 19:40
@aldehir
Owner Author

aldehir commented Aug 16, 2025

@victorb I have rebased the changes here onto the latest upstream master branch. Feel free to use this in the meantime!

@aldehir
Owner Author

aldehir commented Aug 17, 2025

@victorb https://github.com/aldehir/gpt-oss-adapter might be of interest to you. I think it'll take a while before clients start responding with reasoning content. I don't like the idea of running yet another proxy, but it might be easier than continuously rebasing onto llama.cpp.
