v0.0.55

@Evrard-Nil Evrard-Nil released this 10 Mar 13:52

fix: disable EAGLE3 speculative decoding for gpt-oss-120b

Streaming responses consistently dropped the last 1-2 tokens because of an EAGLE3 bug in vLLM v0.12.0. Non-streaming responses were unaffected.
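One way to check for this kind of truncation is to compare the concatenated streamed chunks against the non-streaming output for the same prompt. The sketch below is illustrative only and is not taken from this release: `missing_suffix` and the sample strings are hypothetical, and it assumes both outputs come from deterministic decoding (for example temperature 0), so that they should otherwise match exactly.

```python
def missing_suffix(streamed_chunks, full_text):
    """Return the tail of full_text that the stream did not deliver ('' if none).

    Hypothetical helper: assumes both outputs come from the same prompt with
    deterministic decoding, so the streamed text should be a prefix of the
    non-streaming text.
    """
    streamed = "".join(streamed_chunks)
    if not full_text.startswith(streamed):
        raise ValueError("streamed output is not a prefix of the non-streaming output")
    return full_text[len(streamed):]


# Made-up sample data, not real model output.
chunks = ["The answer", " is", " forty"]
full = "The answer is forty-two."
print(repr(missing_suffix(chunks, full)))
```

An empty return value means the stream delivered everything; a non-empty one is the text lost at the end of the stream.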