v0.0.55

Latest

Evrard-Nil released this 10 Mar 13:52

· 28 commits to main since this release

dfcbb32

fix: disable EAGLE3 speculative decoding for gpt-oss-120b

Streaming responses consistently dropped the last 1-2 tokens because of an EAGLE3 bug in vLLM v0.12.0. Non-streaming responses were unaffected.
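One way to check for this kind of truncation is to compare the concatenated streamed chunks against the non-streaming output for the same prompt. The sketch below is illustrative only and is not part of this release: `missing_tail` is a hypothetical helper, and the comparison assumes both requests use deterministic decoding (for example, temperature 0) so the two outputs should otherwise match.

```python
from typing import Optional, Sequence


def missing_tail(full_text: str, streamed_chunks: Sequence[str]) -> Optional[str]:
    """Compare a non-streaming completion with the chunks of a streamed one.

    Hypothetical helper. Returns the suffix of full_text that the stream
    never delivered, "" if both outputs are identical, or None if the
    streamed text is not a prefix of full_text (the outputs diverged, so
    truncation cannot be judged).
    """
    streamed = "".join(streamed_chunks)
    if full_text.startswith(streamed):
        return full_text[len(streamed):]
    return None


if __name__ == "__main__":
    # Made-up strings standing in for real model output.
    print(repr(missing_tail("The answer is 42.", ["The answer", " is"])))
```

A non-empty result on repeated runs would point at dropped trailing tokens in the streaming path; a `None` result means the two outputs differ for some other reason.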
