v0.7.2

Latest

RobbieLeung released this 18 Dec 06:17

2d82650

Major Features and Improvements

Feature

Extend Qwen3-MoE to support tensor parallelism (TP) settings greater than 4.
Implement chunked prefill and prefix cache for Qwen3-MoE.
Support prefix cache for DeepSeek-V3/R1 models.

Bugfix

Fix core dump issue triggered by client disconnection.
Fix the incorrect reading of model args from Qwen3-VL's config.json.
Set up the bos and eos tokenizer config function for the fast tokenizer.
Fix the memory leak issue in the completions interface.
