Releases: xorbitsai/inference
v2.1.0
What's new in 2.1.0 (2026-02-14)
These are the changes in inference v2.1.0.
New features
- FEAT: [model] GLM-4.7 support by @Jun-Howie in #4565
- FEAT: [model] MinerU2.5-2509-1.2B removed by @OliverBryant in #4568
- FEAT: [model] GLM-4.7-Flash support by @OliverBryant in #4578
- FEAT: [model] Qwen3-ASR-0.6B support by @leslie2046 in #4579
- FEAT: [model] Qwen3-ASR-1.7B support by @leslie2046 in #4580
- FEAT: add support for qwen3-asr models by @leslie2046 in #4581
- FEAT: [model] MinerU2.5-2509-1.2B support by @GaoLeiA in #4569
- FEAT: [model] FLUX.2-klein-4B support by @lazariv in #4602
- FEAT: [model] FLUX.2-klein-9B support by @lazariv in #4603
- FEAT: Add support for FLUX.2-Klein-9B and -4B models by @lazariv in #4596
Enhancements
- ENH: update model "DeepSeek-V3.2" JSON by @OliverBryant in #4563
- ENH: update model "DeepSeek-V3.2-Exp" JSON by @OliverBryant in #4567
- ENH: update models JSON [image] by @XprobeBot in #4606
- BLD: constrain setuptools<82 in Docker images by @qinxuye in #4607
- REF: extract Pydantic request schemas from restful_api.py into xinference/api/schemas/ by @amumu96 in #4598
- REF: extract route registration into domain-specific routers/ by @amumu96 in #4600
Bug fixes
- BUG: vllm embedding model error by @OliverBryant in #4562
- BUG: vllm reranker score error by @OliverBryant in #4573
- BUG: handle async tokenizer in vllm core by @ace-xc in #4577
- BUG: vllm reranker model gpu release error by @OliverBryant in #4575
Documentation
Others
- BUG: setuptools CI error by @OliverBryant in #4595
New Contributors
- @ace-xc made their first contribution in #4577
- @GaoLeiA made their first contribution in #4569
- @lazariv made their first contribution in #4602
Full Changelog: v2.0.0...v2.1.0
v2.0.0
What's new in 2.0.0 (2026-01-31)
These are the changes in inference v2.0.0.
New features
- FEAT: add video gguf cache_manager.py by @OliverBryant in #4462
- FEAT: [model] Qwen3-VL-Embedding-2B support by @OliverBryant in #4469
- FEAT: [UI] move featured to backend API data-driven; remove frontend hardcoding. by @yiboyasss in #4466
- FEAT: [model] Qwen3-VL-Reranker-8B support by @OliverBryant in #4472
- FEAT: llm cache config in model json to skip unnecessary downloads by @OliverBryant in #4480
- FEAT: [UI] add official website and model hub links. by @yiboyasss in #4493
- FEAT: add custom llm models config json analysis by @OliverBryant in #4478
- FEAT: [model] MinerU2.5-2509-1.2B support by @leslie2046 in #4510
- FEAT: Introduce MinerU 2.5 OCR model. by @leslie2046 in #4511
- FEAT: add chat_template.jinja support by @OliverBryant in #4526
- FEAT: support engines for virtualenv by @OliverBryant in #4497
- FEAT: [model] Z-Image support by @OliverBryant in #4546
- FEAT: [model] GLM-4.6 support by @Jun-Howie in #4525
- FEAT: [model] Qwen3-VL-Embedding-8B support by @OliverBryant in #4470
- FEAT: [UI] use browser locale as default language. by @yiboyasss in #4539
- FEAT: [model] Qwen3-VL-Reranker-2B support by @OliverBryant in #4471
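Among the items above, #4526 adds chat_template.jinja support. A minimal sketch of how such a template is typically rendered with Jinja2; the template text and helper below are illustrative examples, not the template shipped with any particular model:

```python
# Render a chat-template (Jinja) over a list of messages.
# CHAT_TEMPLATE is a made-up example for illustration only.
from jinja2 import Template

CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>{{ m['content'] }}<|end|>\n"
    "{% endfor %}"
    "<|assistant|>"
)

def render_prompt(messages):
    """Render chat messages into a single prompt string."""
    return Template(CHAT_TEMPLATE).render(messages=messages)

print(render_prompt([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
]))
```

Real model repositories ship their own chat_template.jinja; the point of the feature is that the serving layer can pick that file up instead of a hardcoded prompt format.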
Enhancements
- ENH: update 3 models JSON ("HunyuanVideo", "gme-Qwen2-VL-7B-Instruct", "gme-Qwen2-VL-2B-Instruct") by @OliverBryant in #4464
- ENH: update models JSON [embedding, image, llm, video] by @XprobeBot in #4463
- ENH: update models JSON [llm] by @XprobeBot in #4490
- ENH: update model "Fun-ASR-Nano-2512" JSON by @leslie2046 in #4496
- ENH: update model "Fun-ASR-MLT-Nano-2512" JSON by @leslie2046 in #4498
- ENH: update model "Qwen3-VL-Embedding-2B" JSON by @OliverBryant in #4503
- ENH: update models JSON [embedding, image, llm, rerank] by @XprobeBot in #4524
- ENH: update models JSON [embedding, image, llm, rerank] by @XprobeBot in #4534
- ENH: update model "Qwen3-VL-Embedding-2B" JSON by @OliverBryant in #4552
- BLD: remove Dockerfile for version CU12.4 by @zwt-1234 in #4487
- REF: [UI] remove featureModels array. by @yiboyasss in #4488
Bug fixes
- BUG: fix has_musa_device error by @OliverBryant in #4477
- BUG: [xavier] fix xavier hash function to ensure prefix cache hit by @llyycchhee in #4482
- BUG: image/audio/video download hub exclude modelscope by @OliverBryant in #4483
- BUG: [UI] historical parameter backfill bug. by @yiboyasss in #4479
- BUG: deepseek ocr markdown bug by @OliverBryant in #4491
- BUG: new vllm version cannot launch embedding models by @OliverBryant in #4489
- BUG: Failed to download model 'Fun-ASR-MLT-Nano-2512' after multiple retries by @leslie2046 in #4537
- BUG: transformers version < 5.0.0 by @OliverBryant in #4553
- BUG: cache manager makedirs initialized only once to prevent downloads from getting stuck by @llyycchhee in #4551
Documentation
- DOC: add v1.17.0 release note by @qinxuye in #4467
- DOC: add limitations for Xavier by @ZhikaiGuo960110 in #4486
- DOC: add v2.0 doc by @OliverBryant in #4545
- DOC: add cudnn/nccl/cusparselt error solution in virtualenv's doc by @OliverBryant in #4556
Others
- feat: Upgrade the vLLM base image to version 0.13.0 by @zwt-1234 in #4522
- CHORE: modify copyright by @OliverBryant in #4494
Full Changelog: v1.17.0...v2.0.0
v1.17.1
v1.17.1 is a hotfix version of v1.17.0
Full Changelog: v1.17.0...v1.17.1
v1.17.0
What's new in 1.17.0 (2026-01-10)
These are the changes in inference v1.17.0.
New features
- FEAT: add enable_thinking kwarg support by @OliverBryant in #4423
- FEAT: Support MThreads (MUSA) GPU by @yeahdongcn in #4425
- FEAT: support distributed model launch for vllm version >= v0.11.0 by @OliverBryant in #4428
- FEAT: [model] Qwen-Image-Edit-2511 support by @OliverBryant in #4427
- FEAT: add minimax tool call support by @OliverBryant in #4434
- FEAT: [model] Qwen-Image-2512 support by @OliverBryant in #4435
- FEAT: support auto batch for sentence_transformers rerank by @llyycchhee in #4429
- FEAT: add multi engines for ocr && deepseek ocr mlx support by @OliverBryant in #4437
- FEAT: add fp4 support by @OliverBryant in #4450
- FEAT: add video gguf support by @OliverBryant in #4458
- FEAT: add multi engines for image model by @OliverBryant in #4446
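Several entries above ("add multi engines for ocr", "add multi engines for image model") follow a common registry/dispatch pattern: probe which backends are usable, then fall back. A minimal sketch under assumed names; none of these functions are Xinference's actual internals:

```python
# Hypothetical engine registry with availability probes and fallback.
_ENGINES = {}

def register_engine(name, is_available, create):
    """Register an engine with an availability probe and a factory."""
    _ENGINES[name] = (is_available, create)

def pick_engine(preferred=None):
    """Return the preferred engine if usable, else the first available one."""
    candidates = [preferred] if preferred else []
    candidates += [n for n in _ENGINES if n != preferred]
    for name in candidates:
        if name in _ENGINES:
            is_available, create = _ENGINES[name]
            if is_available():
                return name, create()
    raise RuntimeError("no available engine")

# Toy registrations: pretend vllm is not installed on this machine.
register_engine("vllm", is_available=lambda: False, create=lambda: "vllm-engine")
register_engine("transformers", is_available=lambda: True, create=lambda: "hf-engine")

print(pick_engine(preferred="vllm"))  # falls back to ('transformers', 'hf-engine')
```

The related "show reason why engines not available" work in later releases suggests the probe step also records *why* a candidate was rejected, which a real implementation would surface to the user.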
Enhancements
- ENH: update 4 models JSON ("Deepseek-V3.1", "deepseek-r1-0528", "deepseek-r1-0528-qwen3", ... +1 more) by @OliverBryant in #4445
- ENH: update model "DeepSeek-OCR" JSON by @OliverBryant in #4444
- ENH: support vllm mtp & rope scaling by @ZhikaiGuo960110 in #4454
Bug fixes
- BUG: fix empty cache for vllm embedding & rerank by @ZhikaiGuo960110 in #4422
- BUG: Selecting the same worker repeatedly by @OliverBryant in #4447
- BUG: fix vllm ocr model cannot stop by @OliverBryant in #4460
- BUG: Models being downloaded cannot be canceled. by @OliverBryant in #4461
Documentation
- DOC: update new models and release notes for v1.16.0 by @qinxuye in #4416
- DOC: update docker docs by @qinxuye in #4419
- DOC: vLLM + Torch + Xinference Compatibility Issue by @qiulang in #4442
New Contributors
- @yeahdongcn made their first contribution in #4425
Full Changelog: v1.16.0...v1.17.0
v1.16.0
What's new in 1.16.0 (2025-12-27)
These are the changes in inference v1.16.0.
New features
- FEAT: [model] DeepSeek-V3.2-Exp support by @Jun-Howie in #4374
- FEAT: Add vLLM backend support for DeepSeek-V3.2 by @Jun-Howie in #4377
- FEAT: Add vLLM backend support for DeepSeek-V3.2-Exp by @Jun-Howie in #4375
- FEAT: vacc support by @ZhikaiGuo960110 in #4382
- FEAT: support vlm for vacc by @ZhikaiGuo960110 in #4385
- FEAT: [model] Fun-ASR-Nano-2512 support by @leslie2046 in #4397
- FEAT: [model] Qwen-Image-Layered support by @OliverBryant in #4395
- FEAT: [model] Fun-ASR-MLT-Nano-2512 support by @leslie2046 in #4398
- FEAT: continuous batching support for MLX chat models by @qinxuye in #4403
- FEAT: Add the architectures field for llm model launch by @OliverBryant in #4405
- FEAT: [UI] image models support configuration via environment variables and custom parameters. by @yiboyasss in #4413
- FEAT: support rerank async batch by @llyycchhee in #4414
- FEAT: Support vLLM backend for MiniMaxM2ForCausalLM by @Jun-Howie in #4412
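The "support rerank async batch" item (#4414) is an instance of async micro-batching: concurrent requests are queued and scored together up to a batch size or a short timeout. A minimal sketch of the general technique; the class, parameters, and scoring callback are illustrative, not Xinference's actual internals:

```python
# Hypothetical async micro-batcher: collect concurrent score requests
# into one batch (up to max_batch items or max_wait seconds), score
# them together, then resolve each caller's future individually.
import asyncio

class AsyncBatcher:
    def __init__(self, score_batch, max_batch=8, max_wait=0.01):
        self._score_batch = score_batch  # scores a list of (query, doc) pairs
        self._max_batch = max_batch
        self._max_wait = max_wait
        self._queue = asyncio.Queue()
        self._worker = None

    async def score(self, pair):
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((pair, fut))
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        return await fut

    async def _run(self):
        while True:
            batch = [await self._queue.get()]
            deadline = asyncio.get_running_loop().time() + self._max_wait
            while len(batch) < self._max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            scores = self._score_batch([p for p, _ in batch])
            for (_, fut), s in zip(batch, scores):
                fut.set_result(s)

async def main():
    # Toy scorer: score = total length of query + document.
    batcher = AsyncBatcher(lambda pairs: [len(q) + len(d) for q, d in pairs])
    return await asyncio.gather(*(batcher.score(("q", d)) for d in ["a", "bb"]))

print(asyncio.run(main()))  # [2, 3]
```

Batching amortizes per-call model overhead across concurrent requests while keeping the per-request API unchanged, which is the usual motivation for this pattern in embedding and rerank serving.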
Enhancements
- ENH: fix replica assignment so that GPU indices are assigned contiguously by @ZhikaiGuo960110 in #4370
- ENH: update model "DeepSeek-V3.2" JSON by @Jun-Howie in #4381
- ENH: update model "glm-4.5" JSON by @OliverBryant in #4383
- ENH: update 2 models JSON ("glm-4.1v-thinking", "glm-4.5v") by @OliverBryant in #4384
- ENH: support torchaudio 2.9.0 by @llyycchhee in #4390
- ENH: update 3 models JSON ("llama-2-chat", "llama-3", "llama-3-instruct") by @OliverBryant in #4400
- ENH: update 4 models JSON ("llama-3.1", "llama-3.1-instruct", "llama-3.2-vision-instruct", ... +1 more) by @OliverBryant in #4401
- ENH: update model "jina-embeddings-v3" JSON by @XprobeBot in #4404
- ENH: update models JSON [audio, embedding, image, llm, video] by @XprobeBot in #4407
- ENH: update models JSON [audio, image] by @XprobeBot in #4408
- ENH: update model "Z-Image-Turbo" JSON by @OliverBryant in #4409
- ENH: update 2 models JSON ("DeepSeek-V3.2", "DeepSeek-V3.2-Exp") by @Jun-Howie in #4392
- ENH: update models JSON [llm] by @XprobeBot in #4415
- BLD: remove python 3.9 support by @OliverBryant in #4387
- BLD: Update Dockerfile to CUDA 12.9 to use vLLM v0.11.2 by @zwt-1234 in #4393
Bug fixes
- BUG: fix PaddleOCR-VL output by @leslie2046 in #4368
- BUG: custom embedding and rerank model analysis error by @OliverBryant in #4367
- BUG: cannot launch model on cpu && multi workers launch error by @OliverBryant in #4361
- BUG: OCR API returns null && add doc on how to modify model_size by @OliverBryant in #4331
- BUG: fix n_gpu parameter by @OliverBryant in #4411
Documentation
Full Changelog: v1.15.0...v1.16.0
v1.15.0
What's new in 1.15.0 (2025-12-13)
These are the changes in inference v1.15.0.
New features
- FEAT: added more detailed instructions for engine unavailability. by @OliverBryant in #4308
- FEAT: [model] Z-Image-Turbo support by @OliverBryant in #4333
- FEAT: [model] DeepSeek-V3.2 support by @Jun-Howie in #4344
- FEAT: [model] PaddleOCR-VL support by @leslie2046 in #4354
- FEAT: add llama_cpp json schema output by @OliverBryant in #4282
- FEAT: PaddleOCR-VL implementation by @leslie2046 in #4304
- FEAT: multi replicas on a single GPU && add launch strategy by @OliverBryant in #4358
Enhancements
- ENH: update models JSON [llm] by @XprobeBot in #4343
- ENH: update model "MiniMax-M2" JSON by @XprobeBot in #4342
- ENH: update models JSON [llm] by @XprobeBot in #4349
- ENH: support launching with --device cpu by @hubutui in #4352
- ENH: add glm-4.5 tool calls support && vllm StructuredOutputsParams support by @OliverBryant in #4357
Bug fixes
- BUG: fix manage cache models missing by @OliverBryant in #4329
- BUG: [llm, vllm]: support ignore eos by @ZhikaiGuo960110 in #4332
- BUG: Multimodal settings for video parameters are not taking effect. by @OliverBryant in #4338
- BUG: Soft links cannot be completely deleted by @OliverBryant in #4337
- BUG: Packages with identical names in virtual environments error by @OliverBryant in #4348
- BUG: Fix typo in xinference/deploy/docker/Dockerfile.cu128 by @hubutui in #4350
- BUG: custom embedding model register fail by @OliverBryant in #4335
- BUG: [UI] fix the bug in the copy function. by @yiboyasss in #4355
- BUG: [UI] control Select dropdown width to prevent it from becoming too wide. by @yiboyasss in #4356
Documentation
Others
- Fix workflow vulnerability by @barakharyati in #4328
- CHORE: add i18n for replica details by @leslie2046 in #4306
New Contributors
- @barakharyati made their first contribution in #4328
- @ZhikaiGuo960110 made their first contribution in #4332
- @hubutui made their first contribution in #4350
Full Changelog: v1.14.0...v1.15.0
v1.14.0
What's new in 1.14.0 (2025-11-30)
These are the changes in inference v1.14.0.
New features
- FEAT: add vLLM 0.11.1+ compatibility with v1 executor support by @amumu96 in #4252
- FEAT: [virtualenv] New v3 spec and list/delete virtual env APIs by @OliverBryant in #4254
- FEAT: [model] HunyuanOCR support by @OliverBryant in #4290
- FEAT: Add support of rerank model for llamacpp by @harryzwh in #4227
- FEAT: show reason why engines not available by @OliverBryant in #4261
- FEAT: Parallel startup model, add tooltips for startup progress, and p… by @leslie2046 in #4268
Enhancements
- BLD: fix model ui launch error with gradio 6.x by @OliverBryant in #4289
- BLD: add pr auto run gen_docs workflow. by @yiboyasss in #4260
- BLD: gen docs pr modify by @OliverBryant in #4294
- BLD: gen doc modify v2 by @OliverBryant in #4296
- BLD: gen docs pr modify v3 by @OliverBryant in #4297
- BLD: auto-run gen_docs.py from doc/source by @yiboyasss in #4300
- BLD: remove [skip ci] from auto docs commit by @yiboyasss in #4301
Bug fixes
- BUG: Compat with xllamacpp 0.2.5+ by @codingl2k1 in #4270
- BUG: add download_hubs for cluster by @OliverBryant in #4273
- BUG: sometimes cannot select gpu in CPU and GPU hybrid cluster by @leslie2046 in #4280
Documentation
Others
- CHORE: expand stale and close time by @qinxuye in #4253
- chore: sync models JSON [audio, embedding, image, llm, rerank, video] by @XprobeBot in #4258
- chore: sync models JSON [llm] by @XprobeBot in #4272
- chore: sync model "Qwen3-Reranker-0.6B" JSON by @OliverBryant in #4277
- chore: sync model "bge-reranker-v2-m3" JSON by @OliverBryant in #4276
- chore: sync model "Qwen3-Reranker-4B" JSON by @OliverBryant in #4278
- chore: sync model "Qwen3-Reranker-8B" JSON by @OliverBryant in #4279
- chore: sync model "qwen3" JSON by @XprobeBot in #4287
- chore: sync models JSON [rerank] by @XprobeBot in #4284
- chore: sync model "FLUX.1-dev" JSON by @OliverBryant in #4293
- chore: sync model "FLUX.2-dev" JSON by @OliverBryant in #4292
- chore: sync models JSON [image] by @XprobeBot in #4303
Full Changelog: v1.13.0...v1.14.0
v1.13.0
What's new in 1.13.0 (2025-11-15)
These are the changes in inference v1.13.0.
New features
- FEAT: [model] Qwen3-VL-MLX support by @OliverBryant in #4203
- FEAT: auto batch embedding by @qinxuye in #4197
- FEAT: update models via Xinference model hub by @OliverBryant in #4241
Enhancements
- ENH: IndexTTS2 stream output by @OliverBryant in #4213
- ENH: IndexTTS2 offline deploy by @OliverBryant in #4202
- ENH: add embedding benchmark by @llyycchhee in #4244
- BLD: Fix CI error caused by peft version by @OliverBryant in #4249
Bug fixes
- BUG: Deepseek-OCR error in docker by @OliverBryant in #4208
- BUG: ensure unique tool call IDs using UUID by @amumu96 in #4242
- BUG: Fix cache model not shown on audio, video and image by @OliverBryant in #4247
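The "ensure unique tool call IDs using UUID" fix (#4242) names its technique directly: derive each tool-call ID from a fresh UUID so concurrent or repeated calls never collide. A minimal sketch; the `call_` prefix and helper name are illustrative, not necessarily the exact format Xinference emits:

```python
# Generate collision-free tool-call IDs from uuid4.
# The "call_" prefix here is a common OpenAI-style convention,
# used for illustration only.
import uuid

def new_tool_call_id():
    return f"call_{uuid.uuid4().hex}"

ids = {new_tool_call_id() for _ in range(10_000)}
print(len(ids))  # 10000 distinct ids
```

Counter- or index-based IDs can repeat across retries or parallel requests; a random 128-bit UUID makes a collision practically impossible without any shared state.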
Documentation
- DOC: added new models by @qinxuye in #4206
- DOC: Xinference 1.12.0 installation issues with uv by @qiulang in #4228
- DOC: add model update documentation. by @yiboyasss in #4246
Others
- chore: sync models JSON [audio, embedding, image, llm, rerank, video] by @XprobeBot in #4214
- chore: sync models JSON [audio, embedding, image, llm, rerank, video] by @XprobeBot in #4226
- chore: sync models JSON [audio] by @XprobeBot in #4243
Full Changelog: v1.12.0...v1.13.0
v1.12.0
What's new in 1.12.0 (2025-11-02)
These are the changes in inference v1.12.0.
New features
- FEAT: [model] support jina-reranker-v3 by @llyycchhee in #4156
- FEAT: [model] qwen3-omni by @qinxuye in #4137
- FEAT: xinference python 3.13 support by @OliverBryant in #4164
- FEAT: add OCR gradio UI by @OliverBryant in #4185
- FEAT: [model] DeepSeek-OCR by @OliverBryant in #4187
Enhancements
- ENH: adding lightning support for qwen-image-edit-2509 by @qinxuye in #4151
- BLD: torchaudio 2.9 introduces a breaking change in torchaudio.save by @qiulang in #4178
- BLD: fix setup.cfg for python 3.12 and fix dockerfile by @zwt-1234 in #4192
- BLD: fix Dockerfile.cpu by @zwt-1234 in #4195
- REF: Modified the batch lock logic by @OliverBryant in #4162
- BLD: fix transformers version in cu128 dockerfile by @zwt-1234 in #4152
Bug fixes
- BUG: repair qwen3 model transformers random characters by @OliverBryant in #4148
- BUG: [UI] resolve progress bar display issue. by @yiboyasss in #4150
- BUG: fix IndexTTS2 on transformers 4.57.1 by @OliverBryant in #4158
- BUG: fix error when xinference runs in Docker with OAuth2 by @OliverBryant in #4161
- BUG: fix qwen3-vl launch error by @amumu96 in #4190
Documentation
- DOC: add release notes doc by @qinxuye in #4157
- DOC: Add PyPI mirror configuration guide for audio package installation by @qiulang in #4177
Others
- chore: sync models JSON [image, llm] by @XprobeBot in #4149
- chore: sync models JSON [rerank] by @XprobeBot in #4159
- chore: sync models JSON [llm] by @XprobeBot in #4160
- chore: sync models JSON [llm] by @XprobeBot in #4171
- chore: sync models JSON [image] by @XprobeBot in #4186
- chore: sync models JSON [embedding, image] by @XprobeBot in #4188
- chore: sync models JSON [llm] by @XprobeBot in #4191
Full Changelog: v1.11.0...v1.12.0
v1.11.0.post1
What's new in 1.11.0.post1 (2025-10-20)
These are the changes in inference v1.11.0.post1.
Bug fixes
- BUG: repair qwen3 model transformers random characters by @OliverBryant in #4148
- BUG: [UI] resolve progress bar display issue. by @yiboyasss in #4150
Others
Full Changelog: v1.11.0...v1.11.0.post1