Releases: xorbitsai/inference
v1.7.1.post1
What's new in 1.7.1.post1 (2025-06-30)
These are the changes in inference v1.7.1.post1.
Enhancements
- BLD: pin transformers version at 4.52.4 to fix "Failed to import module 'SentenceTransformer'" error by @amumu96 in #3743
Full Changelog: v1.7.1...v1.7.1.post1
v1.7.1
What's new in 1.7.1 (2025-06-27)
These are the changes in inference v1.7.1.
New features
- FEAT: [UI] enhance audio & rerank model registration params. by @yiboyasss in #3656
- FEAT: support async client by @zhcn000000 in #3645
- FEAT: [UI] add max_tokens display in rerank model. by @yiboyasss in #3671
- FEAT: [UI] add model_ability options for LLM registration. by @yiboyasss in #3663
- FEAT: support QwenLong-L1 by @Jun-Howie in #3691
- FEAT: [UI] model registration supports packages. by @yiboyasss in #3702
- FEAT: support MLU device by @nan9126 in #3693
- FEAT: vllm v1 auto enabling by @qinxuye in #3637
- FEAT: distributed inference for MLX by @qinxuye in #3700
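The async client added in #3645 enables firing several requests concurrently instead of serially. A minimal sketch of that concurrency pattern, using a placeholder coroutine in place of a real client call (this is illustrative only, not the project's actual client interface):

```python
import asyncio

async def fake_chat(prompt: str) -> str:
    # Placeholder for a real async chat-completion call.
    await asyncio.sleep(0.01)  # simulates network latency
    return f"reply to: {prompt}"

async def main() -> list:
    prompts = ["hello", "how are you", "bye"]
    # gather() awaits all requests concurrently on one event loop
    # and returns results in the same order as the inputs.
    return await asyncio.gather(*(fake_chat(p) for p in prompts))

replies = asyncio.run(main())
```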
Enhancements
- ENH: add enable_flash_attn param for loading qwen3 embedding & rerank by @qinxuye in #3640
- ENH: add more abilities for builtin model families API by @qinxuye in #3658
- ENH: improve local cluster startup reliability via child-process readiness signaling by @Checkmate544 in #3642
- ENH: FishSpeech support pcm by @codingl2k1 in #3680
- ENH: Add 4-sample micro-batching to Qwen-3 reranker to reduce GPU memory by @yasu-oh in #3666
- ENH: Limit default n_parallel for llama.cpp backend by @codingl2k1 in #3712
- BLD: pin flash-attn & flashinfer-python version and limit sgl-kernel version by @amumu96 in #3669
- BLD: Update Dockerfile by @XiaoXiaoJiangYun in #3695
- REF: remove unused code by @qinxuye in #3664
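The micro-batching change in #3666 scores reranker inputs in fixed-size chunks of 4 so that only a few query-document pairs occupy GPU memory at a time. A minimal, library-free sketch of the chunking idea (names are illustrative, not the actual implementation):

```python
def microbatches(items, size=4):
    """Yield successive fixed-size chunks; the last chunk may be smaller."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Scoring 10 pairs in micro-batches of 4 caps peak memory at
# 4 pairs at once instead of all 10.
batches = list(microbatches(list(range(10)), size=4))
```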
Bug fixes
- BUG: fix TTS error "No such file or directory" by @robin12jbj in #3625
- BUG: Fix max_tokens value in Qwen3 Reranker by @yasu-oh in #3665
- BUG: fix custom embedding by @qinxuye in #3677
- BUG: [UI] rename the command-line argument from download-hub to download_hub. by @yiboyasss in #3685
- BUG: fix jina-clip-v2 for text only or image only by @qinxuye in #3690
- BUG: internvl chat error using vllm engine by @amumu96 in #3722
- BUG: fix the parsing logic of streaming tool calls by @amumu96 in #3721
- BUG: fix <think> wrongly added when setting chat_template_kwargs {"enable_thinking": False} by @qinxuye in #3718
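For context on the <think> fix above: when thinking is disabled, reasoning tags should not leak into output. A hedged sketch of stripping such a block from a response string (illustrative only, not the project's actual parsing code):

```python
import re

def strip_think(text: str) -> str:
    # Remove <think>...</think> blocks plus any trailing whitespace.
    # DOTALL lets "." span newlines inside the reasoning block.
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

cleaned = strip_think("<think>reasoning...</think>The answer is 42.")
```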
Documentation
- DOC: add doc for paraformer by @leslie2046 in #3631
- DOC: Flexible model (traditional ML models) by @qinxuye in #3714
New Contributors
- @robin12jbj made their first contribution in #3625
- @zhcn000000 made their first contribution in #3645
- @yasu-oh made their first contribution in #3665
- @Checkmate544 made their first contribution in #3642
- @nan9126 made their first contribution in #3693
- @XiaoXiaoJiangYun made their first contribution in #3695
Full Changelog: v1.7.0...v1.7.1
v1.7.0.post1
What's new in 1.7.0.post1 (2025-06-13)
These are the changes in inference v1.7.0.post1.
Bug fixes
- BUG: fix qwen3-rerank to create model on GPU by @qinxuye in #3630
- BUG: fix minicpm4 modeling by @Jun-Howie in #3632
Full Changelog: v1.7.0...v1.7.0.post1
v1.7.0
What's new in 1.7.0 (2025-06-13)
These are the changes in inference v1.7.0.
New features
- FEAT: support CogView4 image model by @qinxuye in #3557
- FEAT: [UI] support model_ability filter for image and video models. by @yiboyasss in #3563
- FEAT: [UI] auto-switch to active tab when Running Models page loads. by @yiboyasss in #3568
- FEAT: support first-last-frame to video by @qinxuye in #3555
- FEAT: [UI] add Japanese and Korean language support. by @yiboyasss in #3574
- FEAT: SeACoParaformer model by @leslie2046 in #3587
- FEAT: support verbose_json for funasr family audio2text models by @leslie2046 in #3591
- FEAT: support deepseek-r1-0528 Mixed quantization by @Jun-Howie in #3601
- FEAT: support engines for embedding models by @pengjunfeng11 in #2791
- FEAT: support MiniCPM4 series by @Jun-Howie in #3609
- FEAT: [UI] add model_engine parameter to embedding model. by @yiboyasss in #3617
- FEAT: add kwargs for transcripts client API by @leslie2046 in #3622
- FEAT: support qwen3 embedding by @qinxuye in #3615
- FEAT: support qwen3-reranker by @qinxuye in #3627
Enhancements
- ENH: Support pcm response_format by @codingl2k1 in #3606
Bug fixes
- BUG: Fix dependency by @codingl2k1 in #3566
- BUG: Fix cmdline by @codingl2k1 in #3589
- BUG: fix potential hang for sglang by @qinxuye in #3597
- BUG: [UI] fixed the mobile language switching bug. by @yiboyasss in #3608
- BUG: Fix the error when using Qwen function call with Spring AI. by @aniya105 in #3614
Documentation
- DOC: update links by @qinxuye in #3565
- DOC: Update CosyVoice doc by @codingl2k1 in #3605
- DOC: update models by @qinxuye in #3628
Others
- FIX: [UI] fix model_engine parameter bug. by @yiboyasss in #3620
Full Changelog: v1.6.1...v1.7.0
v1.6.1
What's new in 1.6.1 (2025-05-30)
These are the changes in inference v1.6.1.
New features
- FEAT: llama.cpp backend support multimodal by @codingl2k1 in #3442
- FEAT: Auto ngl for llama.cpp backend by @codingl2k1 in #3518
- FEAT: [UI] add hint for common parameters with support for custom input. by @yiboyasss in #3521
- FEAT: add some other paraformer series models by @leslie2046 in #3536
- FEAT: support Deepseek-R1-0528 by @Jun-Howie in #3539
- FEAT: support deepseek-r1-0528-qwen3 by @Jun-Howie in #3552
Enhancements
- ENH: [rerank] add instruction for minicpm-reranker by @llyycchhee in #3453
- ENH: pass extra arguments for speech2text API. by @leslie2046 in #3516
- ENH: add modelscope support for kolors by @qinxuye in #3534
- ENH: remove check when specified GPU index for vllm by @kota-iizuka in #3527
- ENH: Support HybridCache in transformers lib, mainly for gemma3 chat model by @ChengjieLi28 in #3538
- ENH: support virtualenv for chattts by @qinxuye in #3541
- BLD: fix setup.cfg by @qinxuye in #3467
- BLD: update flashinfer version by @amumu96 in #3549
- REF: Refactor for multimodal llm models by @ChengjieLi28 in #3462
Bug fixes
- BUG: fix input for jina clip by @llyycchhee in #3440
- BUG: [ui] delete cache file white screen bug. by @yiboyasss in #3482
- BUG: fix import_submodules, ignore test files by @Gmgge in #3545
Documentation
- DOC: remove llama-cpp-python related doc & refine model_ability parts by @qinxuye in #3519
- DOC: Update doc about cosyvoice-2.0 stream and auto NGL by @codingl2k1 in #3547
New Contributors
- @kota-iizuka made their first contribution in #3527
Full Changelog: v1.6.0...v1.6.1
v1.6.0.post1
What's new in 1.6.0.post1 (2025-05-17)
These are the changes in inference v1.6.0.post1.
Full Changelog: v1.6.0...v1.6.0.post1
v1.6.0
What's new in 1.6.0 (2025-05-16)
These are the changes in inference v1.6.0.
New features
- FEAT: [MODEL]XiYanSQL-QwenCoder-2504 by @Minamiyama in #3352
- FEAT: [Model]HuatuoGPT-o1 by @Minamiyama in #3353
- FEAT: [Model]DianJin-R1 by @Minamiyama in #3343
- FEAT: support image_to_video by @qinxuye in #3386
- FEAT: Qwen3-235B-A22B GPTQ Quantization Int4 Int8 by @Jun-Howie in #3422
- FEAT: use xo.wait_for instead of asyncio.wait_for for actor call by @qinxuye in #3439
- FEAT: video UI by @qinxuye in #3448
- FEAT: auto add tag when it is missed by @amumu96 in #3456
- FEAT: Support Skywork-OR1 by @Jun-Howie in #3447
- FEAT: audio UI by @qinxuye in #3457
- FEAT: Support Skywork-OR1 gptq for 32B by @Jun-Howie in #3464
- FEAT: support enable_thinking for loading qwen3 by @qinxuye in #3463
Enhancements
- ENH: Qwen/Qwen2.5-Omni-3B by @Minamiyama in #3366
- ENH: added mlx format for qwen3 & update docs by @qinxuye in #3369
- ENH: Qwen3-AWQ for 14B & 32B by @Minamiyama in #3370
- ENH: Update the activated_size_in_billions parameter in the deepseek-vl2 model by @Jun-Howie in #3380
- ENH: add mlx-community/Qwen2.5-VL-32B-Instruct by @xiaohan815 in #3405
- ENH: [UI] add a documentation link button in side menu by @Minamiyama in #3411
- ENH: QwQ use unsloth gguf by @codingl2k1 in #3408
- ENH: llama.cpp backend use xllamacpp by @codingl2k1 in #3412
- ENH: [UI] display version info in side menu by @Minamiyama in #3423
- ENH: Worker env isolation by @codingl2k1 in #3362
- ENH: Use Qwen's official quantized model repository by @Jun-Howie in #3436
- ENH: Update cosyvoice by @codingl2k1 in #3365
- BLD: isolate autoawq and GPTQModel into separate extra install by @qinxuye in #3397
- BLD: pin transformers version at 4.51.3 by @amumu96 in #3431
- REF: support loading model config in function by @Minamiyama in #3428
Bug fixes
- BUG: fix qwen3 235b spec by @qinxuye in #3375
- BUG: fix incomplete parsing of reasoning content in reasoning_parser by @amumu96 in #3391
- BUG: fix the processing logic for inference content parsing and tool calls by @amumu96 in #3394
- BUG: fix stop word handling logic in vllm model generation configuration by @amumu96 in #3414
- BUG: fix Model._get_full_prompt() takes 3 positional arguments but 4 were given by @qinxuye in #3417
- BUG: fix potential stop hang by @qinxuye in #3434
- BUG: [UI] Added cpu_offload parameter to video model and fixed bug in audio model's filtering function. by @yiboyasss in #3461
New Contributors
- @xiaohan815 made their first contribution in #3405
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's new in 1.5.1 (2025-04-30)
These are the changes in inference v1.5.1.
New features
- FEAT: Wan 2.1 text2video by @qinxuye in #3297
- FEAT: [UI] highlight the input box content. by @yiboyasss in #3306
- FEAT: [UI] display the model_ability parameter. by @yiboyasss in #3308
- FEAT: add ggufv2 support for vLLM by @harryzwh in #3259
- FEAT: ovis2 by @Minamiyama in #3170
- FEAT: support Qwen3 and Qwen3MOE by @Jun-Howie in #3347
- FEAT: Add support for Qwen3 GPTQ quantization format by @Jun-Howie in #3363
Enhancements
- ENH: support setting sse ping attempts by @llyycchhee in #3313
- ENH: Support GLM4-0414 MLX and GGUF by @Jun-Howie in #3325
- ENH: optimize qwen3, support chat_template_kwargs for all engines by @qinxuye in #3354
- REF: Drop internal compression logic for transformers quantization, using bnb config instead by @ChengjieLi28 in #3324
- REF: Unify audio model abilities by @llyycchhee in #3351
Bug fixes
- BUG: fix sglang chat by @qinxuye in #3326
- BUG: Show engine options on UI even if the specific engine is not installed by @ChengjieLi28 in #3331
- BUG: fix failure of clearing resources when loading model failed by @qinxuye in #3361
New Contributors
- @llyycchhee made their first contribution in #3313
- @harryzwh made their first contribution in #3259
- @qiulang made their first contribution in #3342
Full Changelog: v1.5.0...v1.5.1
v1.5.0.post2
What's new in 1.5.0.post2 (2025-04-21)
These are the changes in inference v1.5.0.post2.
Bug fixes
- BUG: [UI] fix the bug in the cancellation function. by @yiboyasss in #3301
- BUG: fix gemma-3-it max_tokens by @qinxuye in #3304
- BUG: fix potential progress error by @qinxuye in #3305
Full Changelog: v1.5.0.post1...v1.5.0.post2
v1.5.0.post1
What's new in 1.5.0.post1 (2025-04-19)
These are the changes in inference v1.5.0.post1.
Full Changelog: v1.5.0...v1.5.0.post1