Releases: xorbitsai/inference

v1.7.1.post1

30 Jun 11:28
84f10dc

What's new in 1.7.1.post1 (2025-06-30)

These are the changes in inference v1.7.1.post1.

Enhancements

  • BLD: pin transformers version at 4.52.4 to fix "Failed to import module 'SentenceTransformer'" error by @amumu96 in #3743
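To confirm that an environment actually has the pinned release, a small check like the following may help. This is a minimal sketch, not part of the project: the helper names `matches_pin` and `check_transformers_pin` are hypothetical, and the only facts taken from the note above are the package name and the version 4.52.4.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED_TRANSFORMERS = "4.52.4"  # version named in the release note above


def matches_pin(installed: str, pinned: str = PINNED_TRANSFORMERS) -> bool:
    """Return True when the installed version string equals the pin exactly."""
    return installed.strip() == pinned


def check_transformers_pin(pinned: str = PINNED_TRANSFORMERS) -> str:
    """Describe whether the installed transformers matches the pin (hypothetical helper)."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if matches_pin(installed, pinned):
        return f"transformers {installed} matches the pin"
    return f"transformers {installed} differs from the pinned {pinned}"


if __name__ == "__main__":
    print(check_transformers_pin())
```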

Full Changelog: v1.7.1...v1.7.1.post1

v1.7.1

27 Jun 12:17
cf64a86

What's new in 1.7.1 (2025-06-27)

These are the changes in inference v1.7.1.

New features

Enhancements

  • ENH: add enable_flash_attn param for loading qwen3 embedding & rerank by @qinxuye in #3640
  • ENH: add more abilities for builtin model families API by @qinxuye in #3658
  • ENH: improve local cluster startup reliability via child-process readiness signaling by @Checkmate544 in #3642
  • ENH: FishSpeech support pcm by @codingl2k1 in #3680
  • ENH: Add 4-sample micro-batching to Qwen-3 reranker to reduce GPU memory by @yasu-oh in #3666
  • ENH: Limit default n_parallel for llama.cpp backend by @codingl2k1 in #3712
  • BLD: pin flash-attn & flashinfer-python version and limit sgl-kernel version by @amumu96 in #3669
  • BLD: Update Dockerfile by @XiaoXiaoJiangYun in #3695
  • REF: remove unused code by @qinxuye in #3664
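The micro-batching item above (#3666) can be illustrated in general terms. The sketch below is not the code from that PR: it only shows the generic idea of scoring query/document pairs in groups of four so that fewer inputs are held in memory at once. The names `micro_batches`, `score_in_micro_batches`, and `toy_score` are hypothetical, and `toy_score` is a stand-in for a real reranker forward pass.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query, document)


def micro_batches(items: Sequence[Pair], size: int = 4) -> Iterator[Sequence[Pair]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_in_micro_batches(
    pairs: Sequence[Pair],
    score_fn: Callable[[Sequence[Pair]], List[float]],
    size: int = 4,
) -> List[float]:
    """Score all pairs, calling `score_fn` on one micro-batch at a time."""
    scores: List[float] = []
    for batch in micro_batches(pairs, size):
        scores.extend(score_fn(batch))
    return scores


def toy_score(batch: Sequence[Pair]) -> List[float]:
    """Placeholder scorer: counts words shared by query and document."""
    return [float(len(set(q.split()) & set(d.split()))) for q, d in batch]


if __name__ == "__main__":
    pairs = [("gpu memory", f"doc {i} about gpu memory") for i in range(10)]
    print(score_in_micro_batches(pairs, toy_score))
```

The results are the same as scoring everything in one call; only the number of pairs processed per call changes.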

Bug fixes

  • BUG: fix TTS error "No such file or directory" by @robin12jbj in #3625
  • BUG: Fix max_tokens value in Qwen3 Reranker by @yasu-oh in #3665
  • BUG: fix custom embedding by @qinxuye in #3677
  • BUG: [UI] rename the command-line argument from download-hub to download_hub. by @yiboyasss in #3685
  • BUG: fix jina-clip-v2 for text only or image only by @qinxuye in #3690
  • BUG: internvl chat error using vllm engine by @amumu96 in #3722
  • BUG: fix the parsing logic of streaming tool calls by @amumu96 in #3721
  • BUG: fix <think> wrongly added when set chat_template_kwargs {"enable_thinking": False} by @qinxuye in #3718

Documentation

New Contributors

Full Changelog: v1.7.0...v1.7.1

v1.7.0.post1

13 Jun 17:23
da2040e

What's new in 1.7.0.post1 (2025-06-13)

These are the changes in inference v1.7.0.post1.

Bug fixes

Full Changelog: v1.7.0...v1.7.0.post1

v1.7.0

13 Jun 10:58
a362dba

What's new in 1.7.0 (2025-06-13)

These are the changes in inference v1.7.0.

New features

Enhancements

Bug fixes

Documentation

Others

New Contributors

Full Changelog: v1.6.1...v1.7.0

v1.6.1

30 May 11:41
72cc5e3

What's new in 1.6.1 (2025-05-30)

These are the changes in inference v1.6.1.

New features

Enhancements

Bug fixes

Documentation

  • DOC: remove llama-cpp-python related doc & refine model_ability parts by @qinxuye in #3519
  • DOC: Update doc about cosyvoice-2.0 stream and auto NGL by @codingl2k1 in #3547

New Contributors

Full Changelog: v1.6.0...v1.6.1

v1.6.0.post1

17 May 07:20
1adc5d3

What's new in 1.6.0.post1 (2025-05-17)

These are the changes in inference v1.6.0.post1.

Enhancements

Full Changelog: v1.6.0...v1.6.0.post1

v1.6.0

16 May 12:27
81a24f4

What's new in 1.6.0 (2025-05-16)

These are the changes in inference v1.6.0.

New features

Enhancements

Bug fixes

  • BUG: fix qwen3 235b spec by @qinxuye in #3375
  • BUG: fix incomplete parsing of reasoning content in reasoning_parser by @amumu96 in #3391
  • BUG: fix the processing logic for inference content parsing and tool calls by @amumu96 in #3394
  • BUG: fix stop word handling logic in vllm model generation configuration by @amumu96 in #3414
  • BUG: fix Model._get_full_prompt() takes 3 positional arguments but 4 were given by @qinxuye in #3417
  • BUG: fix potential stop hang by @qinxuye in #3434
  • BUG: [UI] Added cpu_offload parameter to video model and fixed bug in audio model's filtering function. by @yiboyasss in #3461

New Contributors

Full Changelog: v1.5.1...v1.6.0

v1.5.1

30 Apr 14:00
1c11c60

What's new in 1.5.1 (2025-04-30)

These are the changes in inference v1.5.1.

New features

Enhancements

Bug fixes

  • BUG: fix sglang chat by @qinxuye in #3326
  • BUG: Show engine options on UI even if the specific engine is not installed by @ChengjieLi28 in #3331
  • BUG: fix failure of clearing resources when loading model failed by @qinxuye in #3361

Documentation

  • DOC: update troubleshooting.rst for the launch error caused by numpy by @qiulang in #3342

New Contributors

Full Changelog: v1.5.0...v1.5.1

v1.5.0.post2

21 Apr 11:11

What's new in 1.5.0.post2 (2025-04-21)

These are the changes in xorbitsai/inference v1.5.0.post2.

Enhancements

Bug fixes

Full Changelog: v1.5.0.post1...v1.5.0.post2

v1.5.0.post1

19 Apr 15:58
2010508

What's new in 1.5.0.post1 (2025-04-19)

These are the changes in inference v1.5.0.post1.

Enhancements

Documentation

Full Changelog: v1.5.0...v1.5.0.post1