
Commit 8ac4e33

DOC: update new models (#4146)

qinxuye authored and OliverBryant committed
1 parent 0287d9a · commit 8ac4e33

22 files changed: +885 −35 lines

README.md

Lines changed: 3 additions & 3 deletions
@@ -46,14 +46,14 @@ potential of cutting-edge AI models.
 - Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161)
 - Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080)
 ### New Models
+- Built-in support for [minicpm-v-4.5](https://github.com/OpenBMB/MiniCPM-V): [#4136](https://github.com/xorbitsai/inference/pull/4136)
+- Built-in support for [Qwen3-VL](https://qwen.ai/blog?id=99f0335c4ad9ff6153e517418d48535ab6d8afef&from=research.latest-advancements-list): [#4112](https://github.com/xorbitsai/inference/pull/4112)
+- Built-in support for [Qwen3-Next](https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list): [#4113](https://github.com/xorbitsai/inference/pull/4113)
 - Built-in support for [Deepseek-V3.1](https://api-docs.deepseek.com/news/news250821): [#4022](https://github.com/xorbitsai/inference/pull/4022)
 - Built-in support for [Qwen-Image-Edit](https://huggingface.co/Qwen/Qwen-Image-Edit): [#3989](https://github.com/xorbitsai/inference/pull/3989)
 - Built-in support for [Wan2.2](https://github.com/Wan-Video/Wan2.2): [#3996](https://github.com/xorbitsai/inference/pull/3996)
 - Built-in support for [seed-oss](https://github.com/ByteDance-Seed/seed-oss): [#4020](https://github.com/xorbitsai/inference/pull/4020)
 - Built-in support for [gpt-oss](https://openai.com/zh-Hans-CN/index/introducing-gpt-oss/): [#3924](https://github.com/xorbitsai/inference/pull/3924)
-- Built-in support for [GLM-4.5v](https://github.com/zai-org/GLM-V): [#3957](https://github.com/xorbitsai/inference/pull/3957)
-- Built-in support for [Qwen-Image](https://qwenlm.github.io/blog/qwen-image/): [#3916](https://github.com/xorbitsai/inference/pull/3916)
-- Built-in support for [GLM-4.5](https://github.com/zai-org/GLM-4.5): [#3882](https://github.com/xorbitsai/inference/pull/3882)
 ### Integrations
 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
 - [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on LLMs that offers out-of-the-box data processing and model invocation capabilities and allows workflow orchestration through Flow visualization.

README_zh_CN.md

Lines changed: 3 additions & 3 deletions
@@ -43,14 +43,14 @@ Xorbits Inference (Xinference) is a powerful and comprehensive distributed…
 - Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161)
 - Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080)
 ### New Models
+- Built-in support for [minicpm-v-4.5](https://github.com/OpenBMB/MiniCPM-V): [#4136](https://github.com/xorbitsai/inference/pull/4136)
+- Built-in support for [Qwen3-VL](https://qwen.ai/blog?id=99f0335c4ad9ff6153e517418d48535ab6d8afef&from=research.latest-advancements-list): [#4112](https://github.com/xorbitsai/inference/pull/4112)
+- Built-in support for [Qwen3-Next](https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list): [#4113](https://github.com/xorbitsai/inference/pull/4113)
 - Built-in support for [Deepseek-V3.1](https://api-docs.deepseek.com/news/news250821): [#4022](https://github.com/xorbitsai/inference/pull/4022)
 - Built-in support for [Qwen-Image-Edit](https://huggingface.co/Qwen/Qwen-Image-Edit): [#3989](https://github.com/xorbitsai/inference/pull/3989)
 - Built-in support for [Wan2.2](https://github.com/Wan-Video/Wan2.2): [#3996](https://github.com/xorbitsai/inference/pull/3996)
 - Built-in support for [seed-oss](https://github.com/ByteDance-Seed/seed-oss): [#4020](https://github.com/xorbitsai/inference/pull/4020)
 - Built-in support for [gpt-oss](https://openai.com/zh-Hans-CN/index/introducing-gpt-oss/): [#3924](https://github.com/xorbitsai/inference/pull/3924)
-- Built-in support for [GLM-4.5v](https://github.com/zai-org/GLM-V): [#3957](https://github.com/xorbitsai/inference/pull/3957)
-- Built-in support for [Qwen-Image](https://qwenlm.github.io/zh/blog/qwen-image/): [#3916](https://github.com/xorbitsai/inference/pull/3916)
-- Built-in support for [GLM-4.5](https://github.com/zai-org/GLM-4.5): [#3882](https://github.com/xorbitsai/inference/pull/3882)
 ### Integrations
 - [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/): an open-source AI knowledge-base platform built on LLMs. It provides out-of-the-box data processing, model invocation, RAG retrieval, and visual AI workflow orchestration, making it easy to build complex Q&A applications.
 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform covering the development, deployment, maintenance, and optimization of large language models.

doc/source/getting_started/installation.rst

Lines changed: 3 additions & 2 deletions
@@ -94,14 +94,15 @@ Currently, supported models include:
 - ``moonlight-16b-a3b-instruct``
 - ``qwenLong-l1``
 - ``qwen3``
+- ``Baichuan-M2``
 - ``minicpm4``
 - ``Ernie4.5``
-- ``Qwen3-Instruct``, ``Qwen3-Thinking``, ``Qwen3-Coder``
+- ``Qwen3-Instruct``, ``Qwen3-Thinking``, ``Qwen3-Coder``, ``Qwen3-Next-Instruct``, ``Qwen3-Next-Thinking``
 - ``Deepseek-V3.1``
 - ``glm-4.5``
 - ``KAT-V1``
 - ``gpt-oss``
-- ``seed-oss``, ``seed-oss``
+- ``seed-oss``
 .. vllm_end

 To install Xinference and vLLM::

doc/source/models/builtin/audio/indextts2.rst

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ IndexTTS2
 - **Model Name:** IndexTTS2
 - **Model Family:** IndexTTS2
-- **Abilities:** ['text2audio', 'text2audio_voice_cloning', 'text2audio_emotion_control']
+- **Abilities:** ['text2audio', 'text2audio_zero_shot', 'text2audio_voice_cloning', 'text2audio_emotion_control']
 - **Multilingual:** True

 Specifications
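With the new ``text2audio_zero_shot`` ability added above, a launched IndexTTS2 can be exercised from Python. The following is a minimal sketch, not part of this commit: it assumes a local Xinference server on the default port 9997, a model UID of ``IndexTTS2``, and that the server exposes the OpenAI-compatible ``/v1/audio/speech`` route for text2audio models; the ``voice`` value is hypothetical, so check the model's own docs.

```python
# Hedged sketch: text-to-speech against Xinference's OpenAI-compatible
# audio endpoint. Endpoint availability and the voice name are
# assumptions; verify against your installed Xinference version.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

audio = client.audio.speech.create(
    model="IndexTTS2",          # the launched model's UID
    voice="default",            # hypothetical voice name
    input="Hello from Xinference.",
)
audio.write_to_file("hello.mp3")  # save the returned binary audio
```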

doc/source/models/builtin/image/index.rst

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,8 @@ The following is a list of built-in image models in Xinference:
 qwen-image-edit
+qwen-image-edit-2509
 sd-turbo
 sd3-medium
doc/source/models/builtin/image/qwen-image-edit-2509.rst

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+.. _models_builtin_qwen-image-edit-2509:
+
+====================
+Qwen-Image-Edit-2509
+====================
+
+- **Model Name:** Qwen-Image-Edit-2509
+- **Model Family:** stable_diffusion
+- **Abilities:** image2image
+- **Available ControlNet:** None
+
+Specifications
+^^^^^^^^^^^^^^
+
+- **Model ID:** Qwen/Qwen-Image-Edit-2509
+- **GGUF Model ID**: QuantStack/Qwen-Image-Edit-2509-GGUF
+- **GGUF Quantizations**: Q2_K, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0
+
+Execute the following command to launch the model::
+
+   xinference launch --model-name Qwen-Image-Edit-2509 --model-type image
+
+For GGUF quantizations, use the command below::
+
+   xinference launch --model-name Qwen-Image-Edit-2509 --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True
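Once launched, the model can also be driven from Python. The sketch below is an illustration rather than part of this commit: it assumes a local server on the default port 9997 and that the image-model handle returned by ``get_model`` exposes an ``image_to_image`` method, as recent Xinference client versions do; verify the method and its signature against your installed version.

```python
# Hedged sketch: edit an image with a launched Qwen-Image-Edit-2509.
# The image_to_image method and its parameters are assumptions based on
# recent Xinference client releases.
from xinference.client import Client

client = Client("http://localhost:9997")
model = client.get_model("Qwen-Image-Edit-2509")  # UID from `xinference launch`

with open("input.png", "rb") as f:
    result = model.image_to_image(
        image=f.read(),
        prompt="Turn the sketch into a watercolor painting",
    )
print(result)  # typically image URLs or base64 payloads
```

As a rule of thumb, the GGUF path trades some quality for a much smaller memory footprint, and ``--cpu_offload True`` further reduces GPU memory use at the cost of speed.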
doc/source/models/builtin/llm/baichuan-m2.rst

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+.. _models_llm_baichuan-m2:
+
+========================================
+Baichuan-M2
+========================================
+
+- **Context Length:** 131072
+- **Model Name:** Baichuan-M2
+- **Languages:** en, zh
+- **Abilities:** chat, reasoning, hybrid, tools
+- **Description:** Baichuan-M2-32B is Baichuan AI's medical-enhanced reasoning model, the second medical model released by Baichuan. Designed for real-world medical reasoning tasks, this model builds upon Qwen2.5-32B with an innovative Large Verifier System. Through domain-specific fine-tuning on real-world medical questions, it achieves breakthrough medical performance while maintaining strong general capabilities.
+
+Specifications
+^^^^^^^^^^^^^^
+
+Model Spec 1 (pytorch, 32 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 32
+- **Quantizations:** none
+- **Engines**: vLLM, Transformers
+- **Model ID:** baichuan-inc/Baichuan-M2-32B
+- **Model Hubs**: `Hugging Face <https://huggingface.co/baichuan-inc/Baichuan-M2-32B>`__, `ModelScope <https://modelscope.cn/models/baichuan-inc/Baichuan-M2-32B>`__
+
+Execute the following command to launch the model; remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name Baichuan-M2 --size-in-billions 32 --model-format pytorch --quantization ${quantization}
+
+Model Spec 2 (gptq, 32 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** gptq
+- **Model Size (in billions):** 32
+- **Quantizations:** Int4
+- **Engines**: vLLM, Transformers
+- **Model ID:** baichuan-inc/Baichuan-M2-32B-GPTQ-Int4
+- **Model Hubs**: `Hugging Face <https://huggingface.co/baichuan-inc/Baichuan-M2-32B-GPTQ-Int4>`__, `ModelScope <https://modelscope.cn/models/baichuan-inc/Baichuan-M2-32B-GPTQ-Int4>`__
+
+Execute the following command to launch the model; remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name Baichuan-M2 --size-in-billions 32 --model-format gptq --quantization ${quantization}
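After launching, the model is reachable through Xinference's OpenAI-compatible API. A minimal sketch, assuming the server runs locally on the default port 9997 and that the model UID equals the model name (the default in recent releases):

```python
# Hedged sketch: chat with a launched Baichuan-M2 via Xinference's
# OpenAI-compatible endpoint. The base_url and model UID are assumptions
# matching a default local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="Baichuan-M2",
    messages=[{"role": "user", "content": "Summarize the contraindications of aspirin."}],
)
print(resp.choices[0].message.content)
```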

doc/source/models/builtin/llm/index.rst

Lines changed: 56 additions & 0 deletions
@@ -26,6 +26,11 @@ The following is a list of built-in LLM in Xinference:
 - 4096
 - Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.

+* - :ref:`baichuan-m2 <models_llm_baichuan-m2>`
+- chat, reasoning, hybrid, tools
+- 131072
+- Baichuan-M2-32B is Baichuan AI's medical-enhanced reasoning model, the second medical model released by Baichuan. Designed for real-world medical reasoning tasks, this model builds upon Qwen2.5-32B with an innovative Large Verifier System. Through domain-specific fine-tuning on real-world medical questions, it achieves breakthrough medical performance while maintaining strong general capabilities.
+
 * - :ref:`code-llama <models_llm_code-llama>`
 - generate
 - 100000
@@ -346,6 +351,11 @@ The following is a list of built-in LLM in Xinference:
 - 32768
 - MiniCPM-V 2.6 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters.

+* - :ref:`minicpm-v-4.5 <models_llm_minicpm-v-4.5>`
+- chat, vision
+- 32768
+- MiniCPM-V 4.5 is an improved version in the MiniCPM-V series with enhanced multimodal capabilities and better performance.
+
 * - :ref:`minicpm3-4b <models_llm_minicpm3-4b>`
 - chat
 - 32768
@@ -536,11 +546,41 @@ The following is a list of built-in LLM in Xinference:
 - 262144
 - We introduce the updated version of the Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507.

+* - :ref:`qwen3-next-instruct <models_llm_qwen3-next-instruct>`
+- chat, tools
+- 262144
+- Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series.
+
+* - :ref:`qwen3-next-thinking <models_llm_qwen3-next-thinking>`
+- chat, reasoning, tools
+- 262144
+- Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series.
+
+* - :ref:`qwen3-omni-instruct <models_llm_qwen3-omni-instruct>`
+- chat, vision, audio, omni, tools
+- 262144
+- Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. We introduce several architectural upgrades to improve performance and efficiency.
+
+* - :ref:`qwen3-omni-thinking <models_llm_qwen3-omni-thinking>`
+- chat, vision, audio, omni, reasoning, tools
+- 262144
+- Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. We introduce several architectural upgrades to improve performance and efficiency.
+
 * - :ref:`qwen3-thinking <models_llm_qwen3-thinking>`
 - chat, reasoning, tools
 - 262144
 - We have continued to scale the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.

+* - :ref:`qwen3-vl-instruct <models_llm_qwen3-vl-instruct>`
+- chat, vision, tools
+- 262144
+- Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
+
+* - :ref:`qwen3-vl-thinking <models_llm_qwen3-vl-thinking>`
+- chat, vision, reasoning, tools
+- 262144
+- Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
+
 * - :ref:`qwenlong-l1 <models_llm_qwenlong-l1>`
 - chat
 - 32768
@@ -670,6 +710,8 @@ The following is a list of built-in LLM in Xinference:
 baichuan-2-chat
+baichuan-m2
 code-llama
 code-llama-instruct
@@ -798,6 +840,8 @@ The following is a list of built-in LLM in Xinference:
 minicpm-v-2.6
+minicpm-v-4.5
 minicpm3-4b
 minicpm4
@@ -874,8 +918,20 @@ The following is a list of built-in LLM in Xinference:
 qwen3-instruct
+qwen3-next-instruct
+qwen3-next-thinking
+qwen3-omni-instruct
+qwen3-omni-thinking
 qwen3-thinking
+qwen3-vl-instruct
+qwen3-vl-thinking
 qwenlong-l1
 qwq-32b
doc/source/models/builtin/llm/minicpm-v-4.5.rst

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+.. _models_llm_minicpm-v-4.5:
+
+========================================
+MiniCPM-V-4.5
+========================================
+
+- **Context Length:** 32768
+- **Model Name:** MiniCPM-V-4.5
+- **Languages:** en, zh
+- **Abilities:** chat, vision
+- **Description:** MiniCPM-V 4.5 is an improved version in the MiniCPM-V series with enhanced multimodal capabilities and better performance.
+
+Specifications
+^^^^^^^^^^^^^^
+
+Model Spec 1 (pytorch, 8 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 8
+- **Quantizations:** none
+- **Engines**: Transformers
+- **Model ID:** openbmb/MiniCPM-V-4_5
+- **Model Hubs**: `Hugging Face <https://huggingface.co/openbmb/MiniCPM-V-4_5>`__, `ModelScope <https://modelscope.cn/models/OpenBMB/MiniCPM-V-4_5>`__
+
+Execute the following command to launch the model; remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name MiniCPM-V-4.5 --size-in-billions 8 --model-format pytorch --quantization ${quantization}
+
+Model Spec 2 (pytorch, 8 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 8
+- **Quantizations:** none
+- **Engines**: Transformers
+- **Model ID:** openbmb/MiniCPM-V-4_5-int4
+- **Model Hubs**: `Hugging Face <https://huggingface.co/openbmb/MiniCPM-V-4_5-int4>`__, `ModelScope <https://modelscope.cn/models/OpenBMB/MiniCPM-V-4_5-int4>`__
+
+Execute the following command to launch the model; remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name MiniCPM-V-4.5 --size-in-billions 8 --model-format pytorch --quantization ${quantization}
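A launched MiniCPM-V-4.5 can likewise be queried through the OpenAI-compatible endpoint; vision input uses OpenAI-style ``image_url`` content parts. A sketch under the same assumptions as the Baichuan-M2 example above (default local server, model UID equal to the model name):

```python
# Hedged sketch: multimodal chat with a launched MiniCPM-V-4.5.
# Assumes Xinference's OpenAI-compatible endpoint accepts image_url
# content parts for vision models, as its docs describe.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="MiniCPM-V-4.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```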

doc/source/models/builtin/llm/qwen2-audio-instruct.rst

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** Qwen/Qwen2-Audio-7B-Instruct
 - **Model Hubs**: `Hugging Face <https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct>`__, `ModelScope <https://modelscope.cn/models/qwen/Qwen2-Audio-7B-Instruct>`__
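Since vLLM is now listed as an engine for this spec, the model can be launched on vLLM programmatically as well. A sketch, assuming the Python client's ``launch_model`` accepts a ``model_engine`` argument, as it does in recent Xinference releases:

```python
# Hedged sketch: launch qwen2-audio-instruct on the vLLM engine via the
# Xinference Python client. The launch_model keyword arguments match
# recent client versions; verify against yours.
from xinference.client import Client

client = Client("http://localhost:9997")
uid = client.launch_model(
    model_name="qwen2-audio-instruct",
    model_engine="vLLM",
    model_format="pytorch",
    model_size_in_billions=7,
)
print(f"launched with UID: {uid}")
```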
