README.md (+3 −3)
@@ -46,14 +46,14 @@ potential of cutting-edge AI models.
 - Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161)
 - Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080)
 ### New Models
+- Built-in support for [minicpm-v-4.5](https://github.com/OpenBMB/MiniCPM-V): [#4136](https://github.com/xorbitsai/inference/pull/4136)
+- Built-in support for [Qwen3-VL](https://qwen.ai/blog?id=99f0335c4ad9ff6153e517418d48535ab6d8afef&from=research.latest-advancements-list): [#4112](https://github.com/xorbitsai/inference/pull/4112)
+- Built-in support for [Qwen3-Next](https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list): [#4113](https://github.com/xorbitsai/inference/pull/4113)
 - Built-in support for [Deepseek-V3.1](https://api-docs.deepseek.com/news/news250821): [#4022](https://github.com/xorbitsai/inference/pull/4022)
 - Built-in support for [Qwen-Image-Edit](https://huggingface.co/Qwen/Qwen-Image-Edit): [#3989](https://github.com/xorbitsai/inference/pull/3989)
 - Built-in support for [Wan2.2](https://github.com/Wan-Video/Wan2.2): [#3996](https://github.com/xorbitsai/inference/pull/3996)
 - Built-in support for [seed-oss](https://github.com/ByteDance-Seed/seed-oss): [#4020](https://github.com/xorbitsai/inference/pull/4020)
 - Built-in support for [gpt-oss](https://openai.com/zh-Hans-CN/index/introducing-gpt-oss/): [#3924](https://github.com/xorbitsai/inference/pull/3924)
-- Built-in support for [GLM-4.5v](https://github.com/zai-org/GLM-V): [#3957](https://github.com/xorbitsai/inference/pull/3957)
-- Built-in support for [Qwen-Image](https://qwenlm.github.io/blog/qwen-image/): [#3916](https://github.com/xorbitsai/inference/pull/3916)
-- Built-in support for [GLM-4.5](https://github.com/zai-org/GLM-4.5): [#3882](https://github.com/xorbitsai/inference/pull/3882)
 ### Integrations
 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
 - [FastGPT](https://github.com/labring/FastGPT): an open-source knowledge-base platform built on LLMs that provides out-of-the-box data processing, model invocation, and RAG retrieval, and supports visual AI workflow orchestration through Flow, making it easy to build complex question-answering scenarios.
doc/source/models/builtin/llm/baichuan-m2.rst

+- **Description:** Baichuan-M2-32B is Baichuan AI's medical-enhanced reasoning model, the second medical model released by Baichuan. Designed for real-world medical reasoning tasks, this model builds upon Qwen2.5-32B with an innovative Large Verifier System. Through domain-specific fine-tuning on real-world medical questions, it achieves breakthrough medical performance while maintaining strong general capabilities.
+
+Specifications
+^^^^^^^^^^^^^^
+
+Model Spec 1 (pytorch, 32 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 32
+- **Quantizations:** none
+- **Engines**: vLLM, Transformers
+- **Model ID:** baichuan-inc/Baichuan-M2-32B
+- **Model Hubs**: `Hugging Face <https://huggingface.co/baichuan-inc/Baichuan-M2-32B>`__, `ModelScope <https://modelscope.cn/models/baichuan-inc/Baichuan-M2-32B>`__
+
+Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
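
   # Sketch of the launch command, following the pattern shared by the other
   # built-in model pages; the exact flags below are assumptions, not
   # confirmed by this diff.
   xinference launch --model-engine ${engine} --model-name baichuan-m2 --size-in-billions 32 --model-format pytorch --quantization ${quantization}

Once launched, the model can be queried through Xinference's OpenAI-compatible API. Below is a minimal sketch, assuming the default local endpoint ``http://localhost:9997/v1``, the ``openai`` Python package, and that the launched model is addressable by its model name::

   from openai import OpenAI

   # Xinference exposes an OpenAI-compatible endpoint; the address below is
   # the default for a local deployment and is an assumption in this sketch.
   client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

   response = client.chat.completions.create(
       model="baichuan-m2",  # assumes the launch kept the default model UID
       messages=[{"role": "user", "content": "Summarize the red flags for chest pain."}],
   )
   print(response.choices[0].message.content)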
doc/source/models/builtin/llm/index.rst (+56 −0)
@@ -26,6 +26,11 @@ The following is a list of built-in LLM in Xinference:
    - 4096
    - Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.

+ * - :ref:`baichuan-m2 <models_llm_baichuan-m2>`
+   - chat, reasoning, hybrid, tools
+   - 131072
+   - Baichuan-M2-32B is Baichuan AI's medical-enhanced reasoning model, the second medical model released by Baichuan. Designed for real-world medical reasoning tasks, this model builds upon Qwen2.5-32B with an innovative Large Verifier System. Through domain-specific fine-tuning on real-world medical questions, it achieves breakthrough medical performance while maintaining strong general capabilities.
+
  * - :ref:`code-llama <models_llm_code-llama>`
    - generate
    - 100000
@@ -346,6 +351,11 @@ The following is a list of built-in LLM in Xinference:
    - 32768
    - MiniCPM-V 2.6 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters.
    - Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. We introduce several architectural upgrades to improve performance and efficiency.