
Commit 8ac4e33

DOC: update new models (#4146)

qinxuye authored and OliverBryant committed
1 parent 0287d9a · commit 8ac4e33

22 files changed: +885 −35 lines

README.md

Lines changed: 3 additions & 3 deletions
@@ -46,14 +46,14 @@ potential of cutting-edge AI models.
 - Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161)
 - Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080)
 ### New Models
+- Built-in support for [minicpm-v-4.5](https://github.com/OpenBMB/MiniCPM-V): [#4136](https://github.com/xorbitsai/inference/pull/4136)
+- Built-in support for [Qwen3-VL](https://qwen.ai/blog?id=99f0335c4ad9ff6153e517418d48535ab6d8afef&from=research.latest-advancements-list): [#4112](https://github.com/xorbitsai/inference/pull/4112)
+- Built-in support for [Qwen3-Next](https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list): [#4113](https://github.com/xorbitsai/inference/pull/4113)
 - Built-in support for [Deepseek-V3.1](https://api-docs.deepseek.com/news/news250821): [#4022](https://github.com/xorbitsai/inference/pull/4022)
 - Built-in support for [Qwen-Image-Edit](https://huggingface.co/Qwen/Qwen-Image-Edit): [#3989](https://github.com/xorbitsai/inference/pull/3989)
 - Built-in support for [Wan2.2](https://github.com/Wan-Video/Wan2.2): [#3996](https://github.com/xorbitsai/inference/pull/3996)
 - Built-in support for [seed-oss](https://github.com/ByteDance-Seed/seed-oss): [#4020](https://github.com/xorbitsai/inference/pull/4020)
 - Built-in support for [gpt-oss](https://openai.com/zh-Hans-CN/index/introducing-gpt-oss/): [#3924](https://github.com/xorbitsai/inference/pull/3924)
-- Built-in support for [GLM-4.5v](https://github.com/zai-org/GLM-V): [#3957](https://github.com/xorbitsai/inference/pull/3957)
-- Built-in support for [Qwen-Image](https://qwenlm.github.io/blog/qwen-image/): [#3916](https://github.com/xorbitsai/inference/pull/3916)
-- Built-in support for [GLM-4.5](https://github.com/zai-org/GLM-4.5): [#3882](https://github.com/xorbitsai/inference/pull/3882)
 ### Integrations
 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
 - [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on LLMs that offers out-of-the-box data processing and model invocation capabilities and allows workflow orchestration through Flow visualization.

README_zh_CN.md

Lines changed: 3 additions & 3 deletions
@@ -43,14 +43,14 @@ Xorbits Inference (Xinference) is a powerful and comprehensive distributed…
 - Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161)
 - Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080)
 ### New Models
+- Built-in support for [minicpm-v-4.5](https://github.com/OpenBMB/MiniCPM-V): [#4136](https://github.com/xorbitsai/inference/pull/4136)
+- Built-in support for [Qwen3-VL](https://qwen.ai/blog?id=99f0335c4ad9ff6153e517418d48535ab6d8afef&from=research.latest-advancements-list): [#4112](https://github.com/xorbitsai/inference/pull/4112)
+- Built-in support for [Qwen3-Next](https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list): [#4113](https://github.com/xorbitsai/inference/pull/4113)
 - Built-in support for [Deepseek-V3.1](https://api-docs.deepseek.com/news/news250821): [#4022](https://github.com/xorbitsai/inference/pull/4022)
 - Built-in support for [Qwen-Image-Edit](https://huggingface.co/Qwen/Qwen-Image-Edit): [#3989](https://github.com/xorbitsai/inference/pull/3989)
 - Built-in support for [Wan2.2](https://github.com/Wan-Video/Wan2.2): [#3996](https://github.com/xorbitsai/inference/pull/3996)
 - Built-in support for [seed-oss](https://github.com/ByteDance-Seed/seed-oss): [#4020](https://github.com/xorbitsai/inference/pull/4020)
 - Built-in support for [gpt-oss](https://openai.com/zh-Hans-CN/index/introducing-gpt-oss/): [#3924](https://github.com/xorbitsai/inference/pull/3924)
-- Built-in support for [GLM-4.5v](https://github.com/zai-org/GLM-V): [#3957](https://github.com/xorbitsai/inference/pull/3957)
-- Built-in support for [Qwen-Image](https://qwenlm.github.io/zh/blog/qwen-image/): [#3916](https://github.com/xorbitsai/inference/pull/3916)
-- Built-in support for [GLM-4.5](https://github.com/zai-org/GLM-4.5): [#3882](https://github.com/xorbitsai/inference/pull/3882)
 ### Integrations
 - [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/): an open-source AI knowledge-base platform built on LLMs. It provides out-of-the-box data processing, model invocation, RAG retrieval, and visual AI workflow orchestration, making it easy to build complex Q&A applications.
 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform covering the development, deployment, maintenance, and optimization of large language models.

doc/source/getting_started/installation.rst

Lines changed: 3 additions & 2 deletions
@@ -94,14 +94,15 @@ Currently, supported models include:
 - ``moonlight-16b-a3b-instruct``
 - ``qwenLong-l1``
 - ``qwen3``
+- ``Baichuan-M2``
 - ``minicpm4``
 - ``Ernie4.5``
-- ``Qwen3-Instruct``, ``Qwen3-Thinking``, ``Qwen3-Coder``
+- ``Qwen3-Instruct``, ``Qwen3-Thinking``, ``Qwen3-Coder``, ``Qwen3-Next-Instruct``, ``Qwen3-Next-Thinking``
 - ``Deepseek-V3.1``
 - ``glm-4.5``
 - ``KAT-V1``
 - ``gpt-oss``
-- ``seed-oss``, ``seed-oss``
+- ``seed-oss``
 .. vllm_end

 To install Xinference and vLLM::

doc/source/models/builtin/audio/indextts2.rst

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ IndexTTS2
 - **Model Name:** IndexTTS2
 - **Model Family:** IndexTTS2
-- **Abilities:** ['text2audio', 'text2audio_voice_cloning', 'text2audio_emotion_control']
+- **Abilities:** ['text2audio', 'text2audio_zero_shot', 'text2audio_voice_cloning', 'text2audio_emotion_control']
 - **Multilingual:** True

 Specifications
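With the new ``text2audio_zero_shot`` ability added above, a launched IndexTTS2 can be exercised from Python. The following is a minimal sketch, not part of this commit: it assumes a local Xinference server on the default port 9997, a model UID of ``IndexTTS2``, and that the server exposes the OpenAI-compatible ``/v1/audio/speech`` route for text2audio models; the ``voice`` value is hypothetical, so check the model's own docs.

```python
# Hedged sketch: text-to-speech against Xinference's OpenAI-compatible
# audio endpoint. Endpoint availability and the voice name are
# assumptions; verify against your installed Xinference version.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

audio = client.audio.speech.create(
    model="IndexTTS2",          # the launched model's UID
    voice="default",            # hypothetical voice name
    input="Hello from Xinference.",
)
audio.write_to_file("hello.mp3")  # save the returned binary audio
```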

doc/source/models/builtin/image/index.rst

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,8 @@ The following is a list of built-in image models in Xinference:
 qwen-image-edit
+qwen-image-edit-2509
 sd-turbo
 sd3-medium
doc/source/models/builtin/image/qwen-image-edit-2509.rst

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+.. _models_builtin_qwen-image-edit-2509:
+
+====================
+Qwen-Image-Edit-2509
+====================
+
+- **Model Name:** Qwen-Image-Edit-2509
+- **Model Family:** stable_diffusion
+- **Abilities:** image2image
+- **Available ControlNet:** None
+
+Specifications
+^^^^^^^^^^^^^^
+
+- **Model ID:** Qwen/Qwen-Image-Edit-2509
+- **GGUF Model ID**: QuantStack/Qwen-Image-Edit-2509-GGUF
+- **GGUF Quantizations**: Q2_K, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0
+
+Execute the following command to launch the model::
+
+   xinference launch --model-name Qwen-Image-Edit-2509 --model-type image
+
+For GGUF quantizations, use the command below::
+
+   xinference launch --model-name Qwen-Image-Edit-2509 --model-type image --gguf_quantization ${gguf_quantization} --cpu_offload True
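Once launched, the model can also be driven from Python. The sketch below is an illustration rather than part of this commit: it assumes a local server on the default port 9997 and that the image-model handle returned by ``get_model`` exposes an ``image_to_image`` method, as recent Xinference client versions do; verify the method and its signature against your installed version.

```python
# Hedged sketch: edit an image with a launched Qwen-Image-Edit-2509.
# The image_to_image method and its parameters are assumptions based on
# recent Xinference client releases.
from xinference.client import Client

client = Client("http://localhost:9997")
model = client.get_model("Qwen-Image-Edit-2509")  # UID from `xinference launch`

with open("input.png", "rb") as f:
    result = model.image_to_image(
        image=f.read(),
        prompt="Turn the sketch into a watercolor painting",
    )
print(result)  # typically image URLs or base64 payloads
```

As a rule of thumb, the GGUF path trades some quality for a much smaller memory footprint, and ``--cpu_offload True`` further reduces GPU memory use at the cost of speed.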
doc/source/models/builtin/llm/baichuan-m2.rst

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+.. _models_llm_baichuan-m2:
+
+========================================
+Baichuan-M2
+========================================
+
+- **Context Length:** 131072
+- **Model Name:** Baichuan-M2
+- **Languages:** en, zh
+- **Abilities:** chat, reasoning, hybrid, tools
+- **Description:** Baichuan-M2-32B is Baichuan AI's medical-enhanced reasoning model, the second medical model released by Baichuan. Designed for real-world medical reasoning tasks, this model builds upon Qwen2.5-32B with an innovative Large Verifier System. Through domain-specific fine-tuning on real-world medical questions, it achieves breakthrough medical performance while maintaining strong general capabilities.
+
+Specifications
+^^^^^^^^^^^^^^
+
+Model Spec 1 (pytorch, 32 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 32
+- **Quantizations:** none
+- **Engines**: vLLM, Transformers
+- **Model ID:** baichuan-inc/Baichuan-M2-32B
+- **Model Hubs**: `Hugging Face <https://huggingface.co/baichuan-inc/Baichuan-M2-32B>`__, `ModelScope <https://modelscope.cn/models/baichuan-inc/Baichuan-M2-32B>`__
+
+Execute the following command to launch the model; remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name Baichuan-M2 --size-in-billions 32 --model-format pytorch --quantization ${quantization}
+
+Model Spec 2 (gptq, 32 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** gptq
+- **Model Size (in billions):** 32
+- **Quantizations:** Int4
+- **Engines**: vLLM, Transformers
+- **Model ID:** baichuan-inc/Baichuan-M2-32B-GPTQ-Int4
+- **Model Hubs**: `Hugging Face <https://huggingface.co/baichuan-inc/Baichuan-M2-32B-GPTQ-Int4>`__, `ModelScope <https://modelscope.cn/models/baichuan-inc/Baichuan-M2-32B-GPTQ-Int4>`__
+
+Execute the following command to launch the model; remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name Baichuan-M2 --size-in-billions 32 --model-format gptq --quantization ${quantization}
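After launching, the model is reachable through Xinference's OpenAI-compatible API. A minimal sketch, assuming the server runs locally on the default port 9997 and that the model UID equals the model name (the default in recent releases):

```python
# Hedged sketch: chat with a launched Baichuan-M2 via Xinference's
# OpenAI-compatible endpoint. The base_url and model UID are assumptions
# matching a default local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="Baichuan-M2",
    messages=[{"role": "user", "content": "Summarize the contraindications of aspirin."}],
)
print(resp.choices[0].message.content)
```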

doc/source/models/builtin/llm/index.rst

Lines changed: 56 additions & 0 deletions
@@ -26,6 +26,11 @@ The following is a list of built-in LLM in Xinference:
 - 4096
 - Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.

+* - :ref:`baichuan-m2 <models_llm_baichuan-m2>`
+- chat, reasoning, hybrid, tools
+- 131072
+- Baichuan-M2-32B is Baichuan AI's medical-enhanced reasoning model, the second medical model released by Baichuan. Designed for real-world medical reasoning tasks, this model builds upon Qwen2.5-32B with an innovative Large Verifier System. Through domain-specific fine-tuning on real-world medical questions, it achieves breakthrough medical performance while maintaining strong general capabilities.
+
 * - :ref:`code-llama <models_llm_code-llama>`
 - generate
 - 100000
@@ -346,6 +351,11 @@ The following is a list of built-in LLM in Xinference:
 - 32768
 - MiniCPM-V 2.6 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters.

+* - :ref:`minicpm-v-4.5 <models_llm_minicpm-v-4.5>`
+- chat, vision
+- 32768
+- MiniCPM-V 4.5 is an improved version in the MiniCPM-V series with enhanced multimodal capabilities and better performance.
+
 * - :ref:`minicpm3-4b <models_llm_minicpm3-4b>`
 - chat
 - 32768
@@ -536,11 +546,41 @@ The following is a list of built-in LLM in Xinference:
 - 262144
 - We introduce the updated version of the Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507.

+* - :ref:`qwen3-next-instruct <models_llm_qwen3-next-instruct>`
+- chat, tools
+- 262144
+- Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series.
+
+* - :ref:`qwen3-next-thinking <models_llm_qwen3-next-thinking>`
+- chat, reasoning, tools
+- 262144
+- Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series.
+
+* - :ref:`qwen3-omni-instruct <models_llm_qwen3-omni-instruct>`
+- chat, vision, audio, omni, tools
+- 262144
+- Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. We introduce several architectural upgrades to improve performance and efficiency.
+
+* - :ref:`qwen3-omni-thinking <models_llm_qwen3-omni-thinking>`
+- chat, vision, audio, omni, reasoning, tools
+- 262144
+- Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. We introduce several architectural upgrades to improve performance and efficiency.
+
 * - :ref:`qwen3-thinking <models_llm_qwen3-thinking>`
 - chat, reasoning, tools
 - 262144
 - We have continued to scale the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.

+* - :ref:`qwen3-vl-instruct <models_llm_qwen3-vl-instruct>`
+- chat, vision, tools
+- 262144
+- Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
+
+* - :ref:`qwen3-vl-thinking <models_llm_qwen3-vl-thinking>`
+- chat, vision, reasoning, tools
+- 262144
+- Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
+
 * - :ref:`qwenlong-l1 <models_llm_qwenlong-l1>`
 - chat
 - 32768
@@ -670,6 +710,8 @@ The following is a list of built-in LLM in Xinference:
 baichuan-2-chat
+baichuan-m2
 code-llama
 code-llama-instruct
@@ -798,6 +840,8 @@ The following is a list of built-in LLM in Xinference:
 minicpm-v-2.6
+minicpm-v-4.5
 minicpm3-4b
 minicpm4
@@ -874,8 +918,20 @@ The following is a list of built-in LLM in Xinference:
 qwen3-instruct
+qwen3-next-instruct
+qwen3-next-thinking
+qwen3-omni-instruct
+qwen3-omni-thinking
 qwen3-thinking
+qwen3-vl-instruct
+qwen3-vl-thinking
 qwenlong-l1
 qwq-32b
doc/source/models/builtin/llm/minicpm-v-4.5.rst

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+.. _models_llm_minicpm-v-4.5:
+
+========================================
+MiniCPM-V-4.5
+========================================
+
+- **Context Length:** 32768
+- **Model Name:** MiniCPM-V-4.5
+- **Languages:** en, zh
+- **Abilities:** chat, vision
+- **Description:** MiniCPM-V 4.5 is an improved version in the MiniCPM-V series with enhanced multimodal capabilities and better performance.
+
+Specifications
+^^^^^^^^^^^^^^
+
+Model Spec 1 (pytorch, 8 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 8
+- **Quantizations:** none
+- **Engines**: Transformers
+- **Model ID:** openbmb/MiniCPM-V-4_5
+- **Model Hubs**: `Hugging Face <https://huggingface.co/openbmb/MiniCPM-V-4_5>`__, `ModelScope <https://modelscope.cn/models/OpenBMB/MiniCPM-V-4_5>`__
+
+Execute the following command to launch the model; remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name MiniCPM-V-4.5 --size-in-billions 8 --model-format pytorch --quantization ${quantization}
+
+Model Spec 2 (pytorch, 8 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 8
+- **Quantizations:** none
+- **Engines**: Transformers
+- **Model ID:** openbmb/MiniCPM-V-4_5-int4
+- **Model Hubs**: `Hugging Face <https://huggingface.co/openbmb/MiniCPM-V-4_5-int4>`__, `ModelScope <https://modelscope.cn/models/OpenBMB/MiniCPM-V-4_5-int4>`__
+
+Execute the following command to launch the model; remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name MiniCPM-V-4.5 --size-in-billions 8 --model-format pytorch --quantization ${quantization}
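A launched MiniCPM-V-4.5 can likewise be queried through the OpenAI-compatible endpoint; vision input uses OpenAI-style ``image_url`` content parts. A sketch under the same assumptions as the Baichuan-M2 example above (default local server, model UID equal to the model name):

```python
# Hedged sketch: multimodal chat with a launched MiniCPM-V-4.5.
# Assumes Xinference's OpenAI-compatible endpoint accepts image_url
# content parts for vision models, as its docs describe.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="MiniCPM-V-4.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```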

doc/source/models/builtin/llm/qwen2-audio-instruct.rst

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** Qwen/Qwen2-Audio-7B-Instruct
 - **Model Hubs**: `Hugging Face <https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct>`__, `ModelScope <https://modelscope.cn/models/qwen/Qwen2-Audio-7B-Instruct>`__
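Since vLLM is now listed as an engine for this spec, the model can be launched on vLLM programmatically as well. A sketch, assuming the Python client's ``launch_model`` accepts a ``model_engine`` argument, as it does in recent Xinference releases:

```python
# Hedged sketch: launch qwen2-audio-instruct on the vLLM engine via the
# Xinference Python client. The launch_model keyword arguments match
# recent client versions; verify against yours.
from xinference.client import Client

client = Client("http://localhost:9997")
uid = client.launch_model(
    model_name="qwen2-audio-instruct",
    model_engine="vLLM",
    model_format="pytorch",
    model_size_in_billions=7,
)
print(f"launched with UID: {uid}")
```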
