
Launching a vLLM model from the web UI: rope scaling and MTP mode parameters cannot be passed to vLLM #4453

@ZhikaiGuo960110

Description


System Info / 系統信息

Ubuntu 20.04

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

  • docker / docker
  • pip install / 通过 pip install 安装
  • installation from source / 从源码安装

Version info / 版本信息

Xinference 1.16.0

The command used to start Xinference / 用以启动 xinference 的命令

xinference-local

Reproduction / 复现过程

When a user launches a model from the web UI and wants to enable a special mode, such as rope scaling for context extension or DeepSeek (ds) MTP, filling the corresponding fields into the form raises an error and the model fails to start.

Expected behavior / 期待表现

rope_scaling and MTP mode parameters entered in the web UI should be accepted and forwarded to vLLM.
Currently the only workaround is to launch through the SDK, passing the special fields in via additional parameters, like this. For MTP:
    echo '{
      "model_path": "/weights/'$${model_directory}'",
      "additional_params": {
        "tensor_parallel_size": 32,
        "max_model_len": 65536,
        "speculative_config": {
          "method": "deepseek_mtp",
          "num_speculative_tokens": 1
        },
        "enforce_eager": true
      }
    }' > register_ds3.json

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")  # endpoint of the running xinference-local instance

    # model_path, args, gpu_list and model_dict come from the launch script;
    # model_dict is the parsed register_ds3.json generated above.
    model_uid = client.launch_model(
        model_name="DeepSeek-V3.1",
        model_uid="DeepSeek-V3.1",
        model_engine="vllm",
        model_format="pytorch",
        model_size_in_billions=671,
        model_path=model_path,
        n_gpu=args.n_gpu,
        replica=args.instance_nums,
        gpu_idx=gpu_list,
        enable_thinking=True,
        reasoning_content=True,
        **model_dict.get("additional_params", {})
    )

For rope scaling:
    echo '{
      "model_path": "/weights/'$${model_directory}'",
      "additional_params": {
        "tensor_parallel_size": 4,
        "max_model_len": 131072,
        "rope_scaling": {
          "rope_type": "yarn",
          "factor": 4.0,
          "original_max_position_embeddings": 32768
        },
        "enforce_eager": true
      }
    }' > register_qwen3.json
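The merging step that the workaround relies on can be sketched as follows: parse the generated JSON and splat `additional_params` into the `launch_model` kwargs. This is a minimal illustration, not the actual launch script; the inline payload (including the `/weights/example-model` path) is a placeholder standing in for `register_qwen3.json` after shell substitution, and the client call itself is omitted:

```python
import json

# Placeholder payload shaped like register_qwen3.json after shell substitution.
payload = json.loads("""
{
  "model_path": "/weights/example-model",
  "additional_params": {
    "tensor_parallel_size": 4,
    "max_model_len": 131072,
    "rope_scaling": {
      "rope_type": "yarn",
      "factor": 4.0,
      "original_max_position_embeddings": 32768
    },
    "enforce_eager": true
  }
}
""")

# Keyword arguments that would be forwarded to client.launch_model(...);
# the special vLLM fields ride along as extra kwargs.
launch_kwargs = {
    "model_engine": "vllm",
    "model_format": "pytorch",
    "model_path": payload["model_path"],
    **payload.get("additional_params", {}),
}

print(sorted(launch_kwargs))
```

This is exactly the shape the web UI would need to support: arbitrary engine-specific keys flowing through unchanged to vLLM.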
