
Launching a vLLM model from the web UI: rope scaling and MTP mode parameters cannot be passed to vLLM #4453

@ZhikaiGuo960110

Description


System Info / 系統信息

Ubuntu 20.04

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

  • docker / docker
  • pip install / 通过 pip install 安装
  • installation from source / 从源码安装

Version info / 版本信息

Xinference 1.16.0

The command used to start Xinference / 用以启动 xinference 的命令

xinference-local

Reproduction / 复现过程

When a user launches a model from the web UI and wants to enable a special mode, such as rope scaling for context extension or DeepSeek (ds) MTP, filling the corresponding fields into the form raises an error and the model fails to start.

Expected behavior / 期待表现

rope_scaling and MTP mode parameters entered in the web UI should be accepted and forwarded to vLLM.
Currently the only workaround is to launch through the SDK, passing the special fields in via additional parameters, like this. For MTP:
    echo '{
      "model_path": "/weights/'$${model_directory}'",
      "additional_params": {
        "tensor_parallel_size": 32,
        "max_model_len": 65536,
        "speculative_config": {
          "method": "deepseek_mtp",
          "num_speculative_tokens": 1
        },
        "enforce_eager": true
      }
    }' > register_ds3.json

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")  # endpoint of the running xinference-local instance

    # model_path, args, gpu_list and model_dict come from the launch script;
    # model_dict is the parsed register_ds3.json generated above.
    model_uid = client.launch_model(
        model_name="DeepSeek-V3.1",
        model_uid="DeepSeek-V3.1",
        model_engine="vllm",
        model_format="pytorch",
        model_size_in_billions=671,
        model_path=model_path,
        n_gpu=args.n_gpu,
        replica=args.instance_nums,
        gpu_idx=gpu_list,
        enable_thinking=True,
        reasoning_content=True,
        **model_dict.get("additional_params", {})
    )

For rope scaling:
    echo '{
      "model_path": "/weights/'$${model_directory}'",
      "additional_params": {
        "tensor_parallel_size": 4,
        "max_model_len": 131072,
        "rope_scaling": {
          "rope_type": "yarn",
          "factor": 4.0,
          "original_max_position_embeddings": 32768
        },
        "enforce_eager": true
      }
    }' > register_qwen3.json
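The merging step that the workaround relies on can be sketched as follows: parse the generated JSON and splat `additional_params` into the `launch_model` kwargs. This is a minimal illustration, not the actual launch script; the inline payload (including the `/weights/example-model` path) is a placeholder standing in for `register_qwen3.json` after shell substitution, and the client call itself is omitted:

```python
import json

# Placeholder payload shaped like register_qwen3.json after shell substitution.
payload = json.loads("""
{
  "model_path": "/weights/example-model",
  "additional_params": {
    "tensor_parallel_size": 4,
    "max_model_len": 131072,
    "rope_scaling": {
      "rope_type": "yarn",
      "factor": 4.0,
      "original_max_position_embeddings": 32768
    },
    "enforce_eager": true
  }
}
""")

# Keyword arguments that would be forwarded to client.launch_model(...);
# the special vLLM fields ride along as extra kwargs.
launch_kwargs = {
    "model_engine": "vllm",
    "model_format": "pytorch",
    "model_path": payload["model_path"],
    **payload.get("additional_params", {}),
}

print(sorted(launch_kwargs))
```

This is exactly the shape the web UI would need to support: arbitrary engine-specific keys flowing through unchanged to vLLM.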
