Closed
Description
System Info
Ubuntu 20.04
Running Xinference with Docker?
- docker
- pip install
- installation from source
Version info
Xinference 1.16.0
The command used to start Xinference
xinference-local
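Reproduction
When a user launches a model from the web UI and wants to enable a special mode, such as RoPE scaling to extend the context window or DeepSeek MTP (multi-token prediction), entering the corresponding fields produces an error and the model fails to launch.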
Expected behavior
When a user enters rope_scaling or MTP parameters in the web UI, the model should launch with those settings applied.
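One possible way for the web UI to support this, sketched under assumptions rather than taken from Xinference's actual UI code: decode each extra-parameter value as JSON before forwarding it, so nested settings such as rope_scaling or speculative_config arrive as dicts instead of plain strings. `parse_extra_params` is a hypothetical helper, and the assumption that the form submits key/value string pairs is illustrative only.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_extra_params(pairs: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Hypothetical helper: turn web-UI key/value strings into launch kwargs.

    Values that parse as JSON (numbers, booleans, objects) are decoded, so
    nested settings such as rope_scaling or speculative_config become dicts;
    anything that is not valid JSON is kept as a plain string.
    """
    params: Dict[str, Any] = {}
    for key, raw in pairs:
        key = key.strip()
        if not key:
            continue  # skip empty rows left in the form
        try:
            params[key] = json.loads(raw)
        except json.JSONDecodeError:
            params[key] = raw  # e.g. "fp8" stays a string
    return params


if __name__ == "__main__":
    form_rows = [
        ("max_model_len", "131072"),
        ("rope_scaling", '{"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}'),
        ("enforce_eager", "true"),
    ]
    print(parse_extra_params(form_rows))
```

The design choice here is to be permissive: anything that looks like JSON is decoded, and everything else passes through untouched, so existing flat string parameters keep working.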
Currently the only workaround is to launch the model through the SDK and pass the special fields as additional parameters. For MTP, for example:
# model_directory must be set in the shell; use $$ instead of $ only inside a Makefile recipe.
echo '{
"model_path": "/weights/'"${model_directory}"'",
"additional_params": {
"tensor_parallel_size": 32,
"max_model_len": 65536,
"speculative_config": {
"method": "deepseek_mtp",
"num_speculative_tokens": 1
},
"enforce_eager": true
}
}' > register_ds3.json
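import json
from xinference.client import Client

# Assumes a local Xinference server on the default port 9997.
client = Client("http://127.0.0.1:9997")
with open("register_ds3.json", encoding="utf-8") as f:
    model_dict = json.load(f)  # file written by the echo command above
model_path = model_dict["model_path"]
# args (n_gpu, instance_nums) and gpu_list come from the caller's own script, e.g. argparse.
model_uid = client.launch_model(
    model_name="DeepSeek-V3.1",
    model_uid="DeepSeek-V3.1",
    model_engine="vllm",
    model_format="pytorch",
    model_size_in_billions=671,
    model_path=model_path,
    n_gpu=args.n_gpu,
    replica=args.instance_nums,
    gpu_idx=gpu_list,
    enable_thinking=True,
    reasoning_content=True,
    # Engine-specific settings (tensor_parallel_size, speculative_config, ...) go in as extra kwargs.
    **model_dict.get("additional_params", {}),
)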
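The register_ds3.json file used by this launch call can also be produced from Python with `json.dump`, which avoids shell quoting around the model directory entirely. This is a minimal sketch: `build_register_config` and `write_register_file` are hypothetical helpers, `MODEL_DIRECTORY` is an assumed environment variable name, and its fallback value is a placeholder.

```python
import json
import os
from typing import Any, Dict


def build_register_config(model_directory: str, additional_params: Dict[str, Any]) -> Dict[str, Any]:
    """Return the same structure that the echo command writes to register_ds3.json."""
    return {
        "model_path": f"/weights/{model_directory}",
        "additional_params": additional_params,
    }


def write_register_file(path: str, config: Dict[str, Any]) -> None:
    # json.dump handles quoting and escaping; no shell variable expansion is involved.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


if __name__ == "__main__":
    # MODEL_DIRECTORY is a hypothetical environment variable; the fallback is a placeholder.
    model_directory = os.environ.get("MODEL_DIRECTORY", "DeepSeek-V3.1")
    config = build_register_config(
        model_directory,
        {
            "tensor_parallel_size": 32,
            "max_model_len": 65536,
            "speculative_config": {"method": "deepseek_mtp", "num_speculative_tokens": 1},
            "enforce_eager": True,
        },
    )
    write_register_file("register_ds3.json", config)
```

Note that Python's `True` is serialized as JSON `true`, so the resulting file matches the hand-written one.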
For RoPE scaling:
# model_directory must be set in the shell; use $$ instead of $ only inside a Makefile recipe.
echo '{
"model_path": "/weights/'"${model_directory}"'",
"additional_params": {
"tensor_parallel_size": 4,
"max_model_len": 131072,
"rope_scaling": {
"rope_type": "yarn",
"factor": 4.0,
"original_max_position_embeddings": 32768
},
"enforce_eager": true
}
}' > register_qwen3.json
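The YaRN `factor` in this file is the ratio of the target context length to the model's native window: 131072 / 32768 = 4.0. A small helper can derive the rope_scaling dict from `max_model_len` so the two values cannot drift apart. `yarn_rope_scaling` is hypothetical (not part of Xinference or vLLM), and rejecting a target that does not exceed the native window is a design choice of this sketch.

```python
from typing import Any, Dict


def yarn_rope_scaling(max_model_len: int, original_max_position_embeddings: int) -> Dict[str, Any]:
    """Build a YaRN rope_scaling dict whose factor stretches the native window to max_model_len."""
    if max_model_len <= original_max_position_embeddings:
        # A factor <= 1.0 would not extend the context, so treat it as a configuration mistake.
        raise ValueError("max_model_len must exceed original_max_position_embeddings")
    factor = max_model_len / original_max_position_embeddings
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": original_max_position_embeddings,
    }


if __name__ == "__main__":
    # Matches the rope_scaling block in register_qwen3.json above.
    print(yarn_rope_scaling(131072, 32768))
```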