With vLLM TP >= 2 and nvidia.com/gpucores set, vLLM is unable to run. For example:
export CUDA_DEVICE_SM_LIMIT=40
vllm serve /home/chauncey/qwen3-8b -tp 2 --enforce-eager
vllm bench serve --model /home/chauncey/qwen3-8b --endpoint /v1/completions --dataset-name random --random-input 5 --random-output 5 --num-prompts 1000
With this configuration vLLM fails to start, but after unsetting CUDA_DEVICE_SM_LIMIT it works normally.
Related HAMi-core code (utilization watcher): https://github.com/Project-HAMi/HAMi-core/blob/main/src/multiprocess/multiprocess_utilization_watcher.c#L205