System Info / 系統信息
910B4, 8*64G; CANN: 8.2.RC1; xLLM: 0.8.0
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
- The official example scripts / 官方的示例脚本
- My own modified scripts / 我自己修改的脚本和任务
Reproduction / 复现过程
I deployed GLM-4.6V using xLLM 0.8.0.
My env_set script is as follows:
export PYTHON_INCLUDE_PATH="$(python3 -c 'from sysconfig import get_paths; print(get_paths()["include"])')"
export PYTHON_LIB_PATH="$(python3 -c 'from sysconfig import get_paths; print(get_paths()["stdlib"])')"
export PYTORCH_NPU_INSTALL_PATH=/usr/local/libtorch_npu/
export PYTORCH_INSTALL_PATH="$(python3 -c 'import torch, os; print(os.path.dirname(os.path.abspath(torch.__file__)))')"
export LIBTORCH_ROOT="$(python3 -c 'import torch, os; print(os.path.dirname(os.path.abspath(torch.__file__)))')"
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/xllm/op_api/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/libtorch_npu/lib:$LD_LIBRARY_PATH
export LD_PRELOAD=/usr/lib64/libtcmalloc.so.4:/usr/lib64/libjemalloc.so.2:$LD_PRELOAD
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
rm -rf /root/atb/log/
rm -rf /root/ascend/log/
rm -rf core.*
export ASDOPS_LOG_LEVEL=ERROR
export ASDOPS_LOG_TO_STDOUT=1
export ASDOPS_LOG_TO_FILE=1
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export NPU_MEMORY_FRACTION=0.96
export ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=3
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export OMP_NUM_THREADS=12
export ALLOW_INTERNAL_FORMAT=1
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export ATB_LLM_ENABLE_AUTO_TRANSPOSE=0
export ATB_CONVERT_NCHW_TO_ND=1
export ATB_LAUNCH_KERNEL_WITH_TILING=1
export ATB_OPERATION_EXECUTE_ASYNC=2
export ATB_CONTEXT_WORKSPACE_SIZE=0
export INF_NAN_MODE_ENABLE=1
export HCCL_EXEC_TIMEOUT=0
export HCCL_CONNECT_TIMEOUT=7200
export HCCL_OP_EXPANSION_MODE="AIV"
export HCCL_IF_BASE_PORT=2864
My xllm_run script is as follows:
BATCH_SIZE=256
XLLM_PATH="/export/home/xllm/build/xllm/core/server/xllm"
MODEL_PATH=/model/ZhipuAI/GLM-4.6v
MASTER_NODE_ADDR="0.0.0.0:10015"
LOCAL_HOST="0.0.0.0"
# Service Port
START_PORT=18994
START_DEVICE=0
LOG_DIR="logs"
mkdir -p $LOG_DIR
NNODES=4
for (( i=0; i<$NNODES; i++ ))
do
    PORT=$((START_PORT + i))
    DEVICE=$((START_DEVICE + i))
    LOG_FILE="$LOG_DIR/node_$i.log"
    nohup numactl -C $((DEVICE*12))-$((DEVICE*12+11)) $XLLM_PATH \
        --model $MODEL_PATH --model_id glm_46v \
        --host $LOCAL_HOST \
        --port $PORT \
        --devices="npu:$DEVICE" \
        --master_node_addr=$MASTER_NODE_ADDR \
        --nnodes=$NNODES \
        --node_rank=$i \
        --max_memory_utilization=0.86 \
        --max_tokens_per_batch=40000 \
        --max_seqs_per_batch=$BATCH_SIZE \
        --communication_backend=hccl \
        --enable_schedule_overlap=true \
        --enable_prefix_cache=true \
        --enable_chunked_prefill=false \
        --enable_shm=true \
        > $LOG_FILE 2>&1 &
done
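For context, the two scripts are combined on the serving host roughly like this (a hypothetical invocation; the file names env_set.sh and xllm_run.sh are assumed from the descriptions above):
source ./env_set.sh
bash ./xllm_run.sh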
My client script is as follows:
import base64

import requests

api_url = "http://localhost:18994/v1/chat/completions"
image_url = "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"


def encode_image(url: str) -> str:
    # Download the image and return it as a base64-encoded string.
    with requests.get(url) as response:
        response.raise_for_status()
        result = base64.b64encode(response.content).decode("utf-8")
    return result


image_base64 = encode_image(image_url)

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "介绍下这张图片"},  # "Describe this image"
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"},
                },
            ],
        }
    ],
    "model": "glm_46v",  # must match --model_id in xllm_run.sh
    "max_completion_tokens": 128,
}

response = requests.post(
    api_url,
    json=payload,
    headers={"Content-Type": "application/json"},
)
print(response.json())
My situation:
The service starts normally, and the model list endpoint at http://localhost:18994/v1/models also responds correctly. I check it with roughly the following (the port matches START_PORT in xllm_run.sh above):
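curl http://localhost:18994/v1/models

However, when I call the chat completions endpoint with the client script above, the server crashes with a "not support dtype" error. The error output is as follows: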
terminate called after throwing an instance of 'std::runtime_error'
what(): CreateAtTensorFromTensorDesc: not support dtype
terminate called recursively
F20260122 11:21:03.746834 407874 batch.cpp:303] Check failed: output_idx < num_seqs (0 vs. 0)
*** Check failure stack trace: ***
@ 0x67f5ac google::LogMessage::SendToLog()
@ 0x67bec0 google::LogMessage::Flush()
@ 0x67fb8c google::LogMessageFatal::~LogMessageFatal()
@ 0xbb5b94 xllm::Batch::process_sample_output()
@ 0xab9d14 xllm::VLMEngine::step()
@ 0xb03368 xllm::ContinuousScheduler::step()
@ 0x65627c _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN4xllm9VLMMaster3runEvEUlvE_EEEEE6_M_runEv
@ 0xffffa8d8cd24 execute_native_thread_routine
@ 0xffff9fd3ce4c (unknown)
@ 0xffff9fda3b0c (unknown)
@ (nil) (unknown)
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
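For reference, the crash appears to happen when xLLM builds an at::Tensor from a tensor descriptor whose dtype it does not recognize (CreateAtTensorFromTensorDesc). The dtype declared by the model checkpoint can be inspected with something like the following (a hypothetical check, assuming the standard Hugging Face config.json layout under MODEL_PATH):
python3 -c "import json; print(json.load(open('/model/ZhipuAI/GLM-4.6v/config.json')).get('torch_dtype'))"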
Expected behavior / 期待表现
Where is the problem, and how should I fix it? A detailed explanation would be much appreciated.