
I created "优先知识库问答+搜索引擎兜底-b4d78" and running it reports "工作流任务执行失败" (Workflow task execution failed) #1788

@13516513760

Description


The LLM I configured is qwen3-32b served with vLLM; this is my launch command:

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
VLLM_USE_MODELSCOPE=true CUDA_VISIBLE_DEVICES=0,1 \
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8099 \
    --gpu-memory-utilization 0.5 \
    --max-model-len 10240 \
    --served-model-name Qwen3-32B \
    --model /home/pc/data/models/qwen3-32B-AWQ \
    --tensor-parallel-size 2 \
    --dtype auto \
    --enable-reasoning \
    --reasoning-parser deepseek_r1
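
(To double-check what limit the server actually advertises, the OpenAI-compatible /v1/models endpoint can be queried; as far as I know vLLM reports the context window there as max_model_len. The port below assumes the command above.)

# Hypothetical sanity check against the server started above:
curl -s http://localhost:8099/v1/models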

Then in bisheng I configured its max_tokens as 10480:
(screenshot: bisheng model settings showing max_tokens)
Then I built a new "优先知识库问答+搜索引擎兜底-b4d78" workflow and ran it, but it keeps failing with this error:


Execution error
Workflow task execution failed: Error code: 400 - {'error': {'message': "'max_tokens' or 'max_completion_tokens' is too large: 10240. This model's maximum context length is 10240 tokens and your request has 8919 input tokens (10240 > 10240 - 8919). None", 'type': 'BadRequestError', 'param': None, 'code': 400}}
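
If I read the 400 correctly, the prompt tokens and max_tokens together have to fit inside --max-model-len, so 8919 input tokens leave only 10240 - 8919 = 1321 tokens for the completion, while the request asks for the full 10240. One possible fix (a sketch only; the 20480 value and the assumption that the two GPUs have memory headroom for a longer KV cache at --gpu-memory-utilization 0.5 are mine, not verified) would be to restart vLLM with a larger context window:

# Minimal sketch: same command as above, only --max-model-len is raised
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
VLLM_USE_MODELSCOPE=true CUDA_VISIBLE_DEVICES=0,1 \
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8099 \
    --gpu-memory-utilization 0.5 \
    --max-model-len 20480 \
    --served-model-name Qwen3-32B \
    --model /home/pc/data/models/qwen3-32B-AWQ \
    --tensor-parallel-size 2 \
    --dtype auto \
    --enable-reasoning \
    --reasoning-parser deepseek_r1

The other option I can think of is to keep --max-model-len 10240 and lower max_tokens in bisheng to at most max-model-len minus the prompt length (roughly 1300 for this 8919-token prompt).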

What should I do?
