Conversation

liyonghua0910
Collaborator

Requirement description

The Completions endpoint should accept token ids passed directly in the prompt field as model input, aligning with vLLM. The prompt_token_ids field introduced in FD v2.0.4 remains valid; for now its priority is prompt_token_ids > prompt. In addition, prompt_token_ids previously supported only single-request inference; this PR adds support for batch inference.

Single-request inference:

curl -X POST "http://0.0.0.0:8185/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": [123, 456, 789],
        "max_tokens": 10
    }'

curl -X POST "http://0.0.0.0:8185/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "",
        "prompt_token_ids": [123, 456, 789],
        "max_tokens": 10
    }'

Batch inference:

curl -X POST "http://0.0.0.0:8185/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": [[123, 456, 789], [987, 654, 321]],
        "max_tokens": 10
    }'

curl -X POST "http://0.0.0.0:8185/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "",
        "prompt_token_ids": [[123, 456, 789], [987, 654, 321]],
        "max_tokens": 10
    }'
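The request handling implied by the four payloads above can be sketched as a small input resolver: it normalizes prompt / prompt_token_ids into a batch of single inputs, with prompt_token_ids taking priority. This is a minimal sketch only; the helper name resolve_prompt_inputs and the plain-argument shape are assumptions for illustration, not the PR's actual code.

```python
from typing import List, Optional, Union

Prompt = Union[str, List[int], List[str], List[List[int]]]


def resolve_prompt_inputs(
    prompt: Optional[Prompt],
    prompt_token_ids: Optional[Union[List[int], List[List[int]]]],
) -> List[Union[str, List[int]]]:
    """Normalize a completions request into a batch of single inputs.

    prompt_token_ids, when present, takes priority over prompt,
    matching the priority stated above (prompt_token_ids > prompt).
    """
    if prompt_token_ids is not None:
        # Single request: a flat list of ints; batch: a list of lists.
        if prompt_token_ids and isinstance(prompt_token_ids[0], int):
            return [prompt_token_ids]
        return list(prompt_token_ids)

    if prompt is None:
        raise ValueError("either prompt or prompt_token_ids is required")

    if isinstance(prompt, str):
        return [prompt]
    if prompt and isinstance(prompt[0], int):
        # prompt carries token ids directly, vLLM-style.
        return [prompt]
    return list(prompt)  # batch of strings or of token-id lists


# The curl payloads above all resolve to one- or two-item batches:
print(resolve_prompt_inputs([123, 456, 789], None))           # [[123, 456, 789]]
print(resolve_prompt_inputs("", [123, 456, 789]))             # [[123, 456, 789]]
print(resolve_prompt_inputs([[123, 456], [987, 654]], None))  # [[123, 456], [987, 654]]
```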

Main changes

  • fastdeploy/entrypoints/openai/serving_completion.py: adds handling for batch inference via prompt_token_ids, which takes priority over the prompt field
  • fastdeploy/input/ernie_processor.py: refactors the handling of the prompt field. If a prompt is a list, it is written directly to request.prompt_token_ids; if it is a str, it is tokenized first and the resulting ids are written to request.prompt_token_ids. Also updates the process_request and process_request_dict methods
  • fastdeploy/input/text_processor.py: same as above
  • test/ci_use/EB_Lite/test_EB_Lite_serving.py: adds test cases for passing token ids directly in the prompt field and for batch inference via the prompt/prompt_token_ids fields
  • test/ci_use/Qwen2-7B-Instruct_serving/test_Qwen2-7B-Instruct_serving.py: same as above
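The processor refactor described above (ernie_processor.py and text_processor.py) can be sketched as follows. The Request dataclass, the process_prompt helper, and the toy tokenizer are stand-ins assumed for illustration; they are not FastDeploy's actual classes or methods.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Request:
    # Minimal stand-in for the processor's request object.
    prompt: Optional[Union[str, List[int]]] = None
    prompt_token_ids: List[int] = field(default_factory=list)


def process_prompt(request: Request, tokenize) -> Request:
    """Sketch of the refactored prompt handling:

    - if the prompt is already a list of token ids, write it to
      request.prompt_token_ids as-is;
    - if it is a str, tokenize it first, then write the ids.
    """
    if isinstance(request.prompt, list):
        request.prompt_token_ids = request.prompt
    elif isinstance(request.prompt, str):
        request.prompt_token_ids = tokenize(request.prompt)
    return request


# Toy tokenizer standing in for the real one (assumption for the demo).
toy_tokenize = lambda text: [ord(c) for c in text]

req = process_prompt(Request(prompt=[123, 456, 789]), toy_tokenize)
print(req.prompt_token_ids)  # [123, 456, 789]

req = process_prompt(Request(prompt="ab"), toy_tokenize)
print(req.prompt_token_ids)  # [97, 98]
```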


paddle-bot bot commented Aug 11, 2025

Thanks for your contribution!

paddle-bot added the contributor (External developers) label on Aug 11, 2025

codecov-commenter commented Aug 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@808b548). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #3311   +/-   ##
==========================================
  Coverage           ?   20.54%           
==========================================
  Files              ?        4           
  Lines              ?       73           
  Branches           ?       19           
==========================================
  Hits               ?       15           
  Misses             ?       54           
  Partials           ?        4           
Flag Coverage Δ
diff 20.54% <ø> (?)

Flags with carried forward coverage won't be shown.


@liyonghua0910 liyonghua0910 merged commit 8829724 into PaddlePaddle:develop Aug 29, 2025
43 of 49 checks passed