
Conversation

kxz2002 (Contributor) commented Sep 25, 2025

Adds support for the n parameter in requests, so a single request can return multiple model responses.
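
For illustration only, a minimal usage sketch assuming the server exposes an OpenAI-compatible chat endpoint; the base URL, API key, and model name below are placeholders, not values from this PR.

```python
# Hypothetical usage sketch: ask the server for two independent completions of
# one prompt via `n`. base_url, api_key, and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Give one sentence about oceans."}],
    n=2,  # request two choices for the same prompt
)
for choice in resp.choices:
    print(choice.index, choice.message.content)
```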


paddle-bot bot commented Sep 25, 2025

Thanks for your contribution!

paddle-bot bot added the contributor (External developers) label Sep 25, 2025
CLAassistant commented Sep 26, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ gzy19990617
✅ kxz2002
❌ LiqinruiG
You have signed the CLA already but the status is still pending? Let us recheck it.

request["prompt_token_ids_len"] = len(request["prompt_token_ids"])
input_ids_len = request["prompt_token_ids_len"]
request["max_tokens"] = min(self.max_model_len - input_ids_len, request.get("max_tokens"))
if request.get("reasoning_max_tokens", None) is None:
Collaborator: Let's remove the reasoning_max_tokens logic here.
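
For illustration, a self-contained sketch of the quoted snippet with the reasoning_max_tokens defaulting dropped, as suggested; max_model_len and the request dict below are stand-ins, not values from this PR.

```python
# Hypothetical sketch of the suggested change: keep the max_tokens clamp,
# drop the reasoning_max_tokens defaulting. All values below are stand-ins.
max_model_len = 8192
request = {"prompt_token_ids": [1, 2, 3], "max_tokens": 256}

request["prompt_token_ids_len"] = len(request["prompt_token_ids"])
input_ids_len = request["prompt_token_ids_len"]
request["max_tokens"] = min(max_model_len - input_ids_len, request.get("max_tokens"))
# no reasoning_max_tokens branch here, per the review comment
print(request["max_tokens"])  # 256
```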

chunk_object_type: str = "chat.completion.chunk"
first_iteration = True
previous_num_tokens = 0
n_param = request.n if request.n is not None else 1
Collaborator: Just use num_choices directly; there's no need to add a separate n_param.
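
For illustration, a self-contained sketch of what this suggestion might look like: size the per-choice bookkeeping by the existing num_choices instead of deriving a separate n_param. The list-shaped variables are assumptions based on the first_iteration[idx] usage shown further down in the diff.

```python
# Hypothetical sketch: reuse num_choices (assumed to already reflect request.n)
# rather than adding a separate n_param. num_choices is a stand-in value here.
num_choices = 2

chunk_object_type: str = "chat.completion.chunk"
first_iteration = [True] * num_choices    # per-choice "first chunk" flags
previous_num_tokens = [0] * num_choices   # per-choice generated-token counters
```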

first_iteration[idx] = False

output = res["outputs"]
reasoning_content = output["reasoning_content"]
Collaborator: Delete this line.


delta_message = DeltaMessage(
reasoning_content="",
reasoning_content=reasoning_content,
Collaborator: Change this line to reasoning_content="" as well.

prompt_tokens_details=PromptTokenUsageInfo(cached_tokens=final_res.get("num_cached_tokens", 0)),
prompt_tokens_details=PromptTokenUsageInfo(cached_tokens=sum(num_cached_tokens)),
)
work_process_metrics.e2e_request_latency.observe(time.time() - final_res["metrics"]["request_start_time"])
Collaborator: Just move this line into the loop; no need to record the latency here.

request_prompts = request_prompt_ids

num_choices = len(request_prompts)
num_choices = len(request_prompts) * request.n
Collaborator: request.n needs a None check; the n_param = current_req_dict.get("n", 1) further down can be moved up here and used as the multiplier.
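
For illustration, a self-contained sketch of the None-safe default this comment asks for; request and request_prompts are stand-ins.

```python
# Hypothetical sketch: default n to 1 when request.n is None, then multiply.
from types import SimpleNamespace

request = SimpleNamespace(n=None)           # client did not send n
request_prompts = ["prompt-a", "prompt-b"]

n_param = request.n if request.n is not None else 1
num_choices = len(request_prompts) * n_param
print(num_choices)  # 2, instead of a TypeError when request.n is None
```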

try:
for idx, prompt in enumerate(request_prompts):
request_id_idx = f"{request_id}-{idx}"
request_id_idx = f"{request_id}"
Collaborator: This line can be deleted.

for idx, prompt in enumerate(request_prompts):
request_id_idx = f"{request_id}-{idx}"
request_id_idx = f"{request_id}"
current_req_dict = request.to_dict_for_infer(request_id_idx, prompt)
Collaborator: Just pass request_id directly as the argument here.
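
For illustration, a self-contained sketch of the loop with request_id passed straight through; FakeRequest.to_dict_for_infer is a stand-in for the real method, not the project's API.

```python
# Hypothetical sketch: pass the original request_id into to_dict_for_infer
# instead of building a per-index request_id_idx. FakeRequest is a stand-in.
class FakeRequest:
    def to_dict_for_infer(self, request_id, prompt):
        return {"request_id": request_id, "prompt": prompt}

request = FakeRequest()
request_id = "chatcmpl-123"
request_prompts = ["hello", "world"]

for idx, prompt in enumerate(request_prompts):
    current_req_dict = request.to_dict_for_infer(request_id, prompt)
    print(current_req_dict["request_id"])  # same request_id for every prompt
```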
