feat: enable HTTP completion endpoint to accept arrays of prompts and generate multiple completions per prompt #3953
base: main
Conversation
Enable the completion endpoint to accept arrays of prompts and generate n completions per prompt, matching vLLM behavior.

- Add utility functions to handle prompt arrays (get_prompt_batch_size, extract_single_prompt)
- Implement batch processing in the HTTP handler with proper choice index remapping
- Add validation for total choices (batch_size × n ≤ 128)
- Generate a unique request_id for each prompt to avoid conflicts
- Add comprehensive tests for batch prompts and n parameter combinations
- Maintain backward compatibility with single-prompt requests

The choice index formula matches vLLM: final_index = prompt_idx * n + choice_idx. Example: 3 prompts with n=2 yields indices 0,1 (prompt 0), 2,3 (prompt 1), 4,5 (prompt 2).
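For illustration, a minimal sketch of that remapping (the helper name is hypothetical; the real logic lives in the HTTP handler):

```rust
/// Remap a per-prompt choice index into the flattened response,
/// matching vLLM: final_index = prompt_idx * n + choice_idx.
fn remap_choice_index(prompt_idx: u32, n: u32, choice_idx: u32) -> u32 {
    prompt_idx * n + choice_idx
}

fn main() {
    // 3 prompts with n = 2 -> indices 0,1 (prompt 0), 2,3 (prompt 1), 4,5 (prompt 2)
    for prompt_idx in 0..3 {
        for choice_idx in 0..2 {
            println!(
                "prompt {prompt_idx}, choice {choice_idx} -> index {}",
                remap_choice_index(prompt_idx, 2, choice_idx)
            );
        }
    }
}
```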
Walkthrough

The pull request implements batch-aware handling for LLM completions by introducing detection logic that routes single-prompt and multi-prompt requests through dedicated code paths. Batch utilities extract and validate prompts, enforce a total-choices limit, and support per-prompt choice remapping with streaming and annotation handling.
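A sketch of what that detection could look like, assuming the Prompt enum mirrors async-openai's variants (illustrative only, not the merged code):

```rust
use dynamo_async_openai::types::Prompt;

/// Number of prompts in the request; a result of 1 keeps the existing
/// single-prompt path, anything larger takes the batch path.
fn get_prompt_batch_size(prompt: &Prompt) -> usize {
    match prompt {
        Prompt::String(_) => 1,
        Prompt::StringArray(prompts) => prompts.len(),
        // Token-id prompts (integer arrays) would be counted the same way.
        _ => 1,
    }
}
```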
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ 3 passed
Updated the examples/test plan in the description.
Syncing with main to pull in this change to hopefully fix all the failing deploy tests: https://github.com/ai-dynamo/dynamo/pull/4089/files
May need this one for the deploy test failures: #4130
Would like to see a test for an empty prompt array to make sure it's properly rejected. Otherwise LGTM!
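A sketch of the kind of test being asked for, reusing the hypothetical get_prompt_batch_size helper from above:

```rust
#[test]
fn rejects_empty_prompt_array() {
    let prompt = Prompt::StringArray(vec![]);
    // A batch size of 0 should be rejected by request validation
    // (e.g. with a 400) rather than silently producing no choices.
    assert_eq!(get_prompt_batch_size(&prompt), 0);
}
```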
LGTM
// Fallback to empty string if index out of bounds
dynamo_async_openai::types::Prompt::String(String::new())
When would the index exceed the bounds? Should this be an error case? If so, we can return Result&lt;Prompt&gt; and return Err for these out-of-bounds cases, and then where we call extract_single_prompt we should also error out instead of proceeding with an empty string or empty array, right?
It actually won’t exceed the bounds. When I implemented this, I was treating it as a standalone module and added the bounds check for robustness. We can just let it error out if the index goes out of range.
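A minimal sketch of the Result-returning variant the thread converges on (names and error type are illustrative; the actual error handling in dynamo may differ):

```rust
use dynamo_async_openai::types::Prompt;

/// Extract the prompt at `index` from a possibly batched request,
/// erroring out on out-of-bounds indices instead of falling back
/// to an empty prompt.
fn extract_single_prompt(prompt: &Prompt, index: usize) -> anyhow::Result<Prompt> {
    match prompt {
        Prompt::String(s) if index == 0 => Ok(Prompt::String(s.clone())),
        Prompt::StringArray(prompts) => prompts
            .get(index)
            .cloned()
            .map(Prompt::String)
            .ok_or_else(|| anyhow::anyhow!("prompt index {index} out of bounds")),
        _ => anyhow::bail!("prompt index {index} out of bounds for this prompt type"),
    }
}
```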
Overview:
This PR enables the completion endpoint to accept arrays of prompts and generate multiple completions per prompt.
Details:
Where should the reviewer start?
- lib/llm/src/protocols/openai/completions.rs: the new validation logic and utility functions (a sketch of the validation follows this list)
- lib/llm/src/http/service/openai.rs: the batch processing implementation with choice index remapping
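For orientation, a sketch of the shape of that validation (the constant and names are illustrative; see completions.rs for the real code):

```rust
const MAX_TOTAL_CHOICES: usize = 128;

/// Reject requests whose total choice count (batch_size × n) exceeds
/// the server limit before any generation work is scheduled.
fn validate_total_choices(batch_size: usize, n: usize) -> Result<(), String> {
    let total = batch_size.saturating_mul(n);
    if total > MAX_TOTAL_CHOICES {
        return Err(format!(
            "batch_size ({batch_size}) x n ({n}) = {total} exceeds the limit of {MAX_TOTAL_CHOICES}"
        ));
    }
    Ok(())
}
```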
Test Plan
```bash
curl localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-0.6B",
  "prompt": ["Say test 1", "Say test 2"],
  "max_tokens": 50,
  "temperature": 0.7,
  "n": 1
}' | jq
```

```json
{
  "id": "cmpl-342716fb-9fbe-42bf-b874-48dd150c6bba-1",
  "choices": [
    {
      "text": "234, 3214, 4321, 4123, 1234, 2413, 1324, 4312, 413",
      "index": 0,
      "finish_reason": "length"
    },
    {
      "text": "015\nLet $T_{n}$ be the set of all possible expressions of the form $\\frac{a_n}{b_n} + \\frac{c_n}{d_n}$, where $a_n, b_n, c_n",
      "index": 1,
      "finish_reason": "length"
    }
  ],
  "created": 1762282615,
  "model": "Qwen/Qwen3-0.6B",
  "system_fingerprint": null,
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 50,
    "total_tokens": 54
  }
}
```

Summary by CodeRabbit
- New Features
- Bug Fixes