5 changes: 4 additions & 1 deletion lib/llm/src/preprocessor.rs
@@ -764,7 +764,10 @@ impl OpenAIPreprocessor {
let jail = JailedStream::builder()
.tool_call_parser(tool_call_parser)
.build();
jail.apply(stream)
let jailed_stream = jail.apply(stream);

// Post-process to set finish reason to ToolCalls for the last chunk if any tool calls were emitted
JailedStream::fix_finish_reason(jailed_stream)
Contributor:

What if the jailed stream failed, or no tool call was parsed? Does this function get skipped?

Contributor Author:

No, it is not skipped. The post-processor runs on every chunk and leaves finish_reason untouched when no tool calls were emitted.
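A minimal, self-contained sketch of that behavior (hypothetical simplified types; the real pass operates on a `Stream` of `Annotated<NvCreateChatCompletionStreamResponse>` chunks): the function is applied unconditionally, but it only rewrites `finish_reason` for choice indices that previously emitted a `tool_calls` delta, so a stream with no parsed tool calls passes through unchanged.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real chunk/choice types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FinishReason {
    Stop,
    ToolCalls,
}

struct Choice {
    index: u32,
    has_tool_calls: bool,
    finish_reason: Option<FinishReason>,
}

// Model of fix_finish_reason: walk the chunks in order, remember which
// choice indices emitted tool calls, and override finish_reason only for
// those choices. It never skips chunks and has no failure path.
fn fix_finish_reason(chunks: &mut [Vec<Choice>]) {
    let mut saw_tool_calls: HashMap<u32, bool> = HashMap::new();
    for chunk in chunks.iter_mut() {
        for choice in chunk.iter_mut() {
            if choice.has_tool_calls {
                saw_tool_calls.insert(choice.index, true);
            }
            if choice.finish_reason.is_some()
                && saw_tool_calls.get(&choice.index).copied().unwrap_or(false)
            {
                choice.finish_reason = Some(FinishReason::ToolCalls);
            }
        }
    }
}
```

When no tool calls appear anywhere in the stream, the map stays empty and every `finish_reason` is left exactly as upstream produced it.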

}

// Motivation: Each transformation on the stream should be a separate step to allow for more flexibility
44 changes: 42 additions & 2 deletions lib/llm/src/protocols/openai/chat_completions/jail.rs
@@ -13,6 +13,7 @@ use dynamo_parsers::tool_calling::{
};
use dynamo_runtime::protocols::annotated::Annotated;
use futures::{Stream, StreamExt};
use std::collections::HashMap;

use crate::utils::{MarkerMatcher, MatchResult};

@@ -709,14 +710,15 @@ impl JailedStream {
.collect();

// Create choice with tool calls
return create_choice_stream(
let choice = create_choice_stream(
choice_index,
Some(Role::Assistant),
normal_text.as_deref().unwrap_or(""),
Some(tool_call_chunks),
Some(FinishReason::ToolCalls),
None,
None,
);
return choice;
}

// No tool calls found or parsing failed, return content choice
@@ -745,6 +747,44 @@
}
false
}

/// Post-processor that sets finish_reason to ToolCalls when tool calls were emitted
/// This should be called after apply() to fix the finish_reason for tool call chunks
pub fn fix_finish_reason<S>(
input_stream: S,
) -> impl Stream<Item = Annotated<NvCreateChatCompletionStreamResponse>> + Send
where
S: Stream<Item = Annotated<NvCreateChatCompletionStreamResponse>> + Send + 'static,
{
stream! {
tokio::pin!(input_stream);
let mut has_tool_calls_per_choice: HashMap<u32, bool> = HashMap::new();

while let Some(mut response) = input_stream.next().await {
// Track if any choice emitted tool calls
if let Some(ref data) = response.data {
for choice in &data.choices {
if choice.delta.tool_calls.is_some() {
has_tool_calls_per_choice.insert(choice.index, true);
}
}
}

// If this chunk has finish_reason and the choice had tool calls, override to ToolCalls
if let Some(ref mut data) = response.data {
for choice in &mut data.choices {
if choice.finish_reason.is_some()
&& has_tool_calls_per_choice.get(&choice.index).copied().unwrap_or(false)
{
choice.finish_reason = Some(FinishReason::ToolCalls);
}
}
}
Comment on lines 790 to 799
Contributor:

The vLLM logic appears stricter than this: https://github.com/vllm-project/vllm/blob/a404e2c0f1bf100d28453a5a2ab7bd2a29d9aa31/vllm/entrypoints/openai/serving_chat.py#L1515-L1519

            # In OpenAI's API, when a tool is called, the finish_reason is:
            # "tool_calls" for "auto" or "required" tool calls,
            # and "stop" for named tool calls.
            is_finish_reason_tool_calls = auto_tools_called or (
                request.tool_choice
                and request.tool_choice == "required"
                and output.finish_reason == "stop"
            )

For example, here we check whether any finish_reason is set at all, but vLLM explicitly checks whether the finish reason is stop. Is there an edge case where we may overwrite the finish_reason undesirably? What happens if it is length, for example, and is that the correct behavior?

Contributor Author:

For our case, it is correct. The jail stream ensures that we only emit complete tool calls, so it is safe to overwrite the finish reason.

For cases like length, which I also thought about, the tool calls will not be populated, so we never hit the overwrite condition.

As for the `request.tool_choice` condition, that is handled by the jail stream post-processor itself.
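The length argument can be checked against a tiny model of the pass (hypothetical simplified types, not the real dynamo structs): a choice that is truncated mid-tool-call never emits a `tool_calls` delta, so the override is never armed for it and a terminal `Length` survives unchanged.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the streamed chunk fields the pass inspects.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FinishReason {
    Stop,
    Length,
    ToolCalls,
}

struct Chunk {
    index: u32,
    has_tool_calls: bool,
    finish_reason: Option<FinishReason>,
}

// Applies the override pass in order and returns the finish_reason of the
// terminal chunk: overridden to ToolCalls only if this choice index
// previously emitted a tool_calls delta.
fn final_finish_reason(chunks: &mut [Chunk]) -> Option<FinishReason> {
    let mut saw_tool_calls: HashMap<u32, bool> = HashMap::new();
    for chunk in chunks.iter_mut() {
        if chunk.has_tool_calls {
            saw_tool_calls.insert(chunk.index, true);
        }
        if chunk.finish_reason.is_some()
            && saw_tool_calls.get(&chunk.index).copied().unwrap_or(false)
        {
            chunk.finish_reason = Some(FinishReason::ToolCalls);
        }
    }
    chunks.last().and_then(|c| c.finish_reason)
}
```

In the truncated case the tool-call parse never completes, no `tool_calls` delta is emitted, and the condition on the map keeps `Length` intact.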

Contributor:

We should add small test cases for the length stop reason: when a tool call is parsed before max length is reached, when max length is reached before the tool call finishes parsing, and when max length is reached after the tool call finishes parsing.

Contributor Author:

We don't have those kinds of chunks right now; this may be part of another PR.

Contributor Author:

Added a test, @elyasmnvidian.


yield response;
}
}
}
}

/// Builder for configuring a JailedStream