-
Notifications
You must be signed in to change notification settings - Fork 674
fix: no more multiple finish reasons in stream #4154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 4 commits
2a029f8
6f2138e
29351af
2cad55c
e8a02ab
c214ba2
46cd577
25013b6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,6 +13,7 @@ use dynamo_parsers::tool_calling::{ | |
| }; | ||
| use dynamo_runtime::protocols::annotated::Annotated; | ||
| use futures::{Stream, StreamExt}; | ||
| use std::collections::HashMap; | ||
|
|
||
| use crate::utils::{MarkerMatcher, MatchResult}; | ||
|
|
||
|
|
@@ -509,7 +510,7 @@ impl JailedStream { | |
| last_annotated_event.clone(), | ||
| last_annotated_comment.clone(), | ||
| ); | ||
| let responses = self.emit_choice_emissions(tool_content_emissions, chat_response, preserved_metadata); | ||
| let responses = self.emit_choice_emissions(tool_content_emissions.clone(), chat_response, preserved_metadata); | ||
| for emitted_response in responses { | ||
| yield emitted_response; | ||
| } | ||
|
|
@@ -709,14 +710,15 @@ impl JailedStream { | |
| .collect(); | ||
|
|
||
| // Create choice with tool calls | ||
| return create_choice_stream( | ||
| let choice = create_choice_stream( | ||
| choice_index, | ||
| Some(Role::Assistant), | ||
| normal_text.as_deref().unwrap_or(""), | ||
| Some(tool_call_chunks), | ||
| Some(FinishReason::ToolCalls), | ||
| None, | ||
| None, | ||
| ); | ||
| return choice; | ||
| } | ||
|
|
||
| // No tool calls found or parsing failed, return content choice | ||
|
|
@@ -745,6 +747,44 @@ impl JailedStream { | |
| } | ||
| false | ||
| } | ||
|
|
||
| /// Post-processor that sets finish_reason to ToolCalls when tool calls were emitted | ||
| /// This should be called after apply() to fix the finish_reason for tool call chunks | ||
| pub fn fix_finish_reason<S>( | ||
| input_stream: S, | ||
| ) -> impl Stream<Item = Annotated<NvCreateChatCompletionStreamResponse>> + Send | ||
| where | ||
| S: Stream<Item = Annotated<NvCreateChatCompletionStreamResponse>> + Send + 'static, | ||
| { | ||
| stream! { | ||
| tokio::pin!(input_stream); | ||
| let mut has_tool_calls_per_choice: HashMap<u32, bool> = HashMap::new(); | ||
|
|
||
| while let Some(mut response) = input_stream.next().await { | ||
| // Track if any choice emitted tool calls | ||
| if let Some(ref data) = response.data { | ||
| for choice in &data.choices { | ||
| if choice.delta.tool_calls.is_some() { | ||
| has_tool_calls_per_choice.insert(choice.index, true); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // If this chunk has finish_reason and the choice had tool calls, override to ToolCalls | ||
| if let Some(ref mut data) = response.data { | ||
| for choice in &mut data.choices { | ||
| if choice.finish_reason.is_some() | ||
| && has_tool_calls_per_choice.get(&choice.index).copied().unwrap_or(false) | ||
| { | ||
| choice.finish_reason = Some(FinishReason::ToolCalls); | ||
| } | ||
| } | ||
| } | ||
|
Comment on lines
790
to
799
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The vLLM logic appears stricter than this: https://github.com/vllm-project/vllm/blob/a404e2c0f1bf100d28453a5a2ab7bd2a29d9aa31/vllm/entrypoints/openai/serving_chat.py#L1515-L1519 For example here we check if there is any finish_reason set, but vllm explicitly checks if the finish reason is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For our case, it is correct. Jail stream in our case ensures that we get complete tool calls, and it is safe to overwrite the finish reason. For cases like length, I also thought about it, in that case tool calls will not be populated and hence we will not hit the overwrite condition. Rest, request.tool_choice condition is itself handled by jail stream post processor. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should add a small test case for length stop reason and test when tool call is parsed before max length is reached, when max length is reached and tool isn't finished parsing, and when max length is reached when tool is finished parsing? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we don't have those kind of chunks right now. May be part of other PR There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added test @elyasmnvidian |
||
|
|
||
| yield response; | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Builder for configuring a JailedStream | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if jailstream failed or if there was no tool parsed? Does this function get skipped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no not skipped.