fix(llma): streaming providers with tool calls #319
Conversation
Resolved conflicts by:
- Keeping both the StreamingEventData approach and the new sanitization imports
- Applying sanitization to formatted inputs before passing them to StreamingEventData
- Ensuring privacy mode and special token fields are handled by capture_streaming_event
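For context, a minimal sketch of the data flow this describes. Only `StreamingEventData` and `capture_streaming_event` are taken from the description above; `sanitize`, `build_event_data`, and the field names are illustrative assumptions, so the PR's actual code will differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StreamingEventData:
    # Illustrative fields only; the real dataclass in the PR differs.
    formatted_input: Any
    output: Any
    privacy_mode: bool = False
    special_tokens: dict[str, int] = field(default_factory=dict)


def sanitize(value: Any) -> Any:
    """Stand-in for the sanitization helpers imported in this PR."""
    return value


def build_event_data(raw_input: Any, output: Any, privacy_mode: bool) -> StreamingEventData:
    # Sanitization runs on the formatted input *before* it is handed to
    # StreamingEventData; privacy mode and special token fields are then
    # resolved downstream by capture_streaming_event.
    return StreamingEventData(
        formatted_input=sanitize(raw_input),
        output=sanitize(output),
        privacy_mode=privacy_mode,
    )
```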
17 files reviewed, 3 comments
carlos-marchal-ph left a comment
I really like the direction of this PR. One of my first thoughts when I started working on this repo was that the code was more complex and repetitive than it probably needed to be. This PR starts addressing that in a reasonable way.
The PR itself looks good to me, but when testing it locally I ran into some issues with OpenAI. I tried reproducing them in master, and the first one is also there:
- OpenAI Responses streaming reports a 0 token count and is missing the assistant response after a tool call
- OpenAI Chat Completions is missing the assistant response after a tool call (see the repro sketch after this list)
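A minimal repro sketch for the Chat Completions case, using the plain `openai` client (the PostHog wrapper mirrors this interface; the model, tool, and result values are illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# First streamed call: the model answers with a tool call instead of text.
stream = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools, stream=True
)
calls: dict[int, dict] = {}
for chunk in stream:
    for tc in chunk.choices[0].delta.tool_calls or []:
        call = calls.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
        call["id"] += tc.id or ""
        call["name"] += tc.function.name or ""
        call["arguments"] += tc.function.arguments or ""

# Second streamed call carries the tool result; the assistant response
# from this call is the one that goes missing.
call = calls[0]
messages += [
    {"role": "assistant", "tool_calls": [{
        "id": call["id"],
        "type": "function",
        "function": {"name": call["name"], "arguments": call["arguments"]},
    }]},
    {"role": "tool", "tool_call_id": call["id"], "content": json.dumps({"temp_c": 21})},
]
stream = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```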
Other than that, I have some more comments on the direction we want to head towards; I'll add them as a separate comment so we can discuss. They are non-blocking, as they probably belong in separate PRs.
In terms of strategy moving forward, I still think there's quite a bit of room for improvement. I don't think it belongs in this PR, but I'm writing it out so we can sync on it.

There is still quite a bit of unnecessary code repetition in the repo. I think some of these utilities could probably be shared with the non-streaming implementation. The data types are the first that come to mind, but I'm sure there are many transformations that could be reused. There's also code that's repeated in the sync and async implementations, such as the core event-listening loop. Maybe we would benefit from a class handling the entire event loop, which we could reuse across implementations and which could hold some of the data we currently pass around on every call (see the sketch below).

More generally, I think some of these utilities should probably be decomposed into smaller classes in smaller files, with a clearer separation of concerns. The current approach of exporting a bunch of functions from a single file is not helping readability or testability.

Again, these are all unordered thoughts, which I'm sure you've also had at some point. We can tackle them in other PRs down the road as we work on other things. I just wanted to write this down to see whether you agree or think otherwise on some of these points.
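To make the event-loop idea concrete, a purely hypothetical shape (none of these names exist in the repo; it only illustrates sharing the core loop between sync and async):

```python
from typing import Any, AsyncIterable, Callable, Iterable


class StreamingEventLoop:
    """Holds per-stream state instead of passing it around on every call."""

    def __init__(self, on_event: Callable[[Any], None]):
        self.on_event = on_event
        self.accumulated: list[Any] = []

    def handle(self, event: Any) -> None:
        # The shared core: accumulate, transform, emit. The sync and async
        # drivers both funnel through here, removing the duplication.
        self.accumulated.append(event)
        self.on_event(event)

    def run(self, events: Iterable[Any]) -> list[Any]:
        for event in events:
            self.handle(event)
        return self.accumulated

    async def run_async(self, events: AsyncIterable[Any]) -> list[Any]:
        async for event in events:
            self.handle(event)
        return self.accumulated
```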
Completely agreed! If you have an opportunity to take steps in those directions, like I did in this PR, I will gladly review those PRs :D
The LLM returns the tool call, but the handling of the tool call (and so the response with the weather) is not sent to LLMA unless you create a span event or add it to the input of the next LLM call. A sketch of the span-event option follows.
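A hedged sketch of the span-event option, using PostHog's manual-capture property names as I understand them (worth double-checking against the docs; the distinct ID and result value are made up):

```python
import json
import posthog

def capture_tool_span(trace_id: str, tool_call: dict, result: dict) -> None:
    # Reports the tool execution to LLMA as its own span, so the weather
    # response is visible even though no LLM call carries it as output.
    posthog.capture(
        distinct_id="user_123",  # illustrative
        event="$ai_span",
        properties={
            "$ai_trace_id": trace_id,
            "$ai_span_name": tool_call["function"]["name"],
            "$ai_input_state": json.loads(tool_call["function"]["arguments"]),
            "$ai_output_state": result,
        },
    )
```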
carlos-marchal-ph left a comment
Ah, gotcha, missing context on my end then. Any idea about the 0 input / 0 output tokens thing? In any case, since it's happening on master too, it's probably unrelated to this PR, so approving 👏
Fixed the Responses API streaming tokens, nice catch!
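For reference, a sketch of where the Responses API exposes token counts in a stream: usage arrives only on the terminal `response.completed` event, so reading it from delta events yields the 0/0 counts (plain `openai` client, illustrative model and input):

```python
from openai import OpenAI

client = OpenAI()
stream = client.responses.create(model="gpt-4o-mini", input="Say hello", stream=True)

input_tokens = output_tokens = 0
for event in stream:
    # Delta events carry no usage; only the final event does.
    if event.type == "response.completed":
        input_tokens = event.response.usage.input_tokens
        output_tokens = event.response.usage.output_tokens
```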
There are two things happening in this PR: the streaming refactor and the tool-call fixes themselves.
The fixes are the following: they correct how the `$ai_tools` and `$ai_output_choices` properties are populated for streaming responses that include tool calls.
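For illustration, a guess at how a captured generation event might carry these two properties (field values are made up, and the exact schema should be checked against the LLM analytics docs):

```python
# Hypothetical property payload for a captured $ai_generation event.
properties = {
    "$ai_model": "gpt-4o-mini",
    "$ai_tools": [
        {"type": "function", "function": {"name": "get_weather"}},
    ],
    "$ai_output_choices": [
        {"role": "assistant", "content": "It is 21°C in Paris."},
    ],
    "$ai_input_tokens": 42,
    "$ai_output_tokens": 12,
}
```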