
Conversation

@Radu-Raicea (Member) commented Aug 29, 2025

There are two things happening in this PR:

  1. Multiple fixes related to streaming providers and tool calls.
  2. A significant refactoring that applies DRY principles by introducing a converter pattern for the providers (sketched below). The refactoring is not perfect, but it is a big step in the right direction, and we should keep improving on it.
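
For context, the converter idea could take roughly this shape. This is a minimal sketch, not the PR's actual code; the class name and method signatures are illustrative assumptions.

```python
from typing import Any, Protocol


class ProviderConverter(Protocol):
    """Illustrative interface: each provider supplies one converter that
    normalizes its native payloads into the common shapes LLMA captures."""

    def format_input(self, messages: Any) -> list[dict]:
        """Convert provider-native input messages into a common format."""
        ...

    def format_output(self, response: Any) -> list[dict]:
        """Convert a provider response, including any tool calls, into
        the common output-choices format."""
        ...
```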

The fixes are the following:

  • Anthropic's streaming implementation was missing $ai_tools
  • Anthropic's streaming implementation was missing tool calls in $ai_output_choices
  • Gemini's streaming and non-streaming implementations were not sending text messages in a correctly parsable format
  • Gemini's streaming implementation was missing $ai_tools
  • Gemini's streaming implementation was missing tool calls in $ai_output_choices
  • OpenAI Chat Completions' streaming implementation was missing tool calls in $ai_output_choices
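
To make the fixes concrete, a streamed tool-calling generation should now be captured with both properties populated. The sketch below shows the intended event shape; the property names are the ones listed above, but the values and exact nesting are illustrative assumptions, not the PR's actual output.

```python
properties = {
    "$ai_provider": "anthropic",
    "$ai_model": "claude-sonnet-4-20250514",  # illustrative model name
    # Tools offered to the model -- previously missing from streaming captures:
    "$ai_tools": [
        {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}},
        }
    ],
    # Output choices must include the tool call, not just the text delta:
    "$ai_output_choices": [
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "Let me check the weather."},
                {
                    "type": "function",
                    "function": {"name": "get_weather", "arguments": {"city": "Montreal"}},
                },
            ],
        }
    ],
}
```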

@Radu-Raicea Radu-Raicea changed the title from "Fix/llma streaming providers with tool calls" to "fix(llma): streaming providers with tool calls" on Sep 2, 2025
Resolved conflicts by:
- Keeping both StreamingEventData approach and new sanitization imports
- Applying sanitization to formatted inputs before passing to StreamingEventData
- Ensuring privacy mode and special token fields are handled by capture_streaming_event
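
A rough illustration of that flow: StreamingEventData and capture_streaming_event are named in the conflict-resolution note above, but every field and the sanitize helper below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any


def sanitize(value: Any) -> Any:
    """Stand-in for the sanitization helpers this PR imports."""
    return value  # the real helpers strip sensitive/non-serializable data


@dataclass
class StreamingEventData:  # name from the note above; fields are assumed
    provider: str
    model: str
    formatted_input: Any


# Sanitization is applied to the formatted input *before* it is passed
# to StreamingEventData, as the note describes.
event_data = StreamingEventData(
    provider="openai",
    model="gpt-4o",
    formatted_input=sanitize([{"role": "user", "content": "Hi"}]),
)
```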
@Radu-Raicea Radu-Raicea marked this pull request as ready for review September 2, 2025 17:55
@Radu-Raicea Radu-Raicea requested a review from a team September 2, 2025 17:56
@greptile-apps (bot) left a comment

17 files reviewed, 3 comments

@carlos-marchal-ph (Contributor) left a comment

I really like the direction of this PR. One of my first thoughts when I started working on this repo was that the code was more complex and repetitive than it probably needed to be. This PR starts addressing that in a reasonable way.

The PR itself looks good to me, but when testing it locally I ran into some issues with OpenAI. I tried reproducing them in master, and the first one is also there:

  1. OpenAI Responses streaming reports 0 input/output tokens and is missing the assistant response after a tool call.
[Screenshots attached]
  2. OpenAI Chat Completions is missing the assistant response after a tool call.
[Screenshots attached]

Other than that, I have some more comments on the direction we want to head in; I'll add them as a separate comment so we can discuss. They are non-blocking, as they probably belong in separate PRs.

@carlos-marchal-ph (Contributor) commented

In terms of strategy moving forward, I still think there's quite a bit of room for improvement. I don't think it belongs in this PR, but I'm just writing it out to sync on it.

There is still quite a bit of unnecessary code repetition in the repo. I think some of these utilities could probably be shared with the non-streaming implementation. The data types are the first that come to mind, but I'm sure there are many transformations that could be reused.

There's also still some code that's repeated in the sync and async implementations, such as the core event listening loop. Maybe we would benefit from having a class handling the entire event loop, which we could reuse across implementations, and which could hold some data that we are currently passing around every time.
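
Purely as an illustration of that idea (nothing here comes from the PR; the class, method names, and the converter's parse_chunk contract are all hypothetical), such a shared event loop might look like:

```python
from typing import Any, Callable, Iterable


class StreamEventLoop:
    """Hypothetical shared loop: accumulates streamed chunks and holds the
    state that is currently threaded through each provider's generators."""

    def __init__(self, converter: Any) -> None:
        self.converter = converter  # provider-specific chunk parser
        self.text_parts: list[str] = []
        self.tool_calls: list[dict] = []

    def handle_chunk(self, chunk: Any) -> None:
        # Delegate provider-specific parsing to the converter; assume it
        # returns (text_delta, tool_calls) for every chunk.
        text, tools = self.converter.parse_chunk(chunk)
        if text:
            self.text_parts.append(text)
        self.tool_calls.extend(tools)

    def run_sync(self, stream: Iterable[Any], on_done: Callable[[dict], None]) -> None:
        for chunk in stream:
            self.handle_chunk(chunk)
        on_done(self.result())

    async def run_async(self, stream: Any, on_done: Callable[[dict], None]) -> None:
        async for chunk in stream:
            self.handle_chunk(chunk)
        on_done(self.result())

    def result(self) -> dict:
        return {"text": "".join(self.text_parts), "tool_calls": self.tool_calls}
```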

More generally, I think some of these utilities should probably be decomposed into smaller classes in smaller files, with a clearer separation of concerns. The current approach of exporting a bunch of functions from a single file is not helping with readability or testability.

Again, these are all unordered thoughts, which I'm sure you've also had at some point. We can tackle this in other PRs down the road as we work on other stuff. Just wanted to write this down to see if you agree or if you think otherwise on some of these points.

@Radu-Raicea (Member, Author) commented

> In terms of strategy moving forward, I still think there's quite a bit of room for improvement. [...] Just wanted to write this down to see if you agree or if you think otherwise on some of these points.

Completely agreed! If you have an opportunity to take steps in those directions, like I did in this PR, I will gladly review those PRs :D

@Radu-Raicea (Member, Author) commented

@carlos-marchal-ph

The LLM returns the tool call, but the handling of the tool call (and therefore the response with the weather data) is not sent to LLMA unless you create a span event or add it to the input of the next LLM call.
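
For example, in OpenAI-style message terms (illustrative values; this is the standard chat-message flow, not code from the PR), the tool result becomes visible to LLMA once it is part of the next call's input:

```python
# After the model returns a tool call, the application executes the tool
# and feeds the result back as input to the next LLM call; that follow-up
# call's input is what makes the tool handling visible to LLMA.
messages = [
    {"role": "user", "content": "What's the weather in Montreal?"},
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "call_123",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Montreal"}'},
            }
        ],
    },
    # The tool's result, which would otherwise never reach LLMA:
    {"role": "tool", "tool_call_id": "call_123", "content": '{"temp_c": 21}'},
]
# next_response = client.chat.completions.create(model="gpt-4o", messages=messages)
```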

@carlos-marchal-ph (Contributor) left a comment

Ah gotcha, missing context on my end then. Any idea about the 0 input / 0 output tokens thing? In any case, since it's happening on master too, it's probably unrelated to this PR, so approving 👏

@Radu-Raicea (Member, Author) commented

Fixed the Responses API streaming token counts, nice catch!

@Radu-Raicea Radu-Raicea enabled auto-merge (squash) September 3, 2025 18:31
@Radu-Raicea Radu-Raicea merged commit 08b11cb into master Sep 3, 2025
10 checks passed
@Radu-Raicea Radu-Raicea deleted the fix/llma-streaming-providers-with-tool-calls branch September 3, 2025 20:02