sauhardjain
reviewed
Feb 25, 2026
lucyliulee
reviewed
Feb 26, 2026
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for all 3 issues found in the latest run.
- ✅ Fixed: Benchmark script uses removed async-with stream pattern
  - Updated stream_turn to iterate directly over provider.chat(...) with async for, matching the stream interface that only implements __aiter__.
- ✅ Fixed: Message identity only considers first tool call
  - Message identities now include all assistant tool calls, and the Realtime diff path expands multi-tool-call assistant messages into per-call items so no tool calls are dropped.
- ✅ Fixed: Double normalization of tools in LlmAgent + LlmProvider pipeline
  - Added a fast-path tool resolver in LlmProvider that reuses already-normalized FunctionTool inputs and only calls _normalize_tools when needed.
Or push these changes by commenting:
@cursor push 471fcca35d
Preview (471fcca35d)
diff --git a/line/llm_agent/provider.py b/line/llm_agent/provider.py
--- a/line/llm_agent/provider.py
+++ b/line/llm_agent/provider.py
@@ -14,7 +14,7 @@
from typing import Any, List, Optional, Protocol, Tuple, runtime_checkable
from line.llm_agent.config import LlmConfig, _normalize_config
-from line.llm_agent.tools.utils import _normalize_tools
+from line.llm_agent.tools.utils import FunctionTool, _normalize_tools
@dataclass
@@ -105,9 +105,8 @@
):
self._model = model
normalized_config = _normalize_config(config or LlmConfig())
- normalized_tools, _ = _normalize_tools(tools, model=model) if tools else (None, None)
self._config = normalized_config
- self._tools = normalized_tools or []
+ self._tools = _resolve_tools(tools, model=model)
use_realtime = backend == "realtime" or (backend is None and _is_realtime_model(model))
use_websocket = backend == "websocket" or (backend is None and _is_websocket_model(model))
@@ -140,7 +139,7 @@
def chat(self, messages, tools=None, config=None, **kwargs):
cfg = _normalize_config(config) if config else self._config
- effective_tools = _normalize_tools(tools, model=self._model)[0] if tools else self._tools
+ effective_tools = _resolve_tools(tools, model=self._model) if tools else self._tools
return self._backend.chat(messages, effective_tools, config=cfg, **kwargs)
async def warmup(self, config=None):
@@ -199,18 +198,29 @@
return lower.startswith("gpt-5.2") or lower.startswith("gpt5.2")
+def _resolve_tools(tools: Optional[List[Any]], model: str) -> List[FunctionTool]:
+ """Resolve tools to FunctionTools, avoiding no-op re-normalization."""
+ if not tools:
+ return []
+ if all(isinstance(tool, FunctionTool) for tool in tools):
+ return list(tools)
+ return _normalize_tools(tools, model=model)[0]
+
+
def _message_identity(msg: Message) -> tuple:
"""Compute an identity fingerprint for a single Message.
Used by both WebSocket providers for divergence detection / diff-sync.
- For assistant messages with tool calls, identity is derived from the
- *first* tool call (mirrors how the server tracks multi-tool-call turns
- as a single logical unit).
+ For assistant messages with tool calls, identity includes all tool calls
+ so divergence checks detect changes to any call in the turn.
"""
if msg.tool_calls:
- tc = msg.tool_calls[0]
- return ("assistant_tool_call", tc.name, tc.arguments, tc.id)
+ if len(msg.tool_calls) == 1:
+ tc = msg.tool_calls[0]
+ return ("assistant_tool_call", tc.name, tc.arguments, tc.id)
+ tool_calls_key = tuple((tc.name, tc.arguments, tc.id) for tc in msg.tool_calls)
+ return ("assistant_tool_calls", tool_calls_key)
return (msg.role, msg.content or "", msg.tool_call_id or "", msg.name or "")
diff --git a/line/llm_agent/realtime_provider.py b/line/llm_agent/realtime_provider.py
--- a/line/llm_agent/realtime_provider.py
+++ b/line/llm_agent/realtime_provider.py
@@ -399,11 +399,8 @@
def _message_to_item(msg: Message) -> Dict[str, Any]:
"""Convert a Message to a Realtime API conversation item dict.
- Note: for assistant messages with multiple tool calls, only the first
- tool call is converted. The Realtime API represents each tool call as a
- separate conversation item, but the diff algorithm tracks identity at the
- message level. Handling multi-tool-call expansion here would require
- reworking the diff model.
+ Assistant tool-call messages must contain exactly one tool call; callers
+ are responsible for expanding multi-tool-call turns into separate messages.
"""
if msg.role == "user":
return {
@@ -414,13 +411,8 @@
if msg.role == "assistant":
if msg.tool_calls:
- if len(msg.tool_calls) > 1:
- logger.warning(
- "Realtime API: assistant message has %d tool calls but only "
- "the first is converted (dropping %s)",
- len(msg.tool_calls),
- [tc.name for tc in msg.tool_calls[1:]],
- )
+ if len(msg.tool_calls) != 1:
+ raise ValueError("Assistant tool-call message must contain exactly one tool call")
tc = msg.tool_calls[0]
return {
"type": "function_call",
@@ -464,7 +456,19 @@
if msg.role == "system":
system_parts.append(msg.content or "")
else:
- non_system.append(msg)
+ if msg.role == "assistant" and msg.tool_calls and len(msg.tool_calls) > 1:
+ for tc in msg.tool_calls:
+ non_system.append(
+ Message(
+ role="assistant",
+ content=msg.content,
+ tool_calls=[tc],
+ tool_call_id=msg.tool_call_id,
+ name=msg.name,
+ )
+ )
+ else:
+ non_system.append(msg)
desired_instructions = "\n\n".join(system_parts) if system_parts else None
diff --git a/line/llm_agent/scripts/bench_latency.py b/line/llm_agent/scripts/bench_latency.py
--- a/line/llm_agent/scripts/bench_latency.py
+++ b/line/llm_agent/scripts/bench_latency.py
@@ -164,12 +164,11 @@
ttft = None
text_parts: list[str] = []
- async with provider.chat(messages, config=config) as stream:
- async for chunk in stream:
- if chunk.text:
- if ttft is None:
- ttft = (time.perf_counter() - t0) * 1000
- text_parts.append(chunk.text)
+ async for chunk in provider.chat(messages, config=config):
+ if chunk.text:
+ if ttft is None:
+ ttft = (time.perf_counter() - t0) * 1000
+ text_parts.append(chunk.text)
total = (time.perf_counter() - t0) * 1000
return TurnResult(
diff --git a/line/llm_agent/websocket_provider.py b/line/llm_agent/websocket_provider.py
--- a/line/llm_agent/websocket_provider.py
+++ b/line/llm_agent/websocket_provider.py
@@ -447,19 +447,26 @@
def _extract_model_output_identity(response: Dict[str, Any]) -> Optional[tuple]:
"""Derive a single message-level identity from a Responses API output.
- Mirrors ``_message_identity``: if the model produced tool calls we key
- on the first one; otherwise we key on the full text.
+ Mirrors ``_message_identity``: single-tool-call outputs use a compact key,
+ while multi-tool-call outputs include every call in order.
"""
output_items = response.get("output", [])
function_calls = [i for i in output_items if i.get("type") == "function_call"]
if function_calls:
- fc = function_calls[0]
+ if len(function_calls) == 1:
+ fc = function_calls[0]
+ return (
+ "assistant_tool_call",
+ fc.get("name", ""),
+ fc.get("arguments", ""),
+ fc.get("call_id", ""),
+ )
return (
- "assistant_tool_call",
- fc.get("name", ""),
- fc.get("arguments", ""),
- fc.get("call_id", ""),
+ "assistant_tool_calls",
+ tuple(
+ (fc.get("name", ""), fc.get("arguments", ""), fc.get("call_id", "")) for fc in function_calls
+ ),
)
# Concatenate text across all message output items.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Python 3.9 requires async primitives (locks/queues) to be created inside the context of a running event loop, and the loop isn't available at initialization time. Fix: lazy-initialize them on first use.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: x
- x
Or push these changes by commenting:
@cursor push d109cf302a
Preview (d109cf302a)
diff --git a/line/llm_agent/llm_agent.py b/line/llm_agent/llm_agent.py
--- a/line/llm_agent/llm_agent.py
+++ b/line/llm_agent/llm_agent.py
@@ -311,7 +311,7 @@
stream = self._llm.chat(
messages,
- tools or None,
+ tools,
config=config,
**chat_kwargs,
)

What does this PR do?
WebSocket APIs are noticeably faster for certain models, most notably gpt-realtime-1.5 and gpt-5.2. Unfortunately:
It's not super straightforward to add support for both, but I've done it. We hide the choice of implementation behind the facade of LlmProvider, so it's seamless from the developer PoV.
This is a pretty substantial PR, so I've split it into individual commits:
Type of change
Testing
Unit tests + "real" provider tests
Checklist
make format
Note
High Risk
Introduces new WebSocket/Realtime LLM backends and refactors the provider/streaming interface, which can affect core agent response generation, tool-calling, and connection lifecycle behavior across models.
Overview
Adds a new LlmProvider facade that routes between HTTP (LiteLLM), OpenAI Realtime WS, and OpenAI Responses WS backends based on model name, with an HTTP fallback when WebSocket mode can't support certain LlmConfig fields.
Refactors LlmAgent and example/test scripts to use the new provider API (async-iterable chat() without async with), adds provider warmup() on CallStarted, and centralizes tool normalization/merging (including native vs fallback web_search) in tools.utils.
Introduces substantial new WebSocket infrastructure: shared WS stream utilities (stream.py), a diff-sync Realtime provider (realtime_provider.py), a Responses WS provider with divergence handling (websocket_provider.py), updated OpenAI tool schema conversion for WS APIs, plus new latency/verification scripts and expanded unit tests for routing, warmup, and tool/web-search behavior.
Written by Cursor Bugbot for commit 4e0603c. This will update automatically on new commits.
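The async-iterable chat() interface described in the overview can be sketched as below. FakeProvider and FakeChunk are stand-ins invented for this example (the real provider lives behind LlmProvider); the consuming pattern mirrors the bench_latency.py fix in the diff, where the stream only implements __aiter__ and is iterated directly with async for instead of entered via async with.

```python
import asyncio


class FakeChunk:
    """Stand-in for a streamed chunk with a .text field."""

    def __init__(self, text: str) -> None:
        self.text = text


class FakeProvider:
    """Stand-in for LlmProvider: chat() returns an async iterable of chunks."""

    def chat(self, messages, tools=None, config=None):
        async def _stream():
            for word in ["hello", " ", "world"]:
                yield FakeChunk(word)

        return _stream()


async def run_turn(provider, messages) -> str:
    parts = []
    # No `async with` wrapper: the stream is consumed directly.
    async for chunk in provider.chat(messages):
        if chunk.text:
            parts.append(chunk.text)
    return "".join(parts)


print(asyncio.run(run_turn(FakeProvider(), [{"role": "user", "content": "hi"}])))  # hello world
```

Because the stream is just an async iterable, the same consuming loop works regardless of which backend (HTTP, Realtime WS, or Responses WS) the facade selected.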