-
Notifications
You must be signed in to change notification settings - Fork 2.8k
AGT-2182: Add inference bargein and examples #4771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Chenghao Mou <[email protected]> Co-authored-by: Théo Monnom <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| except Exception as e: | ||
| raise APIError(f"error during interruption prediction: {e}") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 InterruptionHttpStream.predict double-wraps APIError exceptions
When an error occurs inside the async with self._session.post(...) response processing (e.g., resp.raise_for_status() fails or JSON parsing fails), the inner except Exception at line 744 catches it and raises an APIError. This APIError then propagates out of the async with block and the outer try. Since APIError is not asyncio.TimeoutError or aiohttp.ClientError, it is NOT caught by line 747, but IS caught by the generic except Exception at line 749, which wraps it in yet another APIError.
Root Cause and Impact
The exception handling structure has three nested layers:
- Inner
try/except Exception(line 723-746) - catches response processing errors, raisesAPIError - Outer
except (asyncio.TimeoutError, aiohttp.ClientError)(line 747) - catches connection/timeout errors - Outer
except Exception(line 749) - intended as a catch-all, but also catches theAPIErrorfrom step 1
Since APIError(Exception) is a subclass of Exception, the APIError raised at line 746 escapes the async with block and hits the outer except Exception at line 749. The result is a double-wrapped error message like:
"error during interruption prediction: error during interruption prediction: 404 Not Found"
Additionally, the body parameter from the inner APIError (which contains the response text for debugging) is lost in the re-wrapping.
Impact: Confusing error messages in logs and loss of the response body diagnostic information when interruption prediction HTTP calls fail.
| except Exception as e: | |
| raise APIError(f"error during interruption prediction: {e}") from e | |
| except APIError: | |
| raise | |
| except Exception as e: | |
| raise APIError(f"error during interruption prediction: {e}") from e |
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| interruption_mode: NotGivenOr[Literal["adaptive", "vad", False]] = NOT_GIVEN | ||
| if allow_interruptions is False: | ||
| interruption_mode = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 allow_interruptions=True (explicit) is silently ignored in TurnHandlingConfig.migrate
When a user explicitly passes allow_interruptions=True to Agent.__init__ or AgentTask.__init__, the TurnHandlingConfig.migrate method only checks if allow_interruptions is False: (line 141). When allow_interruptions=True, the interruption_mode stays as NOT_GIVEN, which means the InterruptionConfig.mode defaults to NOT_GIVEN. Later in Agent.__init__ at agent.py:95, is_given(turn_handling.interruption_cfg.mode) returns False, so self._allow_interruptions stays NOT_GIVEN — the explicit True is lost.
Detailed Explanation
The old API allowed allow_interruptions=True to be explicitly set on an Agent, which would override the session's default. With the new migration code:
Agent.__init__callsTurnHandlingConfig.migrate(allow_interruptions=True)- In
migrate,allow_interruptions is False→False, sointerruption_modestaysNOT_GIVEN InterruptionConfigis created withmode=NOT_GIVEN- Back in
Agent.__init__:is_given(turn_handling.interruption_cfg.mode)→False self._allow_interruptionsstaysNOT_GIVENinstead of being set toTrue
This means an Agent(allow_interruptions=True) no longer overrides a session's allow_interruptions=False. The fix should also set interruption_mode when allow_interruptions is True.
Impact: Agents that explicitly set allow_interruptions=True to override a session-level allow_interruptions=False will no longer work correctly — the session's setting will take precedence.
| interruption_mode: NotGivenOr[Literal["adaptive", "vad", False]] = NOT_GIVEN | |
| if allow_interruptions is False: | |
| interruption_mode = False | |
| interruption_mode: NotGivenOr[Literal["adaptive", "vad", False]] = NOT_GIVEN | |
| if is_given(allow_interruptions): | |
| if allow_interruptions is False: | |
| interruption_mode = False | |
| else: | |
| interruption_mode = "adaptive" |
Was this helpful? React with 👍 or 👎 to provide feedback.
| async def _send_task() -> None: | ||
| nonlocal overlap_speech_started | ||
| nonlocal cache | ||
| async for data in data_chan: | ||
| if self._overlap_speech_started_at is None: | ||
| continue | ||
| resp = await self.predict(data) | ||
| created_at = resp["created_at"] | ||
| cache[created_at] = entry = InterruptionCacheEntry( | ||
| created_at=created_at, | ||
| total_duration=(time.perf_counter_ns() - created_at) / 1e9, | ||
| speech_input=data, | ||
| detection_delay=time.time() - self._overlap_speech_started_at, | ||
| probabilities=resp["probabilities"], | ||
| prediction_duration=resp["prediction_duration"], | ||
| is_interruption=resp["is_bargein"], | ||
| ) | ||
| if overlap_speech_started and entry.is_interruption: | ||
| logger.debug("user interruption detected") | ||
| if self._user_speech_span: | ||
| self._update_user_speech_span(self._user_speech_span, entry) | ||
| self._user_speech_span = None | ||
| ev = InterruptionEvent( | ||
| type="user_interruption_detected", | ||
| timestamp=time.time(), | ||
| overlap_speech_started_at=self._overlap_speech_started_at, | ||
| is_interruption=entry.is_interruption, | ||
| speech_input=entry.speech_input, | ||
| probabilities=entry.probabilities, | ||
| total_duration=entry.get_total_duration(), | ||
| prediction_duration=entry.get_prediction_duration(), | ||
| detection_delay=entry.get_detection_delay(), | ||
| probability=entry.get_probability(), | ||
| ) | ||
| self._event_ch.send_nowait(ev) | ||
| self._model.emit("user_interruption_detected", ev) | ||
| overlap_speech_started = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Shared mutable overlap_speech_started between concurrent tasks in InterruptionHttpStream without synchronization
In InterruptionHttpStream._run(), the overlap_speech_started boolean is shared between _forward_data and _send_task via nonlocal. _forward_data writes to it (lines 566, 590, 631, 699) and _send_task reads it (line 680). Since _send_task contains await self.predict(data) (line 669), the event loop can switch between the two tasks between the await and the subsequent read of overlap_speech_started at line 680.
Root Cause
The specific scenario is:
_forward_datasetsoverlap_speech_started = True(line 590)_send_taskpicks up data and callsawait self.predict(data)(line 669)- While
predictis awaiting,_forward_dataprocesses an_OverlapSpeechEndedSentineland setsoverlap_speech_started = False(line 631), emitting auser_non_interruption_detectedevent predictreturns withis_interruption=True_send_taskchecksoverlap_speech_started(line 680) — it's nowFalse, so the interruption is silently dropped
This is a race condition that could cause missed interruption detections. However, since this is async (not threaded) and the window is narrow (only during the HTTP request), the practical impact is limited to edge cases where overlap speech ends exactly during an in-flight prediction.
Impact: Occasional missed interruption detections when overlap speech ends during an in-flight HTTP prediction request.
Was this helpful? React with 👍 or 👎 to provide feedback.
| endpointing_kwargs = {} | ||
| # allow not given values for agent to inherit from session | ||
| endpointing_kwargs["min_delay"] = min_endpointing_delay | ||
| endpointing_kwargs["max_delay"] = max_endpointing_delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 endpointing_kwargs dict is always non-empty, overriding EndpointingConfig defaults with NOT_GIVEN
In TurnHandlingConfig.migrate(), endpointing_kwargs always gets two keys set (lines 146-147), so the if endpointing_kwargs: check at line 166 is always True. When min_endpointing_delay and max_endpointing_delay are both NOT_GIVEN, this creates EndpointingConfig(min_delay=NOT_GIVEN, max_delay=NOT_GIVEN), which overrides the class defaults of 0.5 and 3.0.
Root Cause
The EndpointingConfig class defines:
min_delay: NotGivenOr[float] = 0.5
max_delay: NotGivenOr[float] = 3.0But migrate() always passes these explicitly:
endpointing_kwargs["min_delay"] = min_endpointing_delay # could be NOT_GIVEN
endpointing_kwargs["max_delay"] = max_endpointing_delay # could be NOT_GIVENThis means EndpointingConfig(min_delay=NOT_GIVEN, max_delay=NOT_GIVEN) is created, which sets the fields to NOT_GIVEN instead of using the defaults 0.5 and 3.0.
For AgentSession, this is mitigated because the caller already provides fallback defaults (0.5 and 3.0). For Agent and AgentTask, NOT_GIVEN is intentional (agents inherit from session). So the practical impact is limited, but the endpointing_kwargs dict check is misleading — it will never be empty.
Impact: Low — the callers already handle this correctly, but the code is misleading.
Was this helpful? React with 👍 or 👎 to provide feedback.
No description provided.