
Conversation

@tillkolter
Contributor

@tillkolter tillkolter commented Jan 25, 2026

Description

This PR refines Gemini Realtime interrupt handling by sending realtime_input text when an interrupt is active, while always keeping full user turns in content.

The goal is to preserve context fidelity while still triggering low‑latency interruption behavior.

In contrast to OpenAI's well-defined message/event schema, which supports interruption, truncation, and request cancellation, Gemini falls short of providing an explicit interface for managing the state of a request. This solution tries to simulate the internal behavior of the OpenAI plugin to achieve similar behavior for text interruptions.

Changes Made

  • Gate realtime_input text on an explicit interrupt flag instead of generation state, so interrupts remain responsive even after _done is set.
  • Always include complete turns in content (no stripping), so the model’s context stays consistent.
  • Introduce a lightweight interrupt flag to drive the “send realtime_input text” decision and clear it after use.
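
The gating described in the list above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `InterruptGate`, `Turn`, `ClientEvent`, and the `sent` array are hypothetical stand-ins, and the order in which `content` and `realtime_input` are emitted is an assumption.

```typescript
// Hypothetical, simplified shapes; the real plugin uses types from the Gemini SDK.
type Turn = { role: 'user' | 'model'; text: string };
type ClientEvent =
  | { type: 'content'; turns: Turn[] }
  | { type: 'realtime_input'; text: string };

class InterruptGate {
  // Set by interrupt(); cleared only after realtime_input text was actually sent.
  private pendingInterruptText = false;
  // Stand-in for the outgoing event channel, kept in memory for inspection.
  readonly sent: ClientEvent[] = [];

  interrupt(): void {
    this.pendingInterruptText = true;
  }

  updateChatCtx(turns: Turn[]): void {
    if (turns.length === 0) return;
    // Complete turns always go into content, so nothing is stripped from context.
    this.sent.push({ type: 'content', turns });

    if (!this.pendingInterruptText) return;
    const text = turns
      .filter((t) => t.role === 'user')
      .map((t) => t.text)
      .join('\n')
      .trim();
    if (text) {
      // Only an interrupt that carries user text triggers realtime_input.
      this.sent.push({ type: 'realtime_input', text });
      this.pendingInterruptText = false;
    }
  }
}
```

Because the flag is cleared inside the `if (text)` branch, an update that contains no user text leaves the interrupt pending for the next one.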

Pre-Review Checklist

  • Build passes: All builds (lint, typecheck, tests) pass locally
  • AI-generated code reviewed: Removed unnecessary comments and ensured code quality
  • Changes explained: All changes are properly documented and justified above
  • Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included

Additional Notes

This keeps Gemini’s interruption responsiveness while avoiding the context drift caused by stripping user turns from content.

Summary by CodeRabbit

  • New Features

    • Real-time input now supports optional text alongside media chunks so user text can be forwarded during live sessions.
  • Bug Fixes

    • Improved interrupt handling to discard ongoing server output and resume cleanly after appropriate server signals.
    • Ensured user input accumulated during interrupts is sent reliably.
    • Prevented new response generations from starting while server output is being discarded.


@changeset-bot

changeset-bot bot commented Jan 25, 2026

⚠️ No Changeset found

Latest commit: 135305a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@coderabbitai

coderabbitai bot commented Jan 25, 2026

📝 Walkthrough

Adds interrupt-driven text input handling to RealtimeSession: introduces pendingInterruptText and discardServerOutput flags, emits realtime_input with accumulated user text on interrupt, gates server content while discarding, and prevents new generations from starting during discard until specific server signals arrive.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Realtime interrupt & server-output gating: plugins/google/src/beta/realtime/realtime_api.ts | Added pendingInterruptText and discardServerOutput; interrupt sets flags and marks current generation done for preemption; updateChatCtx emits realtime_input with accumulated text when pending; incoming realtime_input accepts optional text; server-content (modelTurn, outputTranscription) processing is gated by discardServerOutput; isNewGeneration short-circuits when discarding; discardServerOutput reset only after turnComplete/generationComplete. |

Sequence Diagram

sequenceDiagram
    participant User
    participant RealtimeSession
    participant Generation
    participant Server

    User->>RealtimeSession: Trigger interrupt
    activate RealtimeSession
    RealtimeSession->>RealtimeSession: Set pendingInterruptText\nSet discardServerOutput = true
    RealtimeSession->>Generation: Mark current generation done
    deactivate RealtimeSession

    Generation-->>RealtimeSession: Acknowledge done

    RealtimeSession->>RealtimeSession: Emit realtime_input with accumulated text

    Server->>RealtimeSession: Incoming realtime_input (optional text / media)
    activate RealtimeSession
    RealtimeSession->>RealtimeSession: Process realtime_input (accept text)
    deactivate RealtimeSession

    Server->>RealtimeSession: Server content (modelTurn / outputTranscription)
    activate RealtimeSession
    alt discardServerOutput is true
        RealtimeSession->>RealtimeSession: Skip processing/appending server content
    else
        RealtimeSession->>RealtimeSession: Process server content normally
    end
    deactivate RealtimeSession

    Server->>RealtimeSession: turnComplete / generationComplete
    RealtimeSession->>RealtimeSession: Reset discardServerOutput

    RealtimeSession->>Generation: Check isNewGeneration
    alt discardServerOutput is true
        Generation-->>RealtimeSession: Prevent new generation
    else
        Generation-->>RealtimeSession: Allow generation lifecycle
    end
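
The receive side of the diagram can be approximated with a small state machine. The sketch below assumes a trimmed-down `ServerMessage` shape and a hypothetical `DiscardGate` class; it is not the SDK's `LiveServerMessage` nor the plugin's `RealtimeSession`, only an illustration of the flag lifecycle.

```typescript
// Hypothetical, trimmed-down message shape (not the SDK's LiveServerMessage).
type ServerMessage = {
  modelText?: string;
  turnComplete?: boolean;
  generationComplete?: boolean;
};

class DiscardGate {
  private discardServerOutput = false;
  private generationActive = false;
  readonly delivered: string[] = [];
  generationsStarted = 0;

  interrupt(): void {
    // Flags are only set when a generation is active and not yet done.
    if (this.generationActive) {
      this.discardServerOutput = true;
      this.generationActive = false; // mark the current generation done
    }
  }

  onServerMessage(msg: ServerMessage): void {
    if (this.isNewGeneration(msg)) {
      this.generationActive = true;
      this.generationsStarted += 1;
    }
    // Stale model output is skipped while discarding.
    if (msg.modelText !== undefined && !this.discardServerOutput) {
      this.delivered.push(msg.modelText);
    }
    // The flag is reset only on a completion signal from the server.
    if (msg.turnComplete || msg.generationComplete) {
      this.discardServerOutput = false;
      this.generationActive = false;
    }
  }

  private isNewGeneration(msg: ServerMessage): boolean {
    if (this.discardServerOutput) return false; // short-circuit while discarding
    return !this.generationActive && msg.modelText !== undefined;
  }
}
```

Under this model, any content that arrives after an interrupt but before the completion signal is dropped, including content that might belong to a new generation. That ordering dependency is the edge case raised later in the review.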

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • toubatbrian
  • theomonnom
  • lukasIO

Poem

🐰
I nibbled at the interrupt bell,
Held your words and kept them well,
Server quiet, then I leap—
Send the text you meant to keep,
A rabbit’s hop, precise and swell ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: Support Gemini realtime text interruptions' clearly and specifically describes the main change in the pull request. |
| Description check | ✅ Passed | The PR description includes all required sections: Description, Changes Made, Pre-Review Checklist (completed), and Additional Notes. All sections are substantive and complete. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e153c5e and 135305a.

📒 Files selected for processing (1)
  • plugins/google/src/beta/realtime/realtime_api.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

  • plugins/google/src/beta/realtime/realtime_api.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • plugins/google/src/beta/realtime/realtime_api.ts
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

  • plugins/google/src/beta/realtime/realtime_api.ts
🧬 Code graph analysis (1)
plugins/google/src/beta/realtime/realtime_api.ts (1)
plugins/openai/src/realtime/api_proto.ts (1)
  • Content (181-186)
🪛 Biome (2.1.2)
plugins/google/src/beta/realtime/realtime_api.ts

[error] 934-934: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (6)
plugins/google/src/beta/realtime/realtime_api.ts (6)

413-414: LGTM!

The new flags pendingInterruptText and discardServerOutput are well-named and appropriately scoped as private instance variables for managing interrupt state.


573-592: LGTM!

The interrupt text handling logic is well-structured. The flag is correctly cleared only after text is actually sent (inside the if (text) block), which addresses the earlier review concern about maintaining interrupt responsiveness.


748-753: LGTM!

The interrupt logic correctly:

  1. Gates flag-setting on an active, non-done generation
  2. Sets both pendingInterruptText and discardServerOutput atomically before marking the generation done
  3. Always calls startUserActivity() to signal activity to Gemini

This effectively simulates OpenAI-style interruption behavior for Gemini as intended.


1329-1331: LGTM!

The discardOutput guard correctly filters out model output (modelTurn and outputTranscription) during interrupts while intentionally allowing inputTranscription through. This preserves the user's speech context even when discarding stale model responses.
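
As a rough illustration of that guard, the function below drops model-side fields while keeping the user's transcription. `ServerContent` and `applyDiscard` are hypothetical names used only for this sketch; the field names mirror the ones mentioned in the review, and everything else is assumed.

```typescript
// Hypothetical subset of the server-content fields named in the review.
type ServerContent = {
  modelTurn?: { text: string };
  outputTranscription?: { text: string };
  inputTranscription?: { text: string };
};

function applyDiscard(content: ServerContent, discardOutput: boolean): ServerContent {
  // Outside an interrupt window, content passes through untouched.
  if (!discardOutput) return content;
  // While discarding, keep only the user's side of the conversation.
  const { inputTranscription } = content;
  return inputTranscription ? { inputTranscription } : {};
}
```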


1414-1422: Protocol assumption is documented but inherently unverified.

The comment on lines 1415-1416 correctly documents the assumption that Gemini emits turnComplete/generationComplete before any new generation content. This addresses the documentation aspect from the previous review.

However, if Gemini's protocol allows new generation content to arrive before the completion signal of an interrupted generation, that content would be silently dropped (due to isNewGeneration() returning false while discardServerOutput is true). Since Gemini lacks an explicit interrupt API, this edge case behavior may need monitoring in production.

Consider adding telemetry or debug logging when content is discarded to help identify if this edge case occurs:

   private isNewGeneration(response: types.LiveServerMessage) {
     if (this.discardServerOutput) {
+      if (LK_GOOGLE_DEBUG) {
+        this.#logger.debug('isNewGeneration: discarding due to discardServerOutput flag');
+      }
       return false;
     }

1577-1580: No action needed - toolCall handling is correct and already accounts for interrupt semantics.

The code does not discard toolCall messages during the interrupt window. While isNewGeneration() returns false when discardServerOutput is true (preventing startNewGeneration() from being called), the subsequent handleToolCall() at line 1046 is independent and executes unconditionally if a toolCall is present. Tool calls are processed within the current generation's functionChannel and are not discarded. The discardServerOutput flag only prevents new generation contexts from starting, not the processing of tool calls within an active generation. This is the correct behavior for interrupt semantics.


@tillkolter tillkolter changed the title Support Gemini realtime text interruptions feat: Support Gemini realtime text interruptions Jan 25, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@plugins/google/src/beta/realtime/realtime_api.ts`:
- Around line 573-592: The code clears this.pendingInterruptText unconditionally
even when no user text is found; change the logic in the block that iterates
over turns (types.Content) so you only clear this.pendingInterruptText after you
have actually sent at least one realtime_input via this.sendClientEvent.
Concretely, add a local boolean (e.g., sentRealtime) before the loop, set it to
true whenever you call this.sendClientEvent({ type: 'realtime_input', ... }),
and after the loop set this.pendingInterruptText = false only if sentRealtime is
true (leave it true otherwise) so future user text can still trigger
realtime_input.
- Around line 934-943: The switch case handling realtime_input declares const {
mediaChunks, activityStart, activityEnd, text } without a block which triggers
noSwitchDeclarations; wrap the case body in a block (add { ... } around the
existing statements) so the const lives inside a lexical block and keep the
existing awaits (session.sendRealtimeInput calls for mediaChunks, text, and
activityStart) unchanged; locate the case labeled realtime_input and enclose the
current lines starting with "const { mediaChunks, activityStart, activityEnd,
text } = msg.value;" through the activityStart await in a new { } block.
- Around line 1576-1578: The discardServerOutput flag can cause new-generation
content to be dropped if protocol ordering allows content for the next
generation to arrive before generationComplete/turnComplete; update the logic to
be robust by either (1) adding a clear explanatory comment near
discardServerOutput and the isNewGeneration() call documenting the protocol
ordering guarantee expected, or (2) proactively clearing discardServerOutput
when serverContent.interrupted is observed (in the same handler that checks
serverContent), or (3) modifying isNewGeneration() to treat a differing
generationId as a new generation even if discardServerOutput is true so
new-generation messages are accepted; locate and change behavior around
isNewGeneration(), discardServerOutput, and the serverContent handling to
implement one of these options.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3544bfa and 627ebb0.

📒 Files selected for processing (1)
  • plugins/google/src/beta/realtime/realtime_api.ts
🧬 Code graph analysis (1)
plugins/google/src/beta/realtime/realtime_api.ts (2)
plugins/openai/src/realtime/api_proto.ts (1)
  • Content (181-186)
plugins/openai/src/realtime/realtime_model.ts (1)
  • session (305-307)

🔇 Additional comments (3)
plugins/google/src/beta/realtime/realtime_api.ts (3)

413-414: Clean state flags for interrupt handling.
The added flags make the interrupt/discard state explicit and easy to reason about.


748-752: Interrupt flow sequencing looks good.
Marking the current generation done before starting activity keeps preemption consistent.


1329-1420: Output-discard gating is solid.
Suppressing modelTurn/outputTranscription while discarding avoids stale playout and text leakage.


Comment on lines +934 to 943
            const { mediaChunks, activityStart, activityEnd, text } = msg.value;
            if (mediaChunks) {
              for (const mediaChunk of mediaChunks) {
                await session.sendRealtimeInput({ media: mediaChunk });
              }
            }
            if (text) {
              await session.sendRealtimeInput({ text });
            }
            if (activityStart) await session.sendRealtimeInput({ activityStart });

@coderabbitai coderabbitai bot Jan 25, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd -t f "realtime_api.ts"

Repository: livekit/agents-js

Length of output: 109


🏁 Script executed:

git ls-files plugins/google/src/beta/realtime/

Repository: livekit/agents-js

Length of output: 197


🏁 Script executed:

cat -n plugins/google/src/beta/realtime/realtime_api.ts | sed -n '930,950p'

Repository: livekit/agents-js

Length of output: 979


🏁 Script executed:

find . -name "biome.json" -o -name ".biomerc*" -o -name "biome.jsonc"

Repository: livekit/agents-js

Length of output: 73


🏁 Script executed:

cat biome.json

Repository: livekit/agents-js

Length of output: 1835


🏁 Script executed:

wc -l plugins/google/src/beta/realtime/realtime_api.ts

Repository: livekit/agents-js

Length of output: 114


🏁 Script executed:

head -20 plugins/google/src/beta/realtime/realtime_api.ts

Repository: livekit/agents-js

Length of output: 582


🏁 Script executed:

rg "noSwitchDeclarations" biome.json

Repository: livekit/agents-js

Length of output: 43


🏁 Script executed:

web_search
Biome noSwitchDeclarations rule recommended preset

Repository: livekit/agents-js

Length of output: 155


🌐 Web query:

Biome noSwitchDeclarations rule documentation recommended

💡 Result:

Biome’s noSwitchDeclarations rule documentation is here: lint/correctness/noSwitchDeclarations. It’s a recommended rule (enabled by default) and reports lexical declarations (let/const/class/function) directly inside switch case clauses; the suggested fix is to wrap the case body in a block (case X: { ... }). [1]

To configure it explicitly:

```json
{
  "linter": {
    "rules": {
      "correctness": {
        "noSwitchDeclarations": "error"
      }
    }
  }
}
```
[1]

More general Biome linter configuration (enable/disable recommended rules, rule groups, etc.) is covered in the linter docs. [2]  

**Sources:** [1] [2]


**Wrap the `realtime_input` switch case in a block.**  
Biome's `noSwitchDeclarations` rule (enabled by default in the recommended preset) flags the `const` in a `case` without a block; wrapping avoids temporal dead zone hazards and resolves the lint error.
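
A self-contained sketch of the block-scoped form is shown below. The `Msg` union and `summarize` function are hypothetical and exist only to illustrate the scoping; they are not taken from the plugin.

```typescript
// Hypothetical message union used only to illustrate the scoping rule.
type Msg =
  | { type: 'realtime_input'; value: { text?: string } }
  | { type: 'content'; value: { turns: string[] } };

function summarize(msg: Msg): string {
  switch (msg.type) {
    case 'realtime_input': {
      // The braces give this clause its own lexical scope, so `text` is
      // neither visible nor in its temporal dead zone in other clauses.
      const { text } = msg.value;
      return text ? `text:${text}` : 'no text';
    }
    case 'content': {
      // The same name can be declared again without a redeclaration error.
      const text = msg.value.turns.join(' ');
      return `content:${text}`;
    }
    default:
      return 'unknown';
  }
}
```

Without the braces, both `const text` declarations would share the single scope of the `switch` body, which is the situation the lint rule guards against.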

✅ Safe fix

```diff
-          case 'realtime_input':
-            const { mediaChunks, activityStart, activityEnd, text } = msg.value;
+          case 'realtime_input': {
+            const { mediaChunks, activityStart, activityEnd, text } = msg.value;
             if (mediaChunks) {
               for (const mediaChunk of mediaChunks) {
                 await session.sendRealtimeInput({ media: mediaChunk });
               }
             }
             if (text) {
               await session.sendRealtimeInput({ text });
             }
             if (activityStart) await session.sendRealtimeInput({ activityStart });
             if (activityEnd) await session.sendRealtimeInput({ activityEnd });
             break;
+          }
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            const { mediaChunks, activityStart, activityEnd, text } = msg.value;
-            if (mediaChunks) {
-              for (const mediaChunk of mediaChunks) {
-                await session.sendRealtimeInput({ media: mediaChunk });
-              }
-            }
-            if (text) {
-              await session.sendRealtimeInput({ text });
-            }
-            if (activityStart) await session.sendRealtimeInput({ activityStart });
+          case 'realtime_input': {
+            const { mediaChunks, activityStart, activityEnd, text } = msg.value;
+            if (mediaChunks) {
+              for (const mediaChunk of mediaChunks) {
+                await session.sendRealtimeInput({ media: mediaChunk });
+              }
+            }
+            if (text) {
+              await session.sendRealtimeInput({ text });
+            }
+            if (activityStart) await session.sendRealtimeInput({ activityStart });
+            if (activityEnd) await session.sendRealtimeInput({ activityEnd });
+            break;
+          }

🤖 Prompt for AI Agents
In `@plugins/google/src/beta/realtime/realtime_api.ts` around lines 934 - 943, The
switch case handling realtime_input declares const { mediaChunks, activityStart,
activityEnd, text } without a block which triggers noSwitchDeclarations; wrap
the case body in a block (add { ... } around the existing statements) so the
const lives inside a lexical block and keep the existing awaits
(session.sendRealtimeInput calls for mediaChunks, text, and activityStart)
unchanged; locate the case labeled realtime_input and enclose the current lines
starting with "const { mediaChunks, activityStart, activityEnd, text } =
msg.value;" through the activityStart await in a new { } block.

Contributor Author


I don't disagree, but for consistency the other switch cases would have to follow suit, and I wanted to keep the changes to a minimum.


@tillkolter, that's a fair point about consistency! Keeping the changes minimal in this PR makes sense, especially since fixing one case but not the others would create inconsistency. If you'd like to address the noSwitchDeclarations lint rule across all switch cases holistically in a follow-up, that would be a cleaner approach. For now, this works fine. 👍

Contributor Author


The same goes for the await in await sendRealtimeInput(...). It has no effect, since sendRealtimeInput does not return a promise, but I'd rather change it for all methods separately in a follow-up chore.


@tillkolter, that makes perfect sense! You're right that removing the await from non-promise calls would be more effective as part of a broader consistency pass. Handling both the switch case blocks (for noSwitchDeclarations) and the unnecessary await keywords together in a follow-up chore will ensure uniform style across the entire switch statement.

Would you like me to create a GitHub issue to track this follow-up work? I can include both items:

  1. Add blocks to all switch cases to satisfy noSwitchDeclarations
  2. Remove unnecessary await keywords from sendRealtimeInput calls (and any other non-promise calls)

@tillkolter tillkolter marked this pull request as draft January 25, 2026 12:07
@tillkolter
Contributor Author

Found some issues with the state machine. Will reopen once fixed.

@tillkolter tillkolter closed this Jan 25, 2026