feat: Support Gemini realtime text interruptions #998
📝 Walkthrough
Adds interrupt-driven text input handling to RealtimeSession.
Sequence Diagram
sequenceDiagram
participant User
participant RealtimeSession
participant Generation
participant Server
User->>RealtimeSession: Trigger interrupt
activate RealtimeSession
RealtimeSession->>RealtimeSession: Set pendingInterruptText\nSet discardServerOutput = true
RealtimeSession->>Generation: Mark current generation done
deactivate RealtimeSession
Generation-->>RealtimeSession: Acknowledge done
RealtimeSession->>RealtimeSession: Emit realtime_input with accumulated text
Server->>RealtimeSession: Incoming realtime_input (optional text / media)
activate RealtimeSession
RealtimeSession->>RealtimeSession: Process realtime_input (accept text)
deactivate RealtimeSession
Server->>RealtimeSession: Server content (modelTurn / outputTranscription)
activate RealtimeSession
alt discardServerOutput is true
RealtimeSession->>RealtimeSession: Skip processing/appending server content
else
RealtimeSession->>RealtimeSession: Process server content normally
end
deactivate RealtimeSession
Server->>RealtimeSession: turnComplete / generationComplete
RealtimeSession->>RealtimeSession: Reset discardServerOutput
RealtimeSession->>Generation: Check isNewGeneration
alt discardServerOutput is true
Generation-->>RealtimeSession: Prevent new generation
else
Generation-->>RealtimeSession: Allow generation lifecycle
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@plugins/google/src/beta/realtime/realtime_api.ts`:
- Around line 573-592: The code clears this.pendingInterruptText unconditionally
even when no user text is found; change the logic in the block that iterates
over turns (types.Content) so you only clear this.pendingInterruptText after you
have actually sent at least one realtime_input via this.sendClientEvent.
Concretely, add a local boolean (e.g., sentRealtime) before the loop, set it to
true whenever you call this.sendClientEvent({ type: 'realtime_input', ... }),
and after the loop set this.pendingInterruptText = false only if sentRealtime is
true (leave it true otherwise) so future user text can still trigger
realtime_input.
- Around line 934-943: The switch case handling realtime_input declares const {
mediaChunks, activityStart, activityEnd, text } without a block which triggers
noSwitchDeclarations; wrap the case body in a block (add { ... } around the
existing statements) so the const lives inside a lexical block and keep the
existing awaits (session.sendRealtimeInput calls for mediaChunks, text, and
activityStart) unchanged; locate the case labeled realtime_input and enclose the
current lines starting with "const { mediaChunks, activityStart, activityEnd,
text } = msg.value;" through the activityStart await in a new { } block.
- Around line 1576-1578: The discardServerOutput flag can cause new-generation
content to be dropped if protocol ordering allows content for the next
generation to arrive before generationComplete/turnComplete; update the logic to
be robust by either (1) adding a clear explanatory comment near
discardServerOutput and the isNewGeneration() call documenting the protocol
ordering guarantee expected, or (2) proactively clearing discardServerOutput
when serverContent.interrupted is observed (in the same handler that checks
serverContent), or (3) modifying isNewGeneration() to treat a differing
generationId as a new generation even if discardServerOutput is true so
new-generation messages are accepted; locate and change behavior around
isNewGeneration(), discardServerOutput, and the serverContent handling to
implement one of these options.
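The first fix above (clear the flag only after something was actually sent) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the plugin's code: `Turn`, `ClientEvent`, `InterruptState`, and `flushInterruptText` are hypothetical stand-ins for the `types.Content` turns, `this.sendClientEvent`, and `this.pendingInterruptText` named in the comment.

```typescript
// Hypothetical, simplified shapes; the real plugin uses types.Content and its own event types.
interface Turn {
  role: 'user' | 'model';
  text?: string;
}

interface ClientEvent {
  type: 'realtime_input';
  text: string;
}

interface InterruptState {
  pendingInterruptText: boolean;
}

// Sends user text as realtime_input while an interrupt is pending, and clears
// the flag only if at least one event was actually sent.
function flushInterruptText(
  state: InterruptState,
  turns: Turn[],
  send: (event: ClientEvent) => void,
): void {
  if (!state.pendingInterruptText) return;

  let sentRealtime = false;
  for (const turn of turns) {
    if (turn.role !== 'user' || !turn.text) continue;
    send({ type: 'realtime_input', text: turn.text });
    sentRealtime = true;
  }

  // Leave the flag set when nothing was sent, so later user text can still
  // trigger realtime_input.
  if (sentRealtime) state.pendingInterruptText = false;
}
```

With this shape, a call that sees only model turns leaves `pendingInterruptText` set, and the next call that carries user text sends it and clears the flag.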
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/google/src/beta/realtime/realtime_api.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'
Files:
plugins/google/src/beta/realtime/realtime_api.ts
**/*.{ts,tsx}?(test|example|spec)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')
Files:
plugins/google/src/beta/realtime/realtime_api.ts
**/*.{ts,tsx}?(test|example)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'
Files:
plugins/google/src/beta/realtime/realtime_api.ts
🧬 Code graph analysis (1)
plugins/google/src/beta/realtime/realtime_api.ts (2)
plugins/openai/src/realtime/api_proto.ts (1)
Content (181-186)
plugins/openai/src/realtime/realtime_model.ts (1)
session (305-307)
🪛 Biome (2.1.2)
plugins/google/src/beta/realtime/realtime_api.ts
[error] 934-934: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Safe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
🔇 Additional comments (3)
plugins/google/src/beta/realtime/realtime_api.ts (3)
413-414: Clean state flags for interrupt handling.
The added flags make the interrupt/discard state explicit and easy to reason about.
748-752: Interrupt flow sequencing looks good.
Marking the current generation done before starting activity keeps preemption consistent.
1329-1420: Output-discard gating is solid.
Suppressing modelTurn/outputTranscription while discarding avoids stale playout and text leakage.
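The gating praised here, together with the reset step from the sequence diagram, can be sketched roughly as below. It is an illustrative model only: `ServerContent` is a flattened, hypothetical message shape and `DiscardGate` is an invented class, not the `RealtimeSession` implementation.

```typescript
// Hypothetical, simplified message shape; real server messages carry more fields.
interface ServerContent {
  modelTurnText?: string;
  outputTranscription?: string;
  turnComplete?: boolean;
  generationComplete?: boolean;
}

class DiscardGate {
  // Set when an interrupt starts; cleared when the interrupted turn finishes.
  discardServerOutput = false;
  readonly accepted: string[] = [];

  interrupt(): void {
    this.discardServerOutput = true;
  }

  handle(content: ServerContent): void {
    // While discarding, stale model output and transcriptions are skipped.
    if (!this.discardServerOutput) {
      if (content.modelTurnText) this.accepted.push(content.modelTurnText);
      if (content.outputTranscription) this.accepted.push(content.outputTranscription);
    }
    // End-of-turn markers close the stale generation, so later output is accepted again.
    if (content.turnComplete || content.generationComplete) {
      this.discardServerOutput = false;
    }
  }
}
```

The sketch also makes the ordering concern from the third actionable comment visible: any output for the next generation that arrives before `turnComplete` or `generationComplete` would be skipped along with the stale output.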
const { mediaChunks, activityStart, activityEnd, text } = msg.value;
if (mediaChunks) {
  for (const mediaChunk of mediaChunks) {
    await session.sendRealtimeInput({ media: mediaChunk });
  }
}
if (text) {
  await session.sendRealtimeInput({ text });
}
if (activityStart) await session.sendRealtimeInput({ activityStart });
<details>
<summary>🧩 Analysis chain</summary>
🏁 Script executed:
fd -t f "realtime_api.ts"
Repository: livekit/agents-js
Length of output: 109
🏁 Script executed:
git ls-files plugins/google/src/beta/realtime/
Repository: livekit/agents-js
Length of output: 197
🏁 Script executed:
cat -n plugins/google/src/beta/realtime/realtime_api.ts | sed -n '930,950p'
Repository: livekit/agents-js
Length of output: 979
🏁 Script executed:
find . -name "biome.json" -o -name ".biomerc*" -o -name "biome.jsonc"
Repository: livekit/agents-js
Length of output: 73
🏁 Script executed:
cat biome.json
Repository: livekit/agents-js
Length of output: 1835
🏁 Script executed:
wc -l plugins/google/src/beta/realtime/realtime_api.ts
Repository: livekit/agents-js
Length of output: 114
🏁 Script executed:
head -20 plugins/google/src/beta/realtime/realtime_api.ts
Repository: livekit/agents-js
Length of output: 582
🏁 Script executed:
rg "noSwitchDeclarations" biome.json
Repository: livekit/agents-js
Length of output: 43
🏁 Script executed:
web_search
Biome noSwitchDeclarations rule recommended preset
Repository: livekit/agents-js
Length of output: 155
🌐 Web query:
Biome noSwitchDeclarations rule documentation recommended
💡 Result:
Biome’s noSwitchDeclarations rule documentation is here: lint/correctness/noSwitchDeclarations. It’s a recommended rule (enabled by default) and reports lexical declarations (let/const/class/function) directly inside switch case clauses; the suggested fix is to wrap the case body in a block (case X: { ... }). [1]
To configure it explicitly:
```json
{
  "linter": {
    "rules": {
      "correctness": {
        "noSwitchDeclarations": "error"
      }
    }
  }
}
```
[1]
More general Biome linter configuration (enable/disable recommended rules, rule groups, etc.) is covered in the linter docs. [2]
**Sources:** [1] [2]
---
</details>
**Wrap the `realtime_input` switch case in a block.**
Biome's `noSwitchDeclarations` rule (enabled by default in the recommended preset) flags the `const` in a `case` without a block; wrapping avoids temporal dead zone hazards and resolves the lint error.
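As a small illustration of why the block matters: all clauses of a `switch` share one lexical scope, so a `const` declared in one clause is visible in the others, and two clauses cannot declare the same name. The sketch below uses an invented `Message` union and `summarize` function (neither is from the plugin) to show per-clause blocks giving each `case` its own binding.

```typescript
// Hypothetical message union, loosely modelled on the events discussed in this review.
type Message =
  | { type: 'realtime_input'; value: { text?: string } }
  | { type: 'content'; value: { text: string } };

function summarize(msg: Message): string {
  switch (msg.type) {
    case 'realtime_input': {
      // Scoped to this clause only.
      const { text } = msg.value;
      return text ? `realtime:${text}` : 'realtime:(empty)';
    }
    case 'content': {
      // Same name, separate binding; without the braces this would redeclare `text`.
      const { text } = msg.value;
      return `content:${text}`;
    }
    default:
      return 'unknown';
  }
}
```

For example, `summarize({ type: 'content', value: { text: 'hello' } })` returns `'content:hello'`.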
<details>
<summary>✅ Safe fix</summary>
```diff
-        case 'realtime_input':
-          const { mediaChunks, activityStart, activityEnd, text } = msg.value;
+        case 'realtime_input': {
+          const { mediaChunks, activityStart, activityEnd, text } = msg.value;
           if (mediaChunks) {
             for (const mediaChunk of mediaChunks) {
               await session.sendRealtimeInput({ media: mediaChunk });
             }
           }
           if (text) {
             await session.sendRealtimeInput({ text });
           }
           if (activityStart) await session.sendRealtimeInput({ activityStart });
           if (activityEnd) await session.sendRealtimeInput({ activityEnd });
           break;
+        }
```
</details>
I am not disagreeing, but for consistency the other switch cases would have to follow, and I wanted to keep the changes to a minimum.
@tillkolter, that's a fair point about consistency! Keeping the changes minimal in this PR makes sense, especially since fixing one case but not the others would create inconsistency. If you'd like to address the noSwitchDeclarations lint rule across all switch cases holistically in a follow-up, that would be a cleaner approach. For now, this works fine. 👍
Same goes for using `await` in `await sendRealtimeInput(...)`. It doesn't have an effect since `sendRealtimeInput` does not return a promise, but I'd rather change it for all methods separately in a follow-up chore.
@tillkolter, that makes perfect sense! You're right that removing the await from non-promise calls would be more effective as part of a broader consistency pass. Handling both the switch case blocks (for noSwitchDeclarations) and the unnecessary await keywords together in a follow-up chore will ensure uniform style across the entire switch statement.
Would you like me to create a GitHub issue to track this follow-up work? I can include both items:
- Add blocks to all switch cases to satisfy `noSwitchDeclarations`
- Remove unnecessary `await` keywords from `sendRealtimeInput` calls (and any other non-promise calls)
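On the `await` point discussed above, one nuance worth keeping in mind for the follow-up: awaiting a value that is not a promise does not change the result, but it still suspends the enclosing async function until the next microtask. The sketch below uses a hypothetical synchronous `sendSync` helper, standing in for any void-returning send method, to make that ordering visible.

```typescript
const order: string[] = [];

// Hypothetical stand-in for a synchronous, void-returning send method.
function sendSync(label: string): void {
  order.push(`send:${label}`);
}

async function withAwait(): Promise<void> {
  // The awaited value is not a promise, yet the function still yields once here.
  await sendSync('a');
  order.push('after await');
}

async function demo(): Promise<string[]> {
  const pending = withAwait();
  // Runs before 'after await' because withAwait is suspended at its await.
  order.push('caller continues');
  await pending;
  return order;
}
```

So dropping the redundant `await` is behavior-preserving for the values involved, but it can shift the relative ordering of code that runs between the call and its continuation, which is a reasonable argument for doing it in a separate, focused change.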
found some issues with the state machine. will re-open once fixed.
Description
This PR refines Gemini Realtime interrupt handling by sending `realtime_input` text when an interrupt is active, while always keeping full user turns in `content`. The goal is to preserve context fidelity while still triggering low‑latency interruption behavior.
In contrast to OpenAI's well-defined message/event schema for interruption, truncation, and request cancellation, Gemini falls short of providing an explicit interface to manage the state of the request. This solution tries to simulate the internal behavior of the OpenAI plugin to achieve similar behavior for text interruptions.
Changes Made
- Gate `realtime_input` text on an explicit interrupt flag instead of generation state, so interrupts remain responsive even after `_done` is set.
- Always keep user turns in `content` (no stripping), so the model’s context stays consistent.
Pre-Review Checklist
Additional Notes
This keeps Gemini’s interruption responsiveness while avoiding the context drift caused by stripping user turns from `content`.
Summary by CodeRabbit
New Features
Bug Fixes