Implementing Hands-Free Multimodal Voice Mode with Wake Word & Interrupt Support in Gemini CLI #19830
priyanshuKumar56 started this conversation in Ideas · Replies: 1 comment
From my point of view, wake-word support is the kind of feature that looks attractive early but can easily distract from the harder product question: whether voice mode feels reliable during real coding sessions. I would put push-to-talk, interruption semantics, and synchronized text-plus-audio state ahead of wake-word activation in the first version. Once those are stable, the rendering choice for the waveform becomes much easier to evaluate as a UI detail rather than the core design risk.
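One way to make those interruption semantics concrete is a small, pure state machine. This is a dependency-free TypeScript sketch; the state names, event names, and the `cancel_generation` control frame are assumptions taken from this thread, not the Gemini CLI's actual design:

```typescript
// Hypothetical sketch of barge-in interruption semantics for a voice mode.
// All names here are illustrative assumptions, not a real Gemini CLI API.

type VoiceState = "idle" | "listening" | "responding";

type VoiceEvent =
  | { kind: "user_speech_start" }   // VAD detected the user speaking
  | { kind: "user_speech_end" }
  | { kind: "model_response_start" }
  | { kind: "model_response_end" };

interface Transition {
  next: VoiceState;
  // Control frame to send upstream, if any (name is an assumption).
  send?: "cancel_generation";
}

// Pure transition function: given the current state and an event,
// return the next state plus any control frame to emit.
function step(state: VoiceState, event: VoiceEvent): Transition {
  switch (event.kind) {
    case "user_speech_start":
      // Barge-in: if the model is mid-response, cancel it before listening.
      return state === "responding"
        ? { next: "listening", send: "cancel_generation" }
        : { next: "listening" };
    case "user_speech_end":
      return state === "listening" ? { next: "idle" } : { next: state };
    case "model_response_start":
      return { next: "responding" };
    case "model_response_end":
      return state === "responding" ? { next: "idle" } : { next: state };
  }
}
```

Keeping the transition function pure makes the interruption path easy to unit-test before any audio plumbing exists, which is where most of the reliability risk lives.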
-
Hi @bdmorgan,
I'm Priyanshu and I'm very interested in the Hands-Free Multimodal Voice Mode project.
I'm currently exploring an architecture involving wake-word activated audio capture,
VAD-gated PCM frame streaming, and bidirectional WebSocket integration with Gemini’s Live API.
For CLI-based waveform visualisation, would you recommend implementing a TUI using ink/blessed,
or is there an internal rendering approach already being considered?
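For context on what the rendering layer would need to do, here is a dependency-free sketch that maps per-frame RMS amplitude onto Unicode block glyphs; the frame format (float samples in [-1, 1]) and the in-place `\r` redraw idea are assumptions, and this is not an ink/blessed integration:

```typescript
// Illustrative waveform rendering: one text row of Unicode block glyphs.
// Assumes PCM frames already normalised to floats in [-1, 1].

const BLOCKS = [" ", "▁", "▂", "▃", "▄", "▅", "▆", "▇", "█"];

// Root-mean-square amplitude of one PCM frame.
function rms(frame: Float32Array): number {
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

// Map an amplitude in [0, 1] to a single block glyph.
function glyph(amplitude: number): string {
  const idx = Math.min(BLOCKS.length - 1, Math.floor(amplitude * BLOCKS.length));
  return BLOCKS[idx];
}

// Render the most recent `width` amplitudes as one line, suitable for
// redrawing in place with `\r` in a raw-mode terminal.
function renderWaveform(amplitudes: number[], width: number): string {
  return amplitudes.slice(-width).map(glyph).join("").padStart(width);
}
```

The glyph mapping itself needs no TUI framework; ink or blessed would only add value for composing the waveform row with the rest of the CLI layout.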
Additionally, for interruption support, would sending a cancel_generation
control frame on VAD-triggered user speech align with the expected interaction model?
Looking forward to your guidance.
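The VAD-gated PCM streaming described above could be sketched as a simple energy gate with a hangover window. The threshold, frame shape, and the `speechStarted` barge-in flag are placeholder assumptions; a production system would likely use a trained VAD (e.g. Silero or WebRTC VAD) rather than raw energy:

```typescript
// Illustrative energy-based VAD gate for deciding which PCM frames to
// stream over the WebSocket. All parameters here are assumptions.

interface VadConfig {
  threshold: number;      // RMS energy above which a frame counts as speech
  hangoverFrames: number; // frames to keep streaming after speech drops out
}

interface GateDecision {
  stream: boolean;        // forward this frame upstream?
  speechStarted: boolean; // first speech frame after silence (barge-in trigger)
}

// Returns a stateful per-frame gate function.
function createGate(cfg: VadConfig) {
  let hangover = 0;
  let inSpeech = false;
  return (frame: Float32Array): GateDecision => {
    let energy = 0;
    for (const s of frame) energy += s * s;
    const isSpeech = Math.sqrt(energy / frame.length) >= cfg.threshold;
    const speechStarted = isSpeech && !inSpeech;
    const stream = isSpeech || hangover > 0;
    if (isSpeech) {
      hangover = cfg.hangoverFrames; // refill hangover on every speech frame
      inSpeech = true;
    } else if (hangover > 0) {
      hangover--;                    // keep streaming briefly through pauses
    } else {
      inSpeech = false;              // gate fully closed
    }
    return { stream, speechStarted };
  };
}
```

Under this sketch, `speechStarted` is the natural point to emit the cancel control frame if a model response is in flight, which ties the VAD gate directly to the interruption model asked about above.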