A demo Rails application showcasing real-time voice conversation with Google's Gemini 2.0 Flash model using the Gemini Live API. Features a JARVIS-inspired HUD interface with bidirectional audio streaming.
- Bidirectional WebSocket streaming between browser, Rails, and Gemini Live API
- Real-time audio capture using AudioWorklet for 16kHz PCM encoding
- ActionCable integration for streaming audio chunks to/from the server
- Voice Activity Detection (VAD) handled server-side by Gemini
┌─────────────┐     ActionCable      ┌─────────────┐     WebSocket      ┌─────────────┐
│   Browser   │ ◄──────────────────► │    Rails    │ ◄────────────────► │   Gemini    │
│             │   PCM audio chunks   │             │  PCM audio chunks  │  Live API   │
└─────────────┘                      └─────────────┘                    └─────────────┘
- Stimulus controller manages UI state (muted → listening → processing → speaking)
- AudioWorklet captures microphone input, resamples to 16kHz PCM
- Web Audio API plays back 24kHz PCM response chunks
- Canvas visualizations for waveform and audio levels
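
The capture path described above (microphone input resampled to 16kHz, then encoded as 16-bit PCM) might look roughly like the sketch below. It is an illustration only, not the code in `pcm_capture_processor.js`: the helper names `downsample` and `floatTo16BitPCM` are hypothetical, and linear interpolation is an assumed resampling method. The sketch also omits the low-pass filter that a production resampler would normally apply to limit aliasing.

```javascript
// Hypothetical helpers: the names and the resampling method are assumptions,
// not taken from the project's worklet.

// Resample Float32 samples from inputRate down to outputRate
// using linear interpolation between neighbouring samples.
function downsample(samples, inputRate, outputRate) {
  if (outputRate >= inputRate) return Float32Array.from(samples);
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, samples.length - 1);
    const frac = pos - left;
    out[i] = samples[left] * (1 - frac) + samples[right] * frac;
  }
  return out;
}

// Convert Float32 samples in [-1, 1] to signed 16-bit PCM,
// clamping out-of-range values first.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}
```

For example, `floatTo16BitPCM(downsample(input, 48000, 16000))` would turn a 48kHz capture buffer into 16kHz PCM samples, assuming the browser's audio context runs at 48kHz.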
- VoiceChannel (ActionCable) handles bidirectional audio streaming
- GeminiSession maintains a persistent WebSocket connection to Gemini with callbacks
- AudioConverter uses FFmpeg for format conversion (WebM ↔ PCM)
- SyncWebsocketClient provides a thread-safe WebSocket client without background threads
| File | Purpose |
|---|---|
| app/services/gemini_session.rb | Persistent Gemini WebSocket connection |
| app/channels/voice_channel.rb | ActionCable bidirectional streaming |
| app/javascript/controllers/voice_chat_controller.js | Main UI controller |
| public/worklets/pcm_capture_processor.js | Real-time PCM audio capture |
- Rails 8.0 / Ruby 3.4
- Hotwire (Turbo + Stimulus)
- Tailwind CSS
- SQLite
- FFmpeg (for audio conversion)
- Ruby 3.4+
- FFmpeg installed (`brew install ffmpeg` on macOS)
- Gemini API key
# Install dependencies
bin/setup
# Set your API key (or add to Rails credentials)
export GEMINI_API_KEY=your_api_key_here
# Start the server
bin/dev

Visit http://localhost:3000, click the unmute button, and start talking.
| Direction | Format | Sample Rate |
|---|---|---|
| Browser → Gemini | 16-bit PCM mono | 16kHz |
| Gemini → Browser | 16-bit PCM mono | 24kHz |
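
As a rough worked example of what these formats imply, the sketch below computes how long a chunk of 16-bit mono PCM lasts and decodes response bytes into the Float32 samples that the Web Audio API expects. The helper names are hypothetical, and little-endian byte order is an assumption about the stream rather than something this README states.

```javascript
// Values taken from the table above; helper names are hypothetical.
const INPUT_SAMPLE_RATE = 16000;  // Browser → Gemini
const OUTPUT_SAMPLE_RATE = 24000; // Gemini → Browser
const BYTES_PER_SAMPLE = 2;       // 16-bit mono

// Duration in milliseconds of a mono 16-bit PCM chunk of the given size.
function chunkDurationMs(byteLength, sampleRate) {
  return (byteLength / BYTES_PER_SAMPLE / sampleRate) * 1000;
}

// Decode 16-bit PCM bytes (assumed little-endian) into Float32 samples in [-1, 1).
function pcm16ToFloat32(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const out = new Float32Array(Math.floor(arrayBuffer.byteLength / BYTES_PER_SAMPLE));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * BYTES_PER_SAMPLE, true) / 0x8000;
  }
  return out;
}
```

At these rates, one second of audio is 32,000 bytes on the way to Gemini and 48,000 bytes on the way back. In the browser, the decoded samples could be copied into an `AudioBuffer` created with `audioContext.createBuffer(1, samples.length, OUTPUT_SAMPLE_RATE)` and scheduled for playback.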
MIT