
jasich/rails-jarvis-ux


Gemini Live API + Rails Demo

A demo Rails application showcasing real-time voice conversation with Google's Gemini 2.0 Flash model through the Gemini Live API. It features a JARVIS-inspired HUD interface with bidirectional audio streaming.

What This Demonstrates

  • Bidirectional WebSocket streaming between browser, Rails, and Gemini Live API
  • Real-time audio capture using AudioWorklet for 16kHz PCM encoding
  • ActionCable integration for streaming audio chunks to/from the server
  • Voice Activity Detection (VAD) handled server-side by Gemini
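
As a rough illustration of the 16kHz PCM encoding step, the sketch below downsamples Float32 samples (the format the Web Audio API delivers) and converts them to 16-bit integers. The function name downsampleToPcm16 and the use of linear interpolation are assumptions made for this example; the repository's worklet may resample differently.

```javascript
// Sketch only: downsample Float32 audio to 16 kHz and encode as 16-bit PCM.
// `downsampleToPcm16` is a hypothetical helper, not a function from this repo.
function downsampleToPcm16(input, inputRate, outputRate = 16000) {
  if (outputRate > inputRate) {
    throw new RangeError("outputRate must not exceed inputRate");
  }
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);

  for (let i = 0; i < outLength; i++) {
    // Linear interpolation between the two nearest input samples.
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    const sample = input[left] * (1 - frac) + input[right] * frac;

    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, sample));
    out[i] = clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  }
  return out;
}
```

Linear interpolation without a low-pass filter can alias high frequencies, which is usually tolerable for speech but worth knowing when adapting this.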

Architecture Overview

┌─────────────┐     ActionCable      ┌─────────────┐     WebSocket      ┌─────────────┐
│   Browser   │ ◄──────────────────► │    Rails    │ ◄────────────────► │   Gemini    │
│             │   PCM audio chunks   │             │   PCM audio chunks │   Live API  │
└─────────────┘                      └─────────────┘                    └─────────────┘
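
ActionCable's default protocol carries JSON messages, so one common way to move PCM chunks over it is to base64-encode them. Whether this repository does exactly that is not stated here; the helpers below (pcm16ToBase64, base64ToPcm16) and the `audio` payload key are hypothetical names used only to sketch the round trip.

```javascript
// Sketch only: wrap 16-bit PCM samples in a base64 string for a JSON message.
// Helper names and the `audio` key are assumptions, not taken from this repo.
function pcm16ToBase64(samples) {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

function base64ToPcm16(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Two bytes per sample; a trailing odd byte, if any, is ignored.
  return new Int16Array(bytes.buffer, 0, bytes.length >> 1);
}

// Example payload shape a client might send over a channel subscription.
const payload = JSON.stringify({ audio: pcm16ToBase64(new Int16Array([0, 1, -1])) });
```

Typed arrays use the platform's byte order, which is little-endian on essentially all current browsers and servers.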

Frontend

  • Stimulus controller manages UI state (muted → listening → processing → speaking)
  • AudioWorklet captures microphone input, resamples to 16kHz PCM
  • Web Audio API plays back 24kHz PCM response chunks
  • Canvas visualizations for waveform and audio levels
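
The UI state flow (muted → listening → processing → speaking) can be pictured as a small transition table. The event names below (unmute, mute, speechEnded, audioReceived, playbackFinished) are invented for illustration and are not taken from the controller itself.

```javascript
// Sketch only: a transition table for the four UI states named above.
// Event names are hypothetical; the real Stimulus controller may differ.
const TRANSITIONS = {
  muted:      { unmute: "listening" },
  listening:  { mute: "muted", speechEnded: "processing" },
  processing: { mute: "muted", audioReceived: "speaking" },
  speaking:   { mute: "muted", playbackFinished: "listening" },
};

// Returns the next state, or the current state if the event does not apply.
function nextState(state, event) {
  const next = TRANSITIONS[state]?.[event];
  return next ?? state;
}
```

Keeping transitions in one table makes invalid combinations, such as an audio chunk arriving while muted, fall through as no-ops instead of needing scattered conditionals.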

Backend

  • VoiceChannel (ActionCable) handles bidirectional audio streaming
  • GeminiSession maintains persistent WebSocket to Gemini with callbacks
  • AudioConverter uses FFmpeg for format conversion (WebM ↔ PCM)
  • SyncWebsocketClient provides thread-safe WebSocket without background threads

Key Files

File                                                  Purpose
app/services/gemini_session.rb                        Persistent Gemini WebSocket connection
app/channels/voice_channel.rb                         ActionCable bidirectional streaming
app/javascript/controllers/voice_chat_controller.js   Main UI controller
public/worklets/pcm_capture_processor.js              Real-time PCM audio capture

Tech Stack

  • Rails 8.0 / Ruby 3.4
  • Hotwire (Turbo + Stimulus)
  • Tailwind CSS
  • SQLite
  • FFmpeg (for audio conversion)

Setup

Prerequisites

  • Ruby 3.4+
  • FFmpeg installed (brew install ffmpeg on macOS)
  • Gemini API key

Install & Run

# Install dependencies
bin/setup

# Set your API key (or add to Rails credentials)
export GEMINI_API_KEY=your_api_key_here

# Start the server
bin/dev

Visit http://localhost:3000, click the unmute button, and start talking.

Audio Format Details

Direction          Format            Sample Rate
Browser → Gemini   16-bit PCM mono   16kHz
Gemini → Browser   16-bit PCM mono   24kHz
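
To make these numbers concrete: 16-bit mono audio uses 2 bytes per sample, so the upstream runs at 32,000 bytes per second and the downstream at 48,000. The sketch below converts received 16-bit samples to the Float32 range that Web Audio buffers expect and computes a chunk's duration from its byte length. Both function names are hypothetical and chosen for this example.

```javascript
// Sketch only: helpers for the PCM formats in the table above.
// `pcm16ToFloat32` and `chunkDurationMs` are illustrative names.
const BYTES_PER_SAMPLE = 2; // 16-bit mono

// Scale signed 16-bit samples into the [-1, 1) range used by Web Audio.
function pcm16ToFloat32(samples) {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] / 0x8000;
  }
  return out;
}

// Duration in milliseconds of a raw PCM chunk at the given sample rate.
function chunkDurationMs(byteLength, sampleRate) {
  return (byteLength * 1000) / (BYTES_PER_SAMPLE * sampleRate);
}
```

For example, a 3,200-byte chunk at 16kHz holds 100 ms of audio, and a 48,000-byte chunk at 24kHz holds one second.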

License

MIT
