| 1 | | -# Jemmie Backend |
| 1 | +<div align="center"> |
| 2 | + <h1>Jemmie Backend</h1> |
| 3 | + <p>Real-time voice agent powered by Gemini Live API</p> |
| 2 | 4 | |
| 3 | | -A real-time voice agent powered by Gemini Live API. Kopibara team's submission to the Gemini Live Agent Challenge. |
| 5 | +<p> |
| 6 | + <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/graphs/contributors"> |
| 7 | + <img src="https://img.shields.io/github/contributors/oadultradeepfield/gemini-live-agent-challenge-backend" alt="contributors" /> |
| 8 | + </a> |
| 9 | + <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/commits/main"> |
| 10 | + <img src="https://img.shields.io/github/last-commit/oadultradeepfield/gemini-live-agent-challenge-backend" alt="last update" /> |
| 11 | + </a> |
| 12 | + <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/stargazers"> |
| 13 | + <img src="https://img.shields.io/github/stars/oadultradeepfield/gemini-live-agent-challenge-backend" alt="stars" /> |
| 14 | + </a> |
| 15 | + <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/blob/main/LICENSE"> |
| 16 | + <img src="https://img.shields.io/github/license/oadultradeepfield/gemini-live-agent-challenge-backend.svg" alt="license" /> |
| 17 | + </a> |
| 18 | +</p> |
| 19 | + |
| 20 | +<h4> |
| 21 | + <a href="#deployment">View Demo</a> |
| 22 | + <span> | </span> |
| 23 | + <a href="#getting-started">Documentation</a> |
| 24 | + <span> | </span> |
| 25 | + <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/issues/">Report Bug</a> |
| 26 | +</h4> |
| 27 | +</div> |
| 28 | + |
| 29 | +<br /> |
| 30 | + |
| 31 | +# Table of Contents |
| 32 | + |
| 33 | +- [About the Project](#about-the-project) |
| 34 | + - [Architecture](#architecture) |
| 35 | + - [Tech Stack](#tech-stack) |
| 36 | + - [Features](#features) |
| 37 | + - [Environment Variables](#environment-variables) |
| 38 | +- [Getting Started](#getting-started) |
| 39 | + - [Prerequisites](#prerequisites) |
| 40 | + - [Installation](#installation) |
| 41 | + - [Running Tests](#running-tests) |
| 42 | +- [Deployment](#deployment) |
| 43 | +- [License](#license) |
| 44 | + |
| 45 | +## About the Project |
| 46 | + |
| 47 | +Jemmie is a real-time voice agent backend that delivers sub-second audio latency through the Gemini Live API. Built for |
| 48 | +the [Gemini Live Agent Challenge](https://geminiliveagentchallenge.devpost.com), it provides WebSocket-based |
| 49 | +infrastructure for natural voice interaction with session persistence and action handling. |
| 50 | + |
| 51 | +The backend handles bidirectional audio streaming, visual context processing, and stateful session management. Its |
| 52 | +layered architecture is designed for extensibility and testability. |
| 53 | + |
| 54 | +### Architecture |
| 55 | + |
| 56 | +```mermaid |
| 57 | +flowchart TB |
| 58 | + subgraph Gateway["Gateway Layer"] |
| 59 | + WS[WebSocket Handler<br/>Connection Lifecycle] |
| 60 | + HB[Heartbeat Manager] |
| 61 | + end |
| 62 | + |
| 63 | + subgraph FSM["State Machine Layer"] |
| 64 | + IDLE[Idle State] |
| 65 | + LISTEN[Listening State] |
| 66 | + THINK[Thinking State] |
| 67 | + SPEAK[Speaking State] |
| 68 | + end |
| 69 | + |
| 70 | + subgraph Pipelines["Pipeline Layer"] |
| 71 | + AUDIO[Audio Pipeline<br/>16kHz Input / 24kHz Output] |
| 72 | + IMAGE[Image Pipeline<br/>JPEG Processing] |
| 73 | + end |
| 74 | + |
| 75 | + subgraph Agent["Agent Layer"] |
| 76 | + ADK[ADK Integration<br/>Google Agent Development Kit] |
| 77 | + ROUTER[Action Router<br/>SET_TIMER / SHARE_LOCATION] |
| 78 | + end |
| 79 | + |
| 80 | + subgraph Persistence["Persistence Layer"] |
| 81 | + SESSION[Session Manager] |
| 82 | + FIRESTORE[(Firestore)] |
| 83 | + end |
| 84 | + |
| 85 | + WS --> IDLE |
| 86 | + IDLE --> LISTEN |
| 87 | + LISTEN --> THINK |
| 88 | + THINK --> SPEAK |
| 89 | + SPEAK --> IDLE |
| 90 | + |
| 91 | + WS --> AUDIO |
| 92 | + WS --> IMAGE |
| 93 | + AUDIO --> ADK |
| 94 | + IMAGE --> ADK |
| 95 | + |
| 96 | + ADK --> ROUTER |
| 97 | + SESSION <--> FIRESTORE |
| 98 | + FSM --> SESSION |
| 99 | +``` |
| 100 | + |
| 101 | +The backend is organized into five layers: |
| 102 | + |
| 103 | +- **Gateway Layer**: WebSocket connection handling with heartbeat management for connection health monitoring |
| 104 | +- **State Machine Layer**: Four-state FSM (Idle -> Listening -> Thinking -> Speaking -> Idle) controlling conversation flow |
| 105 | +- **Pipeline Layer**: Audio format conversion (16kHz input, 24kHz output), plus JPEG image processing |
| 106 | +- **Agent Layer**: ADK integration with Gemini Live API and action routing for client-side commands |
| 107 | +- **Persistence Layer**: Session state management with 10-minute resumption window via Firestore |
| 108 | + |
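The conversation cycle in the diagram can be sketched as a small transition table. This is a minimal illustration, not the project's actual implementation: the names `State`, `TRANSITIONS`, and `ConversationFSM` are hypothetical, and the sketch assumes each state has exactly one legal successor, as the diagram shows.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Allowed transitions, read off the architecture diagram (assumption:
# no other edges exist, e.g. no interruption from SPEAKING to LISTENING).
TRANSITIONS: dict[State, State] = {
    State.IDLE: State.LISTENING,
    State.LISTENING: State.THINKING,
    State.THINKING: State.SPEAKING,
    State.SPEAKING: State.IDLE,
}


class ConversationFSM:
    """Hypothetical four-state machine that starts in IDLE."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def advance(self, target: State) -> State:
        # Reject any move that is not the single successor of the current state.
        if TRANSITIONS[self.state] is not target:
            raise ValueError(
                f"illegal transition: {self.state.name} -> {target.name}"
            )
        self.state = target
        return self.state
```

A real backend would likely need extra edges (for example, a user interrupting the agent mid-speech); a table like this keeps such additions in one place.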
| 109 | +### Tech Stack |
| 110 | + |
| 111 | +<details> |
| 112 | +<summary>Server</summary> |
| 113 | +<ul> |
| 114 | + <li><a href="https://www.python.org/">Python 3.12</a></li> |
| 115 | + <li><a href="https://fastapi.tiangolo.com/">FastAPI</a></li> |
| 116 | + <li><a href="https://ai.google.dev/gemini-api/docs">Google GenAI SDK</a></li> |
| 117 | + <li><a href="https://cloud.google.com/run">Google Cloud Run</a></li> |
| 118 | + <li><a href="https://cloud.google.com/firestore">Firestore</a></li> |
| 119 | +</ul> |
| 120 | +</details> |
| 121 | + |
| 122 | +<details> |
| 123 | +<summary>DevOps</summary> |
| 124 | +<ul> |
| 125 | + <li><a href="https://www.docker.com/">Docker</a></li> |
| 126 | + <li><a href="https://docs.github.com/en/actions">GitHub Actions</a></li> |
| 127 | + <li><a href="https://cloud.google.com/build">Cloud Build</a></li> |
| 128 | +</ul> |
| 129 | +</details> |
| 130 | + |
| 131 | +### Features |
| 132 | + |
| 133 | +- **Bidirectional Audio Streaming**: PCM audio with automatic format conversion (16kHz input, 24kHz output) |
| 134 | +- **Stateful Session Management**: Device-as-identity pattern with 10-minute resumption window |
| 135 | +- **Action System**: Server-to-client commands (SET_TIMER) and client-to-server events (SHARE_LOCATION) |
| 136 | +- **Visual Context Support**: JPEG frame processing for multimodal understanding |
| 137 | +- **Graceful Degradation**: Connection health monitoring with automatic cleanup |
| 138 | + |
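The 10-minute resumption window can be illustrated with a small timestamp check. This is a sketch under stated assumptions: `RESUMPTION_WINDOW` and `can_resume` are hypothetical names, the boundary is treated as inclusive, and the last-seen timestamp is assumed to be stored per device (for example in the Firestore session document).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window length taken from the feature list; everything else is an assumption.
RESUMPTION_WINDOW = timedelta(minutes=10)


def can_resume(last_seen: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the device's previous session is still resumable.

    `last_seen` is the timezone-aware time the device was last connected.
    `now` can be injected to make the check deterministic in tests.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    elapsed = current - last_seen
    # A negative elapsed time means clock skew or bad data: do not resume.
    return timedelta(0) <= elapsed <= RESUMPTION_WINDOW
```

Under the device-as-identity pattern, a reconnecting client would present the same `device_id`, and a check like this would decide whether to restore the stored session or start a fresh one.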
| 139 | +### Environment Variables |
| 140 | + |
| 141 | +To run this project, create a `.env` file based on `.env.example`: |
| 142 | + |
| 143 | +| Variable | Description | Required | |
| 144 | +|-----------------------------|---------------------------------------------------------------------|------------------------------| |
| 145 | +| `GOOGLE_API_KEY` | API key from [Google AI Studio](https://aistudio.google.com/apikey) | Yes (local dev) | |
| 146 | +| `GOOGLE_GENAI_USE_VERTEXAI` | Set to `FALSE` for API key, `TRUE` for Vertex AI | No (defaults to Vertex AI) | |
| 147 | +| `GOOGLE_CLOUD_PROJECT` | GCP project ID | Yes (Vertex AI mode) | |
| 148 | +| `GOOGLE_CLOUD_LOCATION` | GCP region (must be `us-central1` for Gemini Live) | No (defaults to us-central1) | |
| 149 | + |
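The table above can be read as a small validation routine. The sketch below is one possible interpretation, not the project's actual config loader: `Settings` and `load_settings` are hypothetical names, and it assumes that any value other than `FALSE` selects Vertex AI, matching the stated default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    use_vertex_ai: bool
    api_key: Optional[str]
    project: Optional[str]
    location: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the four documented variables and enforce the 'Required' column."""
    # Assumption: only the literal FALSE (case-insensitive) disables Vertex AI.
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "TRUE").strip().upper()
    use_vertex_ai = flag != "FALSE"

    api_key = env.get("GOOGLE_API_KEY") or None
    project = env.get("GOOGLE_CLOUD_PROJECT") or None
    location = env.get("GOOGLE_CLOUD_LOCATION") or "us-central1"

    if use_vertex_ai and project is None:
        raise ValueError("GOOGLE_CLOUD_PROJECT is required in Vertex AI mode")
    if not use_vertex_ai and api_key is None:
        raise ValueError("GOOGLE_API_KEY is required in API key mode")

    return Settings(use_vertex_ai, api_key, project, location)
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.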
| 150 | +## Getting Started |
| 151 | + |
| 152 | +### Prerequisites |
| 153 | + |
| 154 | +This project uses `uv` for package management: |
| 155 | + |
| 156 | +```bash |
| 157 | +curl -LsSf https://astral.sh/uv/install.sh | sh |
| 158 | +``` |
| 159 | + |
| 160 | +Docker is optional and only needed for containerized development. |
| 161 | + |
| 162 | +### Installation |
| 163 | + |
| 164 | +1. Clone the repository: |
| 165 | + |
| 166 | +```bash |
| 167 | +git clone https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend.git |
| 168 | +cd gemini-live-agent-challenge-backend |
| 169 | +``` |
| 170 | + |
| 171 | +2. Set up environment variables: |
| 172 | + |
| 173 | +```bash |
| 174 | +# For local development with API key |
| 175 | +export GOOGLE_API_KEY=your-api-key-here |
| 176 | +export GOOGLE_GENAI_USE_VERTEXAI=FALSE |
| 177 | +``` |
| 178 | + |
| 179 | +3. Run the development server: |
| 180 | + |
| 181 | +```bash |
| 182 | +./scripts/dev.sh |
| 183 | +``` |
| 184 | + |
| 185 | +Or with Docker Compose: |
| 186 | + |
| 187 | +```bash |
| 188 | +docker compose up --build |
| 189 | +``` |
| 190 | + |
| 191 | +The WebSocket endpoint will be available at `ws://localhost:8080/ws/{device_id}`. |
| 192 | + |
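Because the device ID is part of the URL path, a client should percent-encode it before connecting. The helper below is a hypothetical sketch (`ws_url` is not part of this repository) that builds the endpoint from the pattern above using only the standard library; the default host and port are the local development values.

```python
from urllib.parse import quote


def ws_url(device_id: str, host: str = "localhost", port: int = 8080) -> str:
    """Build the WebSocket endpoint URL for a given device ID."""
    if not device_id:
        raise ValueError("device_id must be non-empty")
    # safe="" also encodes "/" so the ID cannot add extra path segments.
    return f"ws://{host}:{port}/ws/{quote(device_id, safe='')}"
```

The resulting string can be passed to any WebSocket client library. Which characters the server actually accepts in a device ID is not documented here, so treat the encoding as a defensive default.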
| 193 | +### Running Tests |
| 194 | + |
| 195 | +```bash |
| 196 | +# Run linting and type checks |
| 197 | +make check |
| 198 | + |
| 199 | +# Run test suite |
| 200 | +make test |
| 201 | + |
| 202 | +# Run integration tests (requires Firestore emulator) |
| 203 | +gcloud emulators firestore start --host-port=localhost:8081 & |
| 204 | +export FIRESTORE_EMULATOR_HOST=localhost:8081 |
| 205 | +make test |
| 206 | +``` |
| 207 | + |
| 208 | +## Deployment |
| 209 | + |
| 210 | +### Infrastructure Setup |
| 211 | + |
| 212 | +Run the setup script to create required GCP resources: |
| 213 | + |
| 214 | +```bash |
| 215 | +./scripts/setup_infra.sh your-project-id |
| 216 | +``` |
| 217 | + |
| 218 | +This creates: |
| 219 | + |
| 220 | +- Artifact Registry repository for Docker images |
| 221 | +- Firestore database for session storage |
| 222 | +- Service account with required permissions |
| 223 | + |
| 224 | +### GitHub Actions Deployment |
| 225 | + |
| 226 | +1. Add secrets to your GitHub repository: |
| 227 | + - `GCP_PROJECT_ID`: Your Google Cloud project ID |
| 228 | + - `GCP_SERVICE_ACCOUNT_KEY`: Service account JSON key |
| 229 | + |
| 230 | +2. Push to the `main` branch to trigger automatic deployment. |
| 231 | + |
| 232 | +### GCP Deployment Proof |
| 233 | + |
| 234 | + |
| 235 | + |
| 236 | +## License |
| 237 | + |
| 238 | +Distributed under the MIT License. See `LICENSE` for more information. |