Commit 6127b0d

Add comprehensive README with architecture diagram for hackathon submission
1 parent 70687ce commit 6127b0d

2 files changed

+237
-2
lines changed

README.md

Lines changed: 237 additions & 2 deletions
@@ -1,3 +1,238 @@
1-
# Jemmie Backend
1+
<div align="center">
2+
<h1>Jemmie Backend</h1>
3+
<p>Real-time voice agent powered by Gemini Live API</p>
24

3-
A real-time voice agent powered by Gemini Live API. Kopibara team's submission to the Gemini Live Agent Challenge.
5+
<p>
6+
<a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/graphs/contributors">
7+
<img src="https://img.shields.io/github/contributors/oadultradeepfield/gemini-live-agent-challenge-backend" alt="contributors" />
8+
</a>
9+
<a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/commits/main">
10+
<img src="https://img.shields.io/github/last-commit/oadultradeepfield/gemini-live-agent-challenge-backend" alt="last update" />
11+
</a>
12+
<a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/stargazers">
13+
<img src="https://img.shields.io/github/stars/oadultradeepfield/gemini-live-agent-challenge-backend" alt="stars" />
14+
</a>
15+
<a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/blob/main/LICENSE">
16+
<img src="https://img.shields.io/github/license/oadultradeepfield/gemini-live-agent-challenge-backend.svg" alt="license" />
17+
</a>
18+
</p>
19+
20+
<h4>
21+
<a href="#deployment">View Demo</a>
22+
<span> | </span>
23+
<a href="#getting-started">Documentation</a>
24+
<span> | </span>
25+
<a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/issues/">Report Bug</a>
26+
</h4>
27+
</div>
28+
29+
<br />
30+
31+
# Table of Contents
32+
33+
- [About the Project](#about-the-project)
34+
- [Architecture](#architecture)
35+
- [Tech Stack](#tech-stack)
36+
- [Features](#features)
37+
- [Environment Variables](#environment-variables)
38+
- [Getting Started](#getting-started)
39+
- [Prerequisites](#prerequisites)
40+
- [Installation](#installation)
41+
- [Running Tests](#running-tests)
42+
- [Deployment](#deployment)
43+
- [License](#license)
44+
45+
## About the Project
46+
47+
Jemmie is a real-time voice agent backend that delivers sub-second audio latency through the Gemini Live API. Built for
48+
the [Gemini Live Agent Challenge](https://geminiliveagentchallenge.devpost.com), it provides a WebSocket-based
49+
infrastructure for natural voice interaction with session persistence and action handling.
50+
51+
The backend handles bidirectional audio streaming, visual context processing, and stateful session management with a
52+
layered architecture designed for extensibility and testability.
53+
54+
### Architecture
55+
56+
```mermaid
57+
flowchart TB
58+
subgraph Gateway["Gateway Layer"]
59+
WS[WebSocket Handler<br/>Connection Lifecycle]
60+
HB[Heartbeat Manager]
61+
end
62+
63+
subgraph FSM["State Machine Layer"]
64+
IDLE[Idle State]
65+
LISTEN[Listening State]
66+
THINK[Thinking State]
67+
SPEAK[Speaking State]
68+
end
69+
70+
subgraph Pipelines["Pipeline Layer"]
71+
AUDIO[Audio Pipeline<br/>16kHz Input / 24kHz Output]
72+
IMAGE[Image Pipeline<br/>JPEG Processing]
73+
end
74+
75+
subgraph Agent["Agent Layer"]
76+
ADK[ADK Integration<br/>Google Agent Development Kit]
77+
ROUTER[Action Router<br/>SET_TIMER / SHARE_LOCATION]
78+
end
79+
80+
subgraph Persistence["Persistence Layer"]
81+
SESSION[Session Manager]
82+
FIRESTORE[(Firestore)]
83+
end
84+
85+
WS --> IDLE
86+
IDLE --> LISTEN
87+
LISTEN --> THINK
88+
THINK --> SPEAK
89+
SPEAK --> IDLE
90+
91+
WS --> AUDIO
92+
WS --> IMAGE
93+
AUDIO --> ADK
94+
IMAGE --> ADK
95+
96+
ADK --> ROUTER
97+
SESSION <--> FIRESTORE
98+
FSM --> SESSION
99+
```
100+
101+
The backend is organized into five layers:
102+
103+
- **Gateway Layer**: WebSocket connection handling with heartbeat management for connection health monitoring
104+
- **State Machine Layer**: Four-state FSM (Idle -> Listening -> Thinking -> Speaking -> Idle) controlling conversation flow
105+
- **Pipeline Layer**: Audio handling for 16kHz PCM input from the client and 24kHz PCM output from Gemini, plus JPEG image processing
106+
- **Agent Layer**: ADK integration with Gemini Live API and action routing for client-side commands
107+
- **Persistence Layer**: Session state management with 10-minute resumption window via Firestore
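
The conversation flow in the State Machine Layer can be pictured as a small transition table. The following is a minimal sketch, not the project's actual code: `State`, `TRANSITIONS`, and `advance` are hypothetical names, and only the cycle drawn in the diagram is modeled. A real implementation would likely need extra transitions (for example, handling interruptions), which this README does not describe.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Hypothetical transition table: only the cycle from the architecture diagram.
TRANSITIONS = {
    State.IDLE: State.LISTENING,
    State.LISTENING: State.THINKING,
    State.THINKING: State.SPEAKING,
    State.SPEAKING: State.IDLE,
}


def advance(state: State) -> State:
    """Return the next state in the Idle -> Listening -> Thinking -> Speaking cycle."""
    return TRANSITIONS[state]
```

Keeping the transitions in a dictionary makes the allowed moves easy to audit and to unit-test in isolation from the WebSocket code.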
108+
109+
### Tech Stack
110+
111+
<details>
112+
<summary>Server</summary>
113+
<ul>
114+
<li><a href="https://www.python.org/">Python 3.12</a></li>
115+
<li><a href="https://fastapi.tiangolo.com/">FastAPI</a></li>
116+
<li><a href="https://ai.google.dev/gemini-api/docs">Google GenAI SDK</a></li>
117+
<li><a href="https://cloud.google.com/run">Google Cloud Run</a></li>
118+
<li><a href="https://cloud.google.com/firestore">Firestore</a></li>
119+
</ul>
120+
</details>
121+
122+
<details>
123+
<summary>DevOps</summary>
124+
<ul>
125+
<li><a href="https://www.docker.com/">Docker</a></li>
126+
<li><a href="https://docs.github.com/en/actions">GitHub Actions</a></li>
127+
<li><a href="https://cloud.google.com/build">Cloud Build</a></li>
128+
</ul>
129+
</details>
130+
131+
### Features
132+
133+
- **Bidirectional Audio Streaming**: PCM audio with automatic format conversion (16kHz input, 24kHz output)
134+
- **Stateful Session Management**: Device-as-identity pattern with 10-minute resumption window
135+
- **Action System**: Server-to-client commands (SET_TIMER) and client-to-server events (SHARE_LOCATION)
136+
- **Visual Context Support**: JPEG frame processing for multimodal understanding
137+
- **Graceful Degradation**: Connection health monitoring with automatic cleanup
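
The 10-minute resumption window could be checked along these lines. This is a sketch only: `RESUMPTION_WINDOW`, `can_resume`, and the idea of keeping a `last_seen` timestamp per device are assumptions for illustration. The README does not show how the session manager stores or compares timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 10-minute figure comes from the README; everything else is hypothetical.
RESUMPTION_WINDOW = timedelta(minutes=10)


def can_resume(last_seen: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a session last active at `last_seen` is still resumable."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_seen <= RESUMPTION_WINDOW
```

Passing `now` explicitly keeps the check deterministic in tests; production code would simply omit it.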
138+
139+
### Environment Variables
140+
141+
To run this project, create a `.env` file based on `.env.example`:
142+
143+
| Variable | Description | Required |
144+
|-----------------------------|---------------------------------------------------------------------|------------------------------|
145+
| `GOOGLE_API_KEY` | API key from [Google AI Studio](https://aistudio.google.com/apikey) | Yes (local dev) |
146+
| `GOOGLE_GENAI_USE_VERTEXAI` | Set to `FALSE` for API key, `TRUE` for Vertex AI | No (defaults to Vertex AI) |
147+
| `GOOGLE_CLOUD_PROJECT` | GCP project ID | Yes (Vertex AI mode) |
148+
| `GOOGLE_CLOUD_LOCATION` | GCP region (must be `us-central1` for Gemini Live) | No (defaults to us-central1) |
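
The rules in the table can be expressed as a small loader. This is a hedged sketch rather than the project's configuration code: `Settings` and `load_settings` are hypothetical names, and treating any value other than `FALSE` as Vertex AI mode is an assumption about how the flag is parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    use_vertexai: bool
    api_key: Optional[str]
    project: Optional[str]
    location: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the four variables from the table and enforce the 'Required' column."""
    # Assumption: Vertex AI stays the default unless the flag is explicitly FALSE.
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "TRUE")
    use_vertexai = flag.strip().upper() != "FALSE"
    api_key = env.get("GOOGLE_API_KEY")
    project = env.get("GOOGLE_CLOUD_PROJECT")
    location = env.get("GOOGLE_CLOUD_LOCATION", "us-central1")

    if use_vertexai and not project:
        raise ValueError("GOOGLE_CLOUD_PROJECT is required in Vertex AI mode")
    if not use_vertexai and not api_key:
        raise ValueError("GOOGLE_API_KEY is required when Vertex AI is disabled")
    return Settings(use_vertexai, api_key, project, location)
```

Failing at startup when a required variable is missing surfaces configuration mistakes before the first WebSocket connection is accepted.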
149+
150+
## Getting Started
151+
152+
### Prerequisites
153+
154+
This project uses `uv` for package management:
155+
156+
```bash
157+
curl -LsSf https://astral.sh/uv/install.sh | sh
158+
```
159+
160+
Docker is optional for containerized development.
161+
162+
### Installation
163+
164+
1. Clone the repository:
165+
166+
```bash
167+
git clone https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend.git
168+
cd gemini-live-agent-challenge-backend
169+
```
170+
171+
2. Set up environment variables:
172+
173+
```bash
174+
# For local development with API key
175+
export GOOGLE_API_KEY=your-api-key-here
176+
export GOOGLE_GENAI_USE_VERTEXAI=FALSE
177+
```
178+
179+
3. Run the development server:
180+
181+
```bash
182+
./scripts/dev.sh
183+
```
184+
185+
Or with Docker Compose:
186+
187+
```bash
188+
docker compose up --build
189+
```
190+
191+
The WebSocket endpoint will be available at `ws://localhost:8080/ws/{device_id}`.
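
A client needs to substitute its own identifier for `{device_id}`. The helper below is an illustrative sketch: `ws_url` is a hypothetical name, and percent-encoding the identifier is a defensive assumption, since the README does not say which characters a device ID may contain. The message format exchanged after connecting is not documented here, so it is left out.

```python
from urllib.parse import quote


def ws_url(device_id: str, host: str = "localhost", port: int = 8080) -> str:
    """Build the endpoint URL for a device, percent-encoding the path segment."""
    return f"ws://{host}:{port}/ws/{quote(device_id, safe='')}"
```

The resulting string can be passed to any WebSocket client library.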
192+
193+
### Running Tests
194+
195+
```bash
196+
# Run linting and type checks
197+
make check
198+
199+
# Run test suite
200+
make test
201+
202+
# Run integration tests (requires Firestore emulator)
203+
gcloud emulators firestore start --host-port=localhost:8081 &
204+
export FIRESTORE_EMULATOR_HOST=localhost:8081
205+
make test
206+
```
207+
208+
## Deployment
209+
210+
### Infrastructure Setup
211+
212+
Run the setup script to create required GCP resources:
213+
214+
```bash
215+
./scripts/setup_infra.sh your-project-id
216+
```
217+
218+
This creates:
219+
220+
- Artifact Registry repository for Docker images
221+
- Firestore database for session storage
222+
- Service account with required permissions
223+
224+
### GitHub Actions Deployment
225+
226+
1. Add secrets to your GitHub repository:
227+
- `GCP_PROJECT_ID`: Your Google Cloud project ID
228+
- `GCP_SERVICE_ACCOUNT_KEY`: Service account JSON key
229+
230+
2. Push to the `main` branch to trigger automatic deployment.
231+
232+
### GCP Deployment Proof
233+
234+
![Proof of deployment](docs/proof_of_deployment.jpeg)
235+
236+
## License
237+
238+
Distributed under the MIT License. See `LICENSE` for more information.

docs/proof_of_deployment.jpeg

653 KB