oadultradeepfield
diff --git a/README.md
Lines changed: 237 additions & 2 deletions b/README.md
diff --git a/docs/proof_of_deployment.jpeg
653 KB b/docs/proof_of_deployment.jpeg
@@ -1,3 +1,238 @@
-# Jemmie Backend
+<div align="center">
+  <h1>Jemmie Backend</h1>
+  <p>Real-time voice agent powered by Gemini Live API</p>
 
-A real-time voice agent powered by Gemini Live API. Kopibara team's submission to the Gemini Live Agent Challenge.
+<p>
+  <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/graphs/contributors">
+    <img src="https://img.shields.io/github/contributors/oadultradeepfield/gemini-live-agent-challenge-backend" alt="contributors" />
+  </a>
+  <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/commits/main">
+    <img src="https://img.shields.io/github/last-commit/oadultradeepfield/gemini-live-agent-challenge-backend" alt="last update" />
+  </a>
+  <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/stargazers">
+    <img src="https://img.shields.io/github/stars/oadultradeepfield/gemini-live-agent-challenge-backend" alt="stars" />
+  </a>
+  <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/blob/main/LICENSE">
+    <img src="https://img.shields.io/github/license/oadultradeepfield/gemini-live-agent-challenge-backend.svg" alt="license" />
+  </a>
+</p>
+
+<h4>
+  <a href="#deployment">View Demo</a>
+  <span> | </span>
+  <a href="#getting-started">Documentation</a>
+  <span> | </span>
+  <a href="https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend/issues/">Report Bug</a>
+</h4>
+</div>
+
+<br />
+
+# Table of Contents
+
+- [About the Project](#about-the-project)
+    - [Architecture](#architecture)
+    - [Tech Stack](#tech-stack)
+    - [Features](#features)
+    - [Environment Variables](#environment-variables)
+- [Getting Started](#getting-started)
+    - [Prerequisites](#prerequisites)
+    - [Installation](#installation)
+    - [Running Tests](#running-tests)
+- [Deployment](#deployment)
+- [License](#license)
+
+## About the Project
+
+Jemmie is a real-time voice agent backend that delivers sub-second audio latency through the Gemini Live API. Built for
+the [Gemini Live Agent Challenge](https://geminiliveagentchallenge.devpost.com), it provides a WebSocket-based
+infrastructure for natural voice interaction with session persistence and action handling.
+
+The backend handles bidirectional audio streaming, visual context processing, and stateful session management with a
+layered architecture designed for extensibility and testability.
+
+### Architecture
+
+```mermaid
+flowchart TB
+    subgraph Gateway["Gateway Layer"]
+        WS[WebSocket Handler<br/>Connection Lifecycle]
+        HB[Heartbeat Manager]
+    end
+
+    subgraph FSM["State Machine Layer"]
+        IDLE[Idle State]
+        LISTEN[Listening State]
+        THINK[Thinking State]
+        SPEAK[Speaking State]
+    end
+
+    subgraph Pipelines["Pipeline Layer"]
+        AUDIO[Audio Pipeline<br/>16kHz Input / 24kHz Output]
+        IMAGE[Image Pipeline<br/>JPEG Processing]
+    end
+
+    subgraph Agent["Agent Layer"]
+        ADK[ADK Integration<br/>Google Agent Development Kit]
+        ROUTER[Action Router<br/>SET_TIMER / SHARE_LOCATION]
+    end
+
+    subgraph Persistence["Persistence Layer"]
+        SESSION[Session Manager]
+        FIRESTORE[(Firestore)]
+    end
+
+    WS --> IDLE
+    IDLE --> LISTEN
+    LISTEN --> THINK
+    THINK --> SPEAK
+    SPEAK --> IDLE
+
+    WS --> AUDIO
+    WS --> IMAGE
+    AUDIO --> ADK
+    IMAGE --> ADK
+
+    ADK --> ROUTER
+    SESSION <--> FIRESTORE
+    FSM --> SESSION
+```
+
+The backend is organized into five layers:
+
+- **Gateway Layer**: WebSocket connection handling with heartbeat management for connection health monitoring
+- **State Machine Layer**: Four-state FSM (Idle -> Listening -> Thinking -> Speaking) controlling conversation flow
+- **Pipeline Layer**: Audio handling for 16kHz PCM input from the client and 24kHz PCM output from Gemini, plus JPEG image processing
+- **Agent Layer**: ADK integration with Gemini Live API and action routing for client-side commands
+- **Persistence Layer**: Session state management with a 10-minute resumption window, backed by Firestore
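
The State Machine Layer described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `State`, `TRANSITIONS`, and `ConversationFSM` are hypothetical, and the transition table only mirrors the four arrows in the diagram (a real implementation may allow extra edges, for example to handle interruptions).

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Allowed transitions, copied from the arrows in the architecture diagram.
# Assumption: no other edges exist.
TRANSITIONS = {
    State.IDLE: {State.LISTENING},
    State.LISTENING: {State.THINKING},
    State.THINKING: {State.SPEAKING},
    State.SPEAKING: {State.IDLE},
}


class ConversationFSM:
    """Hypothetical four-state conversation FSM that starts in IDLE."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def advance(self, target: State) -> State:
        # Reject any move that is not listed in the transition table.
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition: {self.state.name} -> {target.name}"
            )
        self.state = target
        return self.state
```

Keeping the legal edges in a single table makes the conversation flow easy to test in isolation from the WebSocket and audio code.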
+
+### Tech Stack
+
+<details>
+<summary>Server</summary>
+<ul>
+  <li><a href="https://www.python.org/">Python 3.12</a></li>
+  <li><a href="https://fastapi.tiangolo.com/">FastAPI</a></li>
+  <li><a href="https://ai.google.dev/gemini-api/docs">Google GenAI SDK</a></li>
+  <li><a href="https://cloud.google.com/run">Google Cloud Run</a></li>
+  <li><a href="https://cloud.google.com/firestore">Firestore</a></li>
+</ul>
+</details>
+
+<details>
+<summary>DevOps</summary>
+<ul>
+  <li><a href="https://www.docker.com/">Docker</a></li>
+  <li><a href="https://docs.github.com/en/actions">GitHub Actions</a></li>
+  <li><a href="https://cloud.google.com/build">Cloud Build</a></li>
+</ul>
+</details>
+
+### Features
+
+- **Bidirectional Audio Streaming**: PCM audio with automatic format conversion (16kHz input, 24kHz output)
+- **Stateful Session Management**: Device-as-identity pattern with a 10-minute resumption window
+- **Action System**: Server-to-client commands (SET_TIMER) and client-to-server events (SHARE_LOCATION)
+- **Visual Context Support**: JPEG frame processing for multimodal understanding
+- **Graceful Degradation**: Connection health monitoring with automatic cleanup
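
As an illustration of the 10-minute resumption window mentioned above, the check might look like the sketch below. The function name `can_resume`, the `last_seen` timestamp, and the inclusive boundary are assumptions made for this example; the README does not specify how the window is evaluated.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window length taken from the README; everything else here is hypothetical.
RESUMPTION_WINDOW = timedelta(minutes=10)


def can_resume(last_seen: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a session last seen at `last_seen` may still be resumed."""
    if now is None:
        now = datetime.now(timezone.utc)
    elapsed = now - last_seen
    # Assumption: the boundary is inclusive and timestamps in the future are rejected.
    return timedelta(0) <= elapsed <= RESUMPTION_WINDOW
```

With device-as-identity, a reconnecting device would look up its stored session by device ID and start a fresh session whenever a check like this fails.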
+
+### Environment Variables
+
+To run this project, create a `.env` file based on `.env.example`:
+
+| Variable                    | Description                                                         | Required                     |
+|-----------------------------|---------------------------------------------------------------------|------------------------------|
+| `GOOGLE_API_KEY`            | API key from [Google AI Studio](https://aistudio.google.com/apikey) | Yes (local dev)              |
+| `GOOGLE_GENAI_USE_VERTEXAI` | Set to `FALSE` for API key, `TRUE` for Vertex AI                    | No (defaults to Vertex AI)   |
+| `GOOGLE_CLOUD_PROJECT`      | GCP project ID                                                      | Yes (Vertex AI mode)         |
+| `GOOGLE_CLOUD_LOCATION`     | GCP region (must be `us-central1` for Gemini Live)                  | No (defaults to us-central1) |
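
One way to read and validate these variables is sketched below. The `Settings` dataclass and `load_settings` function are hypothetical names used only for illustration; the defaults and requirements follow the table above, but how the project itself loads configuration is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    use_vertex_ai: bool
    api_key: Optional[str]
    project: Optional[str]
    location: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping (hypothetical helper)."""
    # Per the table: Vertex AI is the default unless explicitly set to FALSE.
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "TRUE").strip().upper() == "TRUE"
    api_key = env.get("GOOGLE_API_KEY") or None
    project = env.get("GOOGLE_CLOUD_PROJECT") or None
    location = env.get("GOOGLE_CLOUD_LOCATION", "us-central1")

    if use_vertex and not project:
        raise ValueError("GOOGLE_CLOUD_PROJECT is required in Vertex AI mode")
    if not use_vertex and not api_key:
        raise ValueError("GOOGLE_API_KEY is required in API key mode")
    return Settings(use_vertex, api_key, project, location)
```

Failing at startup on a missing variable tends to be easier to debug than an authentication error on the first Gemini call.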
+
+## Getting Started
+
+### Prerequisites
+
+This project uses `uv` for package management:
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+
+Docker is optional and only needed for containerized development.
+
+### Installation
+
+1. Clone the repository:
+
+```bash
+git clone https://github.com/oadultradeepfield/gemini-live-agent-challenge-backend.git
+cd gemini-live-agent-challenge-backend
+```
+
+2. Set up environment variables:
+
+```bash
+# For local development with API key
+export GOOGLE_API_KEY=your-api-key-here
+export GOOGLE_GENAI_USE_VERTEXAI=FALSE
+```
+
+3. Run the development server:
+
+```bash
+./scripts/dev.sh
+```
+
+Or with Docker Compose:
+
+```bash
+docker compose up --build
+```
+
+The WebSocket endpoint will be available at `ws://localhost:8080/ws/{device_id}`.
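
For client code, the endpoint URL can be assembled with a small helper like the one below. The function name `ws_url` and its default host and port are illustrative assumptions based on the local address above. Percent-encoding the device ID is a defensive choice for this sketch, not something the README requires.

```python
from urllib.parse import quote


def ws_url(device_id: str, host: str = "localhost", port: int = 8080) -> str:
    """Build the WebSocket URL for a device (hypothetical helper)."""
    # safe="" also encodes "/", so the ID always stays a single path segment.
    return f"ws://{host}:{port}/ws/{quote(device_id, safe='')}"
```

For example, `ws_url("device-123")` yields `ws://localhost:8080/ws/device-123`, which can be passed to any WebSocket client.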
+
+### Running Tests
+
+```bash
+# Run linting and type checks
+make check
+
+# Run test suite
+make test
+
+# Run integration tests (requires Firestore emulator)
+gcloud emulators firestore start --host-port=localhost:8081 &
+export FIRESTORE_EMULATOR_HOST=localhost:8081
+make test
+```
+
+## Deployment
+
+### Infrastructure Setup
+
+Run the setup script to create required GCP resources:
+
+```bash
+./scripts/setup_infra.sh your-project-id
+```
+
+This creates:
+
+- Artifact Registry repository for Docker images
+- Firestore database for session storage
+- Service account with required permissions
+
+### GitHub Actions Deployment
+
+1. Add secrets to your GitHub repository:
+    - `GCP_PROJECT_ID`: Your Google Cloud project ID
+    - `GCP_SERVICE_ACCOUNT_KEY`: Service account JSON key
+
+2. Push to the `main` branch to trigger automatic deployment.
+
+### GCP Deployment Proof
+
+![Proof of deployment](docs/proof_of_deployment.jpeg)
+
+## License
+
+Distributed under the MIT License. See `LICENSE` for more information.