Update documentation with architecture diagram and testing instructions

oadultradeepfield · oadultradeepfield · commit 823b21b77b62 · 2026-03-15T10:26:21.000+08:00
Add presentation-ready Mermaid diagram with Google-themed styling.
Include testing instructions for judges and web search setup guide.
Update CI workflow to pass search API secrets to Cloud Run.
diff --git a/.env.example b/.env.example
@@ -8,5 +8,11 @@ GOOGLE_GENAI_USE_VERTEXAI=FALSE
 # GOOGLE_CLOUD_PROJECT=your-project-id
 # GOOGLE_CLOUD_LOCATION=us-central1
 
+# Web Search (Google Custom Search API) - Optional
+# Enable at: https://console.cloud.google.com/apis/library/customsearch.googleapis.com
+# Create engine at: https://programmablesearchengine.google.com/
+# GOOGLE_SEARCH_API_KEY=your-search-api-key
+# GOOGLE_SEARCH_ENGINE_ID=your-search-engine-id
+
 # Optional
 # LOG_LEVEL=DEBUG
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -81,7 +81,7 @@ jobs:
             --min-instances=1 \
             --port=8080 \
             --allow-unauthenticated \
-            --set-env-vars="GOOGLE_GENAI_USE_VERTEXAI=TRUE,GOOGLE_CLOUD_PROJECT=${{ secrets.GCP_PROJECT_ID }},GOOGLE_CLOUD_LOCATION=${{ env.REGION }}"
+            --set-env-vars="GOOGLE_GENAI_USE_VERTEXAI=TRUE,GOOGLE_CLOUD_PROJECT=${{ secrets.GCP_PROJECT_ID }},GOOGLE_CLOUD_LOCATION=${{ env.REGION }},GOOGLE_SEARCH_API_KEY=${{ secrets.GOOGLE_SEARCH_API_KEY }},GOOGLE_SEARCH_ENGINE_ID=${{ secrets.GOOGLE_SEARCH_ENGINE_ID }}"
 
       - name: Get Service URL
         run: |
diff --git a/README.md b/README.md
@@ -43,51 +43,100 @@ the agent layer interfaces with Gemini Live API, and the persistence layer maint
 ## Architecture
 
 ```mermaid
+%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#4285f4', 'primaryTextColor': '#fff', 'primaryBorderColor': '#1a73e8', 'lineColor': '#5f6368', 'secondaryColor': '#34a853', 'tertiaryColor': '#fbbc04'}}}%%
 flowchart TB
-    subgraph Client["Client"]
-        APP[Mobile/Web App]
+    subgraph CLIENT["📱 Client Layer"]
+        direction LR
+        APP["Mobile App<br/><sub>iOS / Android</sub>"]
     end
 
-    subgraph Gateway["Gateway Layer"]
-        WS[ConnectionGateway<br/>WebSocket Lifecycle]
+    subgraph GCP["☁️ Google Cloud Platform"]
+        direction TB
+
+        subgraph GATEWAY["🔗 Gateway Layer"]
+            WS["ConnectionGateway<br/><sub>WebSocket Lifecycle</sub>"]
+        end
+
+        subgraph ENGINE["⚙️ Engine Layer"]
+            FE["FrameEngine<br/><sub>State Machine</sub>"]
+            CC["ConnectionContext<br/><sub>Shared State</sub>"]
+        end
+
+        subgraph PIPELINE["🔄 Pipeline Layer"]
+            direction LR
+            AUDIO["🔊 Audio<br/><sub>16kHz In / 24kHz Out</sub>"]
+            IMAGE["🖼️ Image<br/><sub>JPEG Processing</sub>"]
+            ACTION["⚡ Action<br/><sub>Event Dispatch</sub>"]
+        end
+
+        subgraph AGENT["🤖 Agent Layer"]
+            LS["LiveSession<br/><sub>Streaming Manager</sub>"]
+            ROUTER["Tool Router<br/><sub>9 Built-in Tools</sub>"]
+            SEARCH["Web Search<br/><sub>Grounding</sub>"]
+        end
+
+        subgraph PERSIST["💾 Persistence"]
+            SM["Session Manager"]
+            FIRESTORE[("Firestore<br/><sub>Session State</sub>")]
+        end
     end
 
-    subgraph Engine["Engine Layer"]
-        FE[FrameEngine<br/>State Machine]
-        CC[ConnectionContext<br/>Shared State]
+    subgraph GEMINI["🌟 Google AI"]
+        GEM["Gemini Live API<br/><sub>gemini-live-2.5-flash</sub>"]
     end
 
-    subgraph Pipelines["Pipeline Layer"]
-        AUDIO[Audio Pipeline<br/>16kHz In / 24kHz Out]
-        IMAGE[Image Pipeline<br/>JPEG Processing]
-        ACTION[Action Pipeline<br/>Event Dispatch]
+    subgraph EXTERNAL["🌐 External Services"]
+        CSE["Custom Search API<br/><sub>Web Grounding</sub>"]
     end
 
-    subgraph Agent["Agent Layer"]
-        LS[LiveSession<br/>Gemini Live API]
-        ROUTER[Action Router<br/>SET_TIMER / SHARE_LOCATION]
-    end
-
-    subgraph Persistence["Persistence Layer"]
-        SM[Session Manager]
-        FIRESTORE[(Firestore)]
-    end
-
-    APP <-->|"ws://host/ws/{device_id}"| WS
+    APP <-->|"wss:// Secure WebSocket"| WS
     WS --> FE
     FE --> CC
-    CC --> AUDIO
-    CC --> IMAGE
-    CC --> ACTION
-    AUDIO --> LS
-    IMAGE --> LS
+    CC --> AUDIO & IMAGE & ACTION
+    AUDIO & IMAGE --> LS
     ACTION --> ROUTER
-    LS <-->|"Gemini Live API"| GEMINI[(Gemini)]
+    ROUTER --> SEARCH
+    LS <-->|"Real-time Audio<br/>Function Calls"| GEM
+    SEARCH -->|"Search Results"| GEM
+    SEARCH -.->|"JSON API"| CSE
     FE --> SM
     SM <--> FIRESTORE
+
+    style CLIENT fill:#e8f0fe,stroke:#4285f4,stroke-width:2px
+    style GCP fill:#e6f4ea,stroke:#34a853,stroke-width:2px
+    style GEMINI fill:#fef7e0,stroke:#fbbc04,stroke-width:2px
+    style EXTERNAL fill:#fce8e6,stroke:#ea4335,stroke-width:2px
 ```
 
-The architecture consists of five layers that process frames from the client through to Gemini and back:
+### Architecture Overview
+
+The system follows a layered architecture designed for real-time voice interaction with sub-second latency:
+
+| Layer | Responsibility | Components |
+|-------|---------------|------------|
+| **Gateway** | WebSocket lifecycle, connection management | ConnectionGateway |
+| **Engine** | Frame routing via state machine (IDLE→CONNECTED→ACTIVE→DRAINING→CLOSED) | FrameEngine, ConnectionContext |
+| **Pipeline** | Data transformation between client and Gemini formats | Audio, Image, Action pipelines |
+| **Agent** | Gemini Live API integration, tool execution | LiveSession, Tool Router, Web Search |
+| **Persistence** | Session state with 10-minute resumption window | Session Manager, Firestore |
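As a rough illustration of the Engine row above, the progression IDLE→CONNECTED→ACTIVE→DRAINING→CLOSED can be modeled as a small transition table. This is a sketch only: `ConnectionState`, `ALLOWED`, and `advance` are hypothetical names, and the early exits straight to CLOSED are an assumption rather than something taken from `FrameEngine`.

```python
from enum import Enum, auto


class ConnectionState(Enum):
    IDLE = auto()
    CONNECTED = auto()
    ACTIVE = auto()
    DRAINING = auto()
    CLOSED = auto()


# Forward path from the table, plus an assumed direct exit to CLOSED
# from every non-terminal state (for example, on an abrupt disconnect).
ALLOWED = {
    ConnectionState.IDLE: {ConnectionState.CONNECTED, ConnectionState.CLOSED},
    ConnectionState.CONNECTED: {ConnectionState.ACTIVE, ConnectionState.CLOSED},
    ConnectionState.ACTIVE: {ConnectionState.DRAINING, ConnectionState.CLOSED},
    ConnectionState.DRAINING: {ConnectionState.CLOSED},
    ConnectionState.CLOSED: set(),
}


def advance(current, target):
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the allowed transitions in one table makes it easy to reject frames that arrive in the wrong state, such as new input after the connection has started draining.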
+
+### Available Tools
+
+| Tool | Type | Description |
+|------|------|-------------|
+| `SET_TIMER` | Client-bound | Triggers countdown timer on device |
+| `END_CALL` | Client-bound | Gracefully terminates the session |
+| `OPEN_URL` | Client-bound | Opens URLs/maps on device |
+| `FETCH_LOCATION` | Client-bound | Requests user's GPS location |
+| `SET_REMINDER` | Client-bound | Schedules push notification |
+| `REQUEST_BINARY_INPUT` | Client-bound | Yes/No via volume buttons |
+| `REQUEST_CAMERA_PREVIEW` | Client-bound | Captures photo from camera |
+| `COPY_TO_CLIPBOARD` | Client-bound | Copies text to device clipboard |
+| `WEB_SEARCH` | Model-bound | Grounding with real-time web data |
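One way to read the Type column: client-bound tools are forwarded to the device as commands, while the model-bound tool runs on the server and hands its result back to Gemini. The sketch below encodes that split. `ToolKind`, `TOOL_KINDS`, and `route_tool` are hypothetical names, and the returned dictionary is an invented shape, not the backend's wire format.

```python
from enum import Enum


class ToolKind(Enum):
    CLIENT_BOUND = "client"
    MODEL_BOUND = "model"


# Tool names and types copied from the table above.
TOOL_KINDS = {
    "SET_TIMER": ToolKind.CLIENT_BOUND,
    "END_CALL": ToolKind.CLIENT_BOUND,
    "OPEN_URL": ToolKind.CLIENT_BOUND,
    "FETCH_LOCATION": ToolKind.CLIENT_BOUND,
    "SET_REMINDER": ToolKind.CLIENT_BOUND,
    "REQUEST_BINARY_INPUT": ToolKind.CLIENT_BOUND,
    "REQUEST_CAMERA_PREVIEW": ToolKind.CLIENT_BOUND,
    "COPY_TO_CLIPBOARD": ToolKind.CLIENT_BOUND,
    "WEB_SEARCH": ToolKind.MODEL_BOUND,
}


def route_tool(name, args):
    """Decide where a tool call should be handled (illustrative only)."""
    kind = TOOL_KINDS.get(name)
    if kind is None:
        raise KeyError(f"unknown tool: {name}")
    # Client-bound calls go to the device; model-bound calls stay on the server.
    destination = "client" if kind is ToolKind.CLIENT_BOUND else "server"
    return {"destination": destination, "tool": name, "args": args}
```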
+
+---
+
+**Layer Details:**
 
 - **Gateway Layer** (`src/gateway/`): Handles WebSocket connection acceptance, lifecycle management, and guaranteed
   cleanup on disconnect. It initializes the connection context and spawns the frame engine.
@@ -101,8 +150,8 @@ The architecture consists of five layers that process frames from the client thr
   multimodal understanding. Actions are dispatched to handlers or routed to the client.
 
 - **Agent Layer** (`src/agent/`): Manages the bidirectional streaming connection to Gemini Live API through the Google
-  GenAI SDK. Action handlers execute server-side logic (e.g., SET_TIMER) or process client events (e.g.,
-  SHARE_LOCATION).
+  GenAI SDK. Tool handlers execute server-side logic (e.g., `WEB_SEARCH` for grounding) or send commands to the client
+  (e.g., `SET_TIMER`).
 
 - **Persistence Layer** (`src/session/`): Stores session state in Firestore with a device-as-identity pattern. Sessions
   can be resumed within a 10-minute window, allowing users to continue conversations across connection drops.
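The 10-minute resumption window amounts to a timestamp comparison. A minimal sketch follows, assuming the stored session carries a timezone-aware "last seen" timestamp. `RESUMPTION_WINDOW` and `is_resumable` are hypothetical names, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

RESUMPTION_WINDOW = timedelta(minutes=10)


def is_resumable(last_seen, now=None):
    """Return True if the session was last active within the resumption window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_seen <= RESUMPTION_WINDOW
```

A reconnecting device whose session fails this check would start a fresh conversation instead of resuming the old one.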
@@ -138,6 +187,7 @@ The architecture consists of five layers that process frames from the client thr
 - **Action System**: Server-to-client commands (SET_TIMER) and client-to-server events (SHARE_LOCATION) for interactive
   features
 - **Visual Context Support**: JPEG frame processing for multimodal understanding with the live session
+- **Web Search Grounding**: Real-time web data integration via Google Custom Search API for up-to-date information
 - **Graceful Degradation**: Connection state machine with frame queuing and guaranteed cleanup on disconnect
 
 ## Getting Started
@@ -183,10 +233,121 @@ The `.env.example` file contains the minimal configuration for local development
 | `GOOGLE_GENAI_USE_VERTEXAI` | Use Vertex AI instead of API key | `TRUE` (use `FALSE` for API key)         |
 | `GOOGLE_CLOUD_PROJECT`      | GCP project ID                   | Required for Vertex AI mode              |
 | `GOOGLE_CLOUD_LOCATION`     | GCP region                       | `us-central1` (required for Gemini Live) |
+| `GOOGLE_SEARCH_API_KEY`     | Custom Search API key            | Optional, for web grounding              |
+| `GOOGLE_SEARCH_ENGINE_ID`   | Programmable Search Engine ID    | Optional, for web grounding              |
 
 For local development with an API key, set `GOOGLE_GENAI_USE_VERTEXAI=FALSE` and provide your `GOOGLE_API_KEY`. For
 production deployment on Cloud Run, the service account credentials are used automatically with Vertex AI.
 
+## Testing Instructions for Judges
+
+This section provides a quick way to verify the backend functionality.
+
+### Quick Start (2 minutes)
+
+1. Clone and set up:
+   ```bash
+   git clone https://github.com/oadultradeepfield/jemmie-backend.git
+   cd jemmie-backend
+   cp .env.example .env
+   ```
+
+2. Add your Google API key to `.env`:
+   ```
+   GOOGLE_API_KEY=your-api-key-from-aistudio
+   GOOGLE_GENAI_USE_VERTEXAI=FALSE
+   ```
+
+3. Run the server:
+   ```bash
+   make dev
+   ```
+
+4. Verify the server is running:
+   ```bash
+   curl http://localhost:8080/health
+   # Expected: {"status":"healthy"}
+   ```
+
+### Verify WebSocket Endpoint
+
+The backend exposes a WebSocket endpoint at `ws://localhost:8080/ws/{device_id}`. You can test it with `wscat`:
+
+```bash
+# Install wscat if needed
+npm install -g wscat
+
+# Connect to the WebSocket
+wscat -c ws://localhost:8080/ws/test-device
+
+# At the wscat prompt (not the shell), send a text message to trigger a Gemini response
+{"type":"TEXT","payload":{"text":"Hello, what can you do?"}}
+
+# Expected: audio and text responses from the AI
+```
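For judges without Node.js, the same check can be sketched in Python. The frame shape and URL pattern are taken from the `wscat` example above. The helper names (`ws_url`, `text_frame`, `send_text`) are hypothetical, and the client assumes the third-party `websockets` package is installed. Because the format of the reply frames is not documented here, replies are printed raw.

```python
import asyncio
import json


def ws_url(host, device_id, secure=False):
    """Build the endpoint URL, e.g. ws://localhost:8080/ws/test-device."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}/ws/{device_id}"


def text_frame(text):
    """Serialize a TEXT frame in the shape used by the wscat example."""
    return json.dumps({"type": "TEXT", "payload": {"text": text}})


async def send_text(host, device_id, text, max_frames=5):
    """Send one TEXT frame and print up to max_frames replies."""
    # Imported lazily so the helpers above work without the package installed.
    import websockets

    async with websockets.connect(ws_url(host, device_id)) as ws:
        await ws.send(text_frame(text))
        for _ in range(max_frames):
            try:
                reply = await asyncio.wait_for(ws.recv(), timeout=10)
            except asyncio.TimeoutError:
                break
            # Binary replies (likely audio) are summarized rather than dumped.
            print(reply if isinstance(reply, str) else f"<{len(reply)} bytes>")
```

With the server from the Quick Start running, `asyncio.run(send_text("localhost:8080", "test-device", "Hello, what can you do?"))` sends the same message as the `wscat` session.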
+
+### Test Web Search (Grounding)
+
+To test the web search feature, you need Google Custom Search API credentials:
+
+1. Enable **Custom Search API**
+   in [Google Cloud Console](https://console.cloud.google.com/apis/library/customsearch.googleapis.com)
+2. Create an **API Key** with Custom Search API access
+3. Create a **Programmable Search Engine**
+   at [programmablesearchengine.google.com](https://programmablesearchengine.google.com/)
+    - Set to "Search the entire web"
+    - Copy the Search Engine ID
+4. Add to `.env`:
+   ```
+   GOOGLE_SEARCH_API_KEY=your-api-key
+   GOOGLE_SEARCH_ENGINE_ID=your-engine-id
+   ```
+5. Restart the server and ask a time-sensitive question:
+   ```json
+   {"type":"TEXT","payload":{"text":"What's the latest news about AI today?"}}
+   ```
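If the grounded answer does not appear, it can help to check the credentials independently of the backend. The sketch below builds a request URL for the Custom Search JSON API from the two environment variables. `build_search_url` is a hypothetical helper, and how the backend itself calls the API is not shown in this diff.

```python
import os
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key=None, engine_id=None, num=3):
    """Return a Custom Search JSON API URL, falling back to the .env variable names."""
    api_key = api_key or os.environ.get("GOOGLE_SEARCH_API_KEY")
    engine_id = engine_id or os.environ.get("GOOGLE_SEARCH_ENGINE_ID")
    if not api_key or not engine_id:
        raise RuntimeError(
            "GOOGLE_SEARCH_API_KEY and GOOGLE_SEARCH_ENGINE_ID must both be set"
        )
    # key = API key, cx = search engine ID, q = query, num = results (1 to 10).
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"
```

Fetching the resulting URL with `curl` or a browser should return JSON search results when both values are valid. An error response usually points to a disabled API or a wrong engine ID.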
+
+### Run the Test Suite
+
+```bash
+make check    # Linting + type checking
+make test     # Run all tests (118 pass, 8 skip without the Firestore emulator)
+```
+
+Expected output:
+
+```
+======================= 118 passed, 8 skipped in 17.25s =======================
+```
+
+The 8 skipped tests are integration tests that require a Firestore emulator. To run them:
+
+```bash
+# Start Firestore emulator
+gcloud emulators firestore start --host-port=localhost:8081 &
+
+# Run tests
+FIRESTORE_EMULATOR_HOST=localhost:8081 make test
+```
+
+### Test with Frontend
+
+The backend is designed to work with the Jemmie mobile app. For full end-to-end testing:
+
+1. Deploy this backend or run it locally
+2. Point the [Jemmie iOS app](https://github.com/Spchdt/Jemmie) at your backend URL
+3. Test voice conversation, camera features, and location sharing
+
+### Deployed Instance
+
+The backend is deployed on Google Cloud Run:
+
+```
+https://jemmie-backend-XXXXX-uc.a.run.app
+```
+
+WebSocket endpoint: `wss://jemmie-backend-XXXXX-uc.a.run.app/ws/{device_id}`
+
 ## Deployment
 
 ### Infrastructure Setup
@@ -206,6 +367,8 @@ Add these secrets to your GitHub repository to enable automatic deployment on pu
 
 - `GCP_PROJECT_ID`: Your Google Cloud project ID
 - `GCP_SERVICE_ACCOUNT_KEY`: Service account JSON key (from the setup script output)
+- `GOOGLE_SEARCH_API_KEY`: Custom Search API key (optional, for web grounding)
+- `GOOGLE_SEARCH_ENGINE_ID`: Programmable Search Engine ID (optional, for web grounding)
 
 ### GCP Deployment Proof