@@ -43,51 +43,100 @@ the agent layer interfaces with Gemini Live API, and the persistence layer maint
4343## Architecture
4444
4545``` mermaid
46+ %%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#4285f4', 'primaryTextColor': '#fff', 'primaryBorderColor': '#1a73e8', 'lineColor': '#5f6368', 'secondaryColor': '#34a853', 'tertiaryColor': '#fbbc04'}}}%%
4647flowchart TB
47- subgraph Client["Client"]
48- APP[Mobile/Web App]
48+ subgraph CLIENT["📱 Client Layer"]
49+ direction LR
50+ APP["Mobile App<br/><sub>iOS / Android</sub>"]
4951 end
5052
51- subgraph Gateway["Gateway Layer"]
52- WS[ConnectionGateway<br/>WebSocket Lifecycle]
53+ subgraph GCP["☁️ Google Cloud Platform"]
54+ direction TB
55+
56+ subgraph GATEWAY["🔗 Gateway Layer"]
57+ WS["ConnectionGateway<br/><sub>WebSocket Lifecycle</sub>"]
58+ end
59+
60+ subgraph ENGINE["⚙️ Engine Layer"]
61+ FE["FrameEngine<br/><sub>State Machine</sub>"]
62+ CC["ConnectionContext<br/><sub>Shared State</sub>"]
63+ end
64+
65+ subgraph PIPELINE["🔄 Pipeline Layer"]
66+ direction LR
67+ AUDIO["🔊 Audio<br/><sub>16kHz In / 24kHz Out</sub>"]
68+ IMAGE["🖼️ Image<br/><sub>JPEG Processing</sub>"]
69+ ACTION["⚡ Action<br/><sub>Event Dispatch</sub>"]
70+ end
71+
72+ subgraph AGENT["🤖 Agent Layer"]
73+ LS["LiveSession<br/><sub>Streaming Manager</sub>"]
74+ ROUTER["Tool Router<br/><sub>9 Built-in Tools</sub>"]
75+ SEARCH["Web Search<br/><sub>Grounding</sub>"]
76+ end
77+
78+ subgraph PERSIST["💾 Persistence"]
79+ SM["Session Manager"]
80+ FIRESTORE[("Firestore<br/><sub>Session State</sub>")]
81+ end
5382 end
5483
55- subgraph Engine["Engine Layer"]
56- FE[FrameEngine<br/>State Machine]
57- CC[ConnectionContext<br/>Shared State]
84+ subgraph GEMINI["🌟 Google AI"]
85+ GEM["Gemini Live API<br/><sub>gemini-live-2.5-flash</sub>"]
5886 end
5987
60- subgraph Pipelines["Pipeline Layer"]
61- AUDIO[Audio Pipeline<br/>16kHz In / 24kHz Out]
62- IMAGE[Image Pipeline<br/>JPEG Processing]
63- ACTION[Action Pipeline<br/>Event Dispatch]
88+ subgraph EXTERNAL["🌐 External Services"]
89+ CSE["Custom Search API<br/><sub>Web Grounding</sub>"]
6490 end
6591
66- subgraph Agent["Agent Layer"]
67- LS[LiveSession<br/>Gemini Live API]
68- ROUTER[Action Router<br/>SET_TIMER / SHARE_LOCATION]
69- end
70-
71- subgraph Persistence["Persistence Layer"]
72- SM[Session Manager]
73- FIRESTORE[(Firestore)]
74- end
75-
76- APP <-->|"ws://host/ws/{device_id}"| WS
92+ APP <-->|"wss:// Secure WebSocket"| WS
7793 WS --> FE
7894 FE --> CC
79- CC --> AUDIO
80- CC --> IMAGE
81- CC --> ACTION
82- AUDIO --> LS
83- IMAGE --> LS
95+ CC --> AUDIO & IMAGE & ACTION
96+ AUDIO & IMAGE --> LS
8497 ACTION --> ROUTER
85- LS <-->|"Gemini Live API"| GEMINI[(Gemini)]
98+ ROUTER --> SEARCH
99+ LS <-->|"Real-time Audio<br/>Function Calls"| GEM
100+ SEARCH -->|"Search Results"| GEM
101+ SEARCH -.->|"JSON API"| CSE
86102 FE --> SM
87103 SM <--> FIRESTORE
104+
105+ style CLIENT fill:#e8f0fe,stroke:#4285f4,stroke-width:2px
106+ style GCP fill:#e6f4ea,stroke:#34a853,stroke-width:2px
107+ style GEMINI fill:#fef7e0,stroke:#fbbc04,stroke-width:2px
108+ style EXTERNAL fill:#fce8e6,stroke:#ea4335,stroke-width:2px
88109```
89110
90- The architecture consists of five layers that process frames from the client through to Gemini and back:
111+ ### Architecture Overview
112+
113+ The system follows a layered architecture designed for real-time voice interaction with sub-second latency:
114+
115+ | Layer | Responsibility | Components |
116+ | -------| ---------------| ------------|
117+ | **Gateway** | WebSocket lifecycle, connection management | ConnectionGateway |
118+ | **Engine** | Frame routing via state machine (IDLE→CONNECTED→ACTIVE→DRAINING→CLOSED) | FrameEngine, ConnectionContext |
119+ | **Pipeline** | Data transformation between client and Gemini formats | Audio, Image, Action pipelines |
120+ | **Agent** | Gemini Live API integration, tool execution | LiveSession, Tool Router, Web Search |
121+ | **Persistence** | Session state with 10-minute resumption window | Session Manager, Firestore |
122+
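
The Engine row lists the connection states in order. A minimal sketch of that progression follows, assuming each state may only advance to the next one in the listed order. The names `ConnState`, `next_state`, and `can_transition` are hypothetical and are not taken from `FrameEngine`, which may allow other transitions (for example, closing early from any state).

```python
from enum import Enum


class ConnState(Enum):
    """Connection states, in the order the README lists them."""
    IDLE = "IDLE"
    CONNECTED = "CONNECTED"
    ACTIVE = "ACTIVE"
    DRAINING = "DRAINING"
    CLOSED = "CLOSED"


# Assumption: transitions are strictly linear in the documented order.
_ORDER = list(ConnState)


def next_state(current: ConnState) -> ConnState:
    """Return the state that follows `current`; CLOSED is terminal."""
    idx = _ORDER.index(current)
    if idx == len(_ORDER) - 1:
        raise ValueError("CLOSED is terminal")
    return _ORDER[idx + 1]


def can_transition(src: ConnState, dst: ConnState) -> bool:
    """True only for a single forward step in the documented order."""
    return _ORDER.index(dst) - _ORDER.index(src) == 1
```

Keeping the order in one list means the allowed transitions and the terminal state are both derived from a single definition.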
123+ ### Available Tools
124+
125+ | Tool | Type | Description |
126+ | ------| ------| -------------|
127+ | `SET_TIMER` | Client-bound | Triggers countdown timer on device |
128+ | `END_CALL` | Client-bound | Gracefully terminates the session |
129+ | `OPEN_URL` | Client-bound | Opens URLs/maps on device |
130+ | `FETCH_LOCATION` | Client-bound | Requests user's GPS location |
131+ | `SET_REMINDER` | Client-bound | Schedules push notification |
132+ | `REQUEST_BINARY_INPUT` | Client-bound | Yes/No via volume buttons |
133+ | `REQUEST_CAMERA_PREVIEW` | Client-bound | Captures photo from camera |
134+ | `COPY_TO_CLIPBOARD` | Client-bound | Copies text to device clipboard |
135+ | `WEB_SEARCH` | Model-bound | Grounding with real-time web data |
136+
137+ ---
138+
139+ **Layer Details:**
91140
92141- **Gateway Layer** (`src/gateway/`): Handles WebSocket connection acceptance, lifecycle management, and guaranteed
93142 cleanup on disconnect. It initializes the connection context and spawns the frame engine.
@@ -101,8 +150,8 @@ The architecture consists of five layers that process frames from the client thr
101150 multimodal understanding. Actions are dispatched to handlers or routed to the client.
102151
103152- **Agent Layer** (`src/agent/`): Manages the bidirectional streaming connection to Gemini Live API through the Google
104- GenAI SDK. Action handlers execute server-side logic (e.g., SET_TIMER) or process client events (e.g.,
105- SHARE_LOCATION).
153+ GenAI SDK. Tool handlers execute server-side logic (e.g., `WEB_SEARCH` for grounding) or send commands to the client
154+ (e.g., `SET_TIMER`).
106155
107156- **Persistence Layer** (`src/session/`): Stores session state in Firestore with a device-as-identity pattern. Sessions
108157 can be resumed within a 10-minute window, allowing users to continue conversations across connection drops.
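
The 10-minute resumption rule can be sketched as a timestamp comparison. This is an illustration only: `RESUME_WINDOW` and `is_resumable` are hypothetical names, and treating the boundary as inclusive is an assumption, not something the README states.

```python
from datetime import datetime, timedelta, timezone

# From the README: sessions can be resumed within a 10-minute window.
RESUME_WINDOW = timedelta(minutes=10)


def is_resumable(last_seen: datetime, now: datetime) -> bool:
    """True if `now` falls within the resumption window after `last_seen`.

    Assumption: the boundary is inclusive, and a `last_seen` in the
    future (clock skew) is treated as not resumable.
    """
    elapsed = now - last_seen
    return timedelta(0) <= elapsed <= RESUME_WINDOW


if __name__ == "__main__":
    dropped = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(is_resumable(dropped, dropped + timedelta(minutes=4)))
```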
@@ -138,6 +187,7 @@ The architecture consists of five layers that process frames from the client thr
138187- **Action System**: Server-to-client commands (SET_TIMER) and client-to-server events (SHARE_LOCATION) for interactive
139188 features
140189- **Visual Context Support**: JPEG frame processing for multimodal understanding with the live session
190+ - **Web Search Grounding**: Real-time web data integration via Google Custom Search API for up-to-date information
141191- **Graceful Degradation**: Connection state machine with frame queuing and guaranteed cleanup on disconnect
142192
143193## Getting Started
@@ -183,10 +233,121 @@ The `.env.example` file contains the minimal configuration for local development
183233| `GOOGLE_GENAI_USE_VERTEXAI` | Use Vertex AI instead of API key | `TRUE` (use `FALSE` for API key) |
184234| `GOOGLE_CLOUD_PROJECT` | GCP project ID | Required for Vertex AI mode |
185235| `GOOGLE_CLOUD_LOCATION` | GCP region | `us-central1` (required for Gemini Live) |
236+ | `GOOGLE_SEARCH_API_KEY` | Custom Search API key | Optional, for web grounding |
237+ | `GOOGLE_SEARCH_ENGINE_ID` | Programmable Search Engine ID | Optional, for web grounding |
186238
187239For local development with an API key, set `GOOGLE_GENAI_USE_VERTEXAI=FALSE` and provide your `GOOGLE_API_KEY`. For
188240production deployment on Cloud Run, the service account credentials are used automatically with Vertex AI.
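
The rules in the table can be checked before the server starts. The sketch below is a hypothetical helper and not part of the repository: `check_env` is an invented name, and the requirement that both search variables be set to enable web grounding is an assumption drawn from the two "Optional" rows.

```python
from typing import Mapping


def check_env(env: Mapping[str, str]) -> dict:
    """Summarise which auth mode is selected and which variables are missing."""
    # The table gives TRUE as the default for GOOGLE_GENAI_USE_VERTEXAI.
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "TRUE").upper() == "TRUE"

    missing = []
    if use_vertex:
        if not env.get("GOOGLE_CLOUD_PROJECT"):
            missing.append("GOOGLE_CLOUD_PROJECT")
    elif not env.get("GOOGLE_API_KEY"):
        missing.append("GOOGLE_API_KEY")

    # Assumption: web grounding needs both search variables.
    web_search = bool(
        env.get("GOOGLE_SEARCH_API_KEY") and env.get("GOOGLE_SEARCH_ENGINE_ID")
    )
    return {
        "mode": "vertexai" if use_vertex else "api_key",
        "missing": missing,
        "web_search": web_search,
    }


if __name__ == "__main__":
    import os

    print(check_env(os.environ))
```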
189241
242+ ## Testing Instructions for Judges
243+
244+ This section provides a quick way to verify the backend functionality.
245+
246+ ### Quick Start (2 minutes)
247+
248+ 1. Clone and set up:
249+ ``` bash
250+ git clone https://github.com/oadultradeepfield/jemmie-backend.git
251+ cd jemmie-backend
252+ cp .env.example .env
253+ ```
254+
255+ 2. Add your Google API key to `.env`:
256+ ```
257+ GOOGLE_API_KEY=your-api-key-from-aistudio
258+ GOOGLE_GENAI_USE_VERTEXAI=FALSE
259+ ```
260+
261+ 3. Run the server:
262+ ``` bash
263+ make dev
264+ ```
265+
266+ 4. Verify the server is running:
267+ ``` bash
268+ curl http://localhost:8080/health
269+ # Expected: {"status":"healthy"}
270+ ```
271+
272+ ### Verify WebSocket Endpoint
273+
274+ The backend exposes a WebSocket endpoint at `ws://localhost:8080/ws/{device_id}`. You can test it with `wscat`:
275+
276+ ``` bash
277+ # Install wscat if needed
278+ npm install -g wscat
279+
280+ # Connect to the WebSocket
281+ wscat -c ws://localhost:8080/ws/test-device
282+
283+ # Send a text message (triggers Gemini response)
284+ {"type":"TEXT","payload":{"text":"Hello, what can you do?"}}
285+
286+ # Expected: Audio and text responses from the AI
287+ ```
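
For scripted checks, the same URL and frame can be built programmatically. The sketch below only constructs the strings and opens no connection. The helper names `ws_url` and `text_frame` are hypothetical; the frame shape is the `TEXT` message shown in the `wscat` example.

```python
import json


def ws_url(host: str, device_id: str, secure: bool = False) -> str:
    """Build the endpoint URL in the form /ws/{device_id}."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}/ws/{device_id}"


def text_frame(text: str) -> str:
    """Serialise a TEXT frame in the shape used by the wscat example."""
    frame = {"type": "TEXT", "payload": {"text": text}}
    # Compact separators keep the frame on one line, as wscat expects.
    return json.dumps(frame, separators=(",", ":"))


if __name__ == "__main__":
    print(ws_url("localhost:8080", "test-device"))
    print(text_frame("Hello, what can you do?"))
```

The printed frame can be pasted into a `wscat` session or passed to any WebSocket client library.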
288+
289+ ### Test Web Search (Grounding)
290+
291+ To test the web search feature, you need Google Custom Search API credentials:
292+
293+ 1. Enable **Custom Search API**
294+ in [Google Cloud Console](https://console.cloud.google.com/apis/library/customsearch.googleapis.com)
295+ 2. Create an **API Key** with Custom Search API access
296+ 3. Create a **Programmable Search Engine**
297+ at [programmablesearchengine.google.com](https://programmablesearchengine.google.com/)
298+ - Set to "Search the entire web"
299+ - Copy the Search Engine ID
300+ 4. Add to `.env`:
301+ ```
302+ GOOGLE_SEARCH_API_KEY=your-api-key
303+ GOOGLE_SEARCH_ENGINE_ID=your-engine-id
304+ ```
305+ 5. Restart the server and ask a time-sensitive question:
306+ ``` json
307+ {"type":"TEXT","payload":{"text":"What's the latest news about AI today?"}}
308+ ```
309+
310+ ### Run the Test Suite
311+
312+ ``` bash
313+ make check # Linting + type checking
314+ make test # Run all tests (118 pass; 8 are skipped without the Firestore emulator)
315+ ```
316+
317+ Expected output:
318+
319+ ```
320+ ======================= 118 passed, 8 skipped in 17.25s =======================
321+ ```
322+
323+ The 8 skipped tests require a Firestore emulator for integration tests. To run them:
324+
325+ ``` bash
326+ # Start Firestore emulator
327+ gcloud emulators firestore start --host-port=localhost:8081 &
328+
329+ # Run tests
330+ FIRESTORE_EMULATOR_HOST=localhost:8081 make test
331+ ```
332+
333+ ### Test with Frontend
334+
335+ The backend is designed to work with the Jemmie mobile app. For full end-to-end testing:
336+
337+ 1. Deploy this backend or run it locally
338+ 2. Use the [Jemmie iOS app](https://github.com/Spchdt/Jemmie) pointing to your backend URL
339+ 3. Test voice conversation, camera features, and location sharing
340+
341+ ### Deployed Instance
342+
343+ The backend is deployed on Google Cloud Run:
344+
345+ ```
346+ https://jemmie-backend-XXXXX-uc.a.run.app
347+ ```
348+
349+ WebSocket endpoint: `wss://jemmie-backend-XXXXX-uc.a.run.app/ws/{device_id}`
350+
190351## Deployment
191352
192353### Infrastructure Setup
@@ -206,6 +367,8 @@ Add these secrets to your GitHub repository to enable automatic deployment on pu
206367
207368- `GCP_PROJECT_ID`: Your Google Cloud project ID
208369- `GCP_SERVICE_ACCOUNT_KEY`: Service account JSON key (from the setup script output)
370+ - `GOOGLE_SEARCH_API_KEY`: Custom Search API key (optional, for web grounding)
371+ - `GOOGLE_SEARCH_ENGINE_ID`: Programmable Search Engine ID (optional, for web grounding)
209372
210373### GCP Deployment Proof
211374