
Commit 823b21b

Update documentation with architecture diagram and testing instructions
Add presentation-ready Mermaid diagram with Google-themed styling. Include testing instructions for judges and web search setup guide. Update CI workflow to pass search API secrets to Cloud Run.
1 parent 4f289d4 commit 823b21b


3 files changed: +201 additions, -32 deletions


.env.example

Lines changed: 6 additions & 0 deletions
@@ -8,5 +8,11 @@ GOOGLE_GENAI_USE_VERTEXAI=FALSE
 # GOOGLE_CLOUD_PROJECT=your-project-id
 # GOOGLE_CLOUD_LOCATION=us-central1
 
+# Web Search (Google Custom Search API) - Optional
+# Enable at: https://console.cloud.google.com/apis/library/customsearch.googleapis.com
+# Create engine at: https://programmablesearchengine.google.com/
+# GOOGLE_SEARCH_API_KEY=your-search-api-key
+# GOOGLE_SEARCH_ENGINE_ID=your-search-engine-id
+
 # Optional
 # LOG_LEVEL=DEBUG
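Both new `.env.example` entries ship commented out, so web search is opt-in. The sketch below shows one way a backend could gate the feature. It enables search only when both variables are present and non-empty. `SearchConfig` and `load_search_config` are hypothetical names that do not appear in this commit.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SearchConfig:
    """Hypothetical container for the Custom Search credentials."""

    api_key: str
    engine_id: str


def load_search_config(env: Mapping[str, str] = os.environ) -> Optional[SearchConfig]:
    """Return a SearchConfig when both variables are set, otherwise None.

    Blank values count as unset, so an empty string disables search
    instead of causing a failing API call later.
    """
    api_key = env.get("GOOGLE_SEARCH_API_KEY", "").strip()
    engine_id = env.get("GOOGLE_SEARCH_ENGINE_ID", "").strip()
    if not api_key or not engine_id:
        return None
    return SearchConfig(api_key=api_key, engine_id=engine_id)
```

When the function returns `None`, a caller could simply leave `WEB_SEARCH` out of the tools it registers. That is one possible design, not necessarily the project's.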

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ jobs:
 --min-instances=1 \
 --port=8080 \
 --allow-unauthenticated \
---set-env-vars="GOOGLE_GENAI_USE_VERTEXAI=TRUE,GOOGLE_CLOUD_PROJECT=${{ secrets.GCP_PROJECT_ID }},GOOGLE_CLOUD_LOCATION=${{ env.REGION }}"
+--set-env-vars="GOOGLE_GENAI_USE_VERTEXAI=TRUE,GOOGLE_CLOUD_PROJECT=${{ secrets.GCP_PROJECT_ID }},GOOGLE_CLOUD_LOCATION=${{ env.REGION }},GOOGLE_SEARCH_API_KEY=${{ secrets.GOOGLE_SEARCH_API_KEY }},GOOGLE_SEARCH_ENGINE_ID=${{ secrets.GOOGLE_SEARCH_ENGINE_ID }}"
 
 - name: Get Service URL
 run: |
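The updated `--set-env-vars` value is one comma-separated list of `KEY=VALUE` pairs. This has two practical consequences:

- gcloud splits the list on commas, so a value that contains a comma needs gcloud's alternate-delimiter escaping (described under `gcloud topic escaping`).
- If a search secret is not configured in the repository, GitHub Actions expands `${{ secrets.GOOGLE_SEARCH_API_KEY }}` to an empty string. The service then receives the variable with an empty value.

The helper below mirrors how the flag value is assembled and rejects commas early. `build_set_env_vars` is hypothetical and is not part of the workflow.

```python
from typing import Mapping


def build_set_env_vars(env: Mapping[str, str]) -> str:
    """Join KEY=VALUE pairs with commas, like the workflow's --set-env-vars value.

    Hypothetical helper: gcloud splits this flag on commas, so a value that
    contains one would be cut in two. Fail early instead of deploying it.
    """
    pairs = []
    for key, value in env.items():
        if "," in value:
            raise ValueError(f"{key} contains a comma; use gcloud's ^DELIM^ escaping")
        pairs.append(f"{key}={value}")
    return ",".join(pairs)
```

For example, an unset secret produces `GOOGLE_SEARCH_API_KEY=`. This is why the backend should treat an empty value as "search disabled".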

README.md

Lines changed: 194 additions & 31 deletions
@@ -43,51 +43,100 @@ the agent layer interfaces with Gemini Live API, and the persistence layer maint
 ## Architecture
 
 ```mermaid
+%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#4285f4', 'primaryTextColor': '#fff', 'primaryBorderColor': '#1a73e8', 'lineColor': '#5f6368', 'secondaryColor': '#34a853', 'tertiaryColor': '#fbbc04'}}}%%
 flowchart TB
-subgraph Client["Client"]
-APP[Mobile/Web App]
+subgraph CLIENT["📱 Client Layer"]
+direction LR
+APP["Mobile App<br/><sub>iOS / Android</sub>"]
 end
 
-subgraph Gateway["Gateway Layer"]
-WS[ConnectionGateway<br/>WebSocket Lifecycle]
+subgraph GCP["☁️ Google Cloud Platform"]
+direction TB
+
+subgraph GATEWAY["🔗 Gateway Layer"]
+WS["ConnectionGateway<br/><sub>WebSocket Lifecycle</sub>"]
+end
+
+subgraph ENGINE["⚙️ Engine Layer"]
+FE["FrameEngine<br/><sub>State Machine</sub>"]
+CC["ConnectionContext<br/><sub>Shared State</sub>"]
+end
+
+subgraph PIPELINE["🔄 Pipeline Layer"]
+direction LR
+AUDIO["🔊 Audio<br/><sub>16kHz → 24kHz</sub>"]
+IMAGE["🖼️ Image<br/><sub>JPEG Processing</sub>"]
+ACTION["⚡ Action<br/><sub>Event Dispatch</sub>"]
+end
+
+subgraph AGENT["🤖 Agent Layer"]
+LS["LiveSession<br/><sub>Streaming Manager</sub>"]
+ROUTER["Tool Router<br/><sub>9 Built-in Tools</sub>"]
+SEARCH["Web Search<br/><sub>Grounding</sub>"]
+end
+
+subgraph PERSIST["💾 Persistence"]
+SM["Session Manager"]
+FIRESTORE[("Firestore<br/><sub>Session State</sub>")]
+end
 end
 
-subgraph Engine["Engine Layer"]
-FE[FrameEngine<br/>State Machine]
-CC[ConnectionContext<br/>Shared State]
+subgraph GEMINI["🌟 Google AI"]
+GEM["Gemini Live API<br/><sub>gemini-live-2.5-flash</sub>"]
 end
 
-subgraph Pipelines["Pipeline Layer"]
-AUDIO[Audio Pipeline<br/>16kHz In / 24kHz Out]
-IMAGE[Image Pipeline<br/>JPEG Processing]
-ACTION[Action Pipeline<br/>Event Dispatch]
+subgraph EXTERNAL["🌐 External Services"]
+CSE["Custom Search API<br/><sub>Web Grounding</sub>"]
 end
 
-subgraph Agent["Agent Layer"]
-LS[LiveSession<br/>Gemini Live API]
-ROUTER[Action Router<br/>SET_TIMER / SHARE_LOCATION]
-end
-
-subgraph Persistence["Persistence Layer"]
-SM[Session Manager]
-FIRESTORE[(Firestore)]
-end
-
-APP <-->|"ws://host/ws/{device_id}"| WS
+APP <-->|"wss:// Secure WebSocket"| WS
 WS --> FE
 FE --> CC
-CC --> AUDIO
-CC --> IMAGE
-CC --> ACTION
-AUDIO --> LS
-IMAGE --> LS
+CC --> AUDIO & IMAGE & ACTION
+AUDIO & IMAGE --> LS
 ACTION --> ROUTER
-LS <-->|"Gemini Live API"| GEMINI[(Gemini)]
+ROUTER --> SEARCH
+LS <-->|"Real-time Audio<br/>Function Calls"| GEM
+SEARCH -->|"Search Results"| GEM
+SEARCH -.->|"JSON API"| CSE
 FE --> SM
 SM <--> FIRESTORE
+
+style CLIENT fill:#e8f0fe,stroke:#4285f4,stroke-width:2px
+style GCP fill:#e6f4ea,stroke:#34a853,stroke-width:2px
+style GEMINI fill:#fef7e0,stroke:#fbbc04,stroke-width:2px
+style EXTERNAL fill:#fce8e6,stroke:#ea4335,stroke-width:2px
 ```
 
-The architecture consists of five layers that process frames from the client through to Gemini and back:
+### Architecture Overview
+
+The system follows a layered architecture designed for real-time voice interaction with sub-second latency:
+
+| Layer | Responsibility | Components |
+|-------|---------------|------------|
+| **Gateway** | WebSocket lifecycle, connection management | ConnectionGateway |
+| **Engine** | Frame routing via state machine (IDLE→CONNECTED→ACTIVE→DRAINING→CLOSED) | FrameEngine, ConnectionContext |
+| **Pipeline** | Data transformation between client and Gemini formats | Audio, Image, Action pipelines |
+| **Agent** | Gemini Live API integration, tool execution | LiveSession, Tool Router, Web Search |
+| **Persistence** | Session state with 10-minute resumption window | Session Manager, Firestore |
+
+### Available Tools
+
+| Tool | Type | Description |
+|------|------|-------------|
+| `SET_TIMER` | Client-bound | Triggers countdown timer on device |
+| `END_CALL` | Client-bound | Gracefully terminates the session |
+| `OPEN_URL` | Client-bound | Opens URLs/maps on device |
+| `FETCH_LOCATION` | Client-bound | Requests user's GPS location |
+| `SET_REMINDER` | Client-bound | Schedules push notification |
+| `REQUEST_BINARY_INPUT` | Client-bound | Yes/No via volume buttons |
+| `REQUEST_CAMERA_PREVIEW` | Client-bound | Captures photo from camera |
+| `COPY_TO_CLIPBOARD` | Client-bound | Copies text to device clipboard |
+| `WEB_SEARCH` | Model-bound | Grounding with real-time web data |
+
+---
+
+**Layer Details:**
 
 - **Gateway Layer** (`src/gateway/`): Handles WebSocket connection acceptance, lifecycle management, and guaranteed
 cleanup on disconnect. It initializes the connection context and spawns the frame engine.
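The Engine row names a five-state lifecycle (IDLE→CONNECTED→ACTIVE→DRAINING→CLOSED), and the README's feature list mentions frame queuing. The sketch below illustrates such a state machine. `ConnectionStateMachine` is a hypothetical name and is not the project's `FrameEngine`. The diff does not specify the queuing rules, so the sketch assumes the following:

- Frames that arrive while CONNECTED are queued until ACTIVE.
- Input arriving in any other non-ACTIVE state is dropped.
- A disconnect can jump straight to CLOSED.

```python
from collections import deque
from enum import Enum, auto
from typing import Deque, Dict, List


class ConnState(Enum):
    IDLE = auto()
    CONNECTED = auto()
    ACTIVE = auto()
    DRAINING = auto()
    CLOSED = auto()


# Happy path from the README table: IDLE -> CONNECTED -> ACTIVE -> DRAINING -> CLOSED.
NEXT_STATE: Dict[ConnState, ConnState] = {
    ConnState.IDLE: ConnState.CONNECTED,
    ConnState.CONNECTED: ConnState.ACTIVE,
    ConnState.ACTIVE: ConnState.DRAINING,
    ConnState.DRAINING: ConnState.CLOSED,
}


class ConnectionStateMachine:
    """Hypothetical sketch of the documented lifecycle (not the project's FrameEngine)."""

    def __init__(self) -> None:
        self.state = ConnState.IDLE
        self.pending: Deque[str] = deque()  # frames held until the session is ACTIVE
        self.forwarded: List[str] = []  # stand-in for handing frames to a pipeline

    def advance(self) -> ConnState:
        """Move one step along the happy path."""
        if self.state is ConnState.CLOSED:
            raise RuntimeError("connection already closed")
        self.state = NEXT_STATE[self.state]
        if self.state is ConnState.ACTIVE:
            # Flush frames that arrived while the upstream session was starting.
            while self.pending:
                self.forwarded.append(self.pending.popleft())
        return self.state

    def submit(self, frame: str) -> None:
        """Forward when ACTIVE, queue when CONNECTED, drop otherwise (assumption)."""
        if self.state is ConnState.ACTIVE:
            self.forwarded.append(frame)
        elif self.state is ConnState.CONNECTED:
            self.pending.append(frame)

    def close(self) -> None:
        """Assumption: a disconnect may jump straight to CLOSED from any state."""
        self.state = ConnState.CLOSED
        self.pending.clear()
```

The transition table keeps the allowed order in one place. An out-of-order step can only come from `close()`, which models the "guaranteed cleanup on disconnect" path.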
@@ -101,8 +150,8 @@ The architecture consists of five layers that process frames from the client thr
 multimodal understanding. Actions are dispatched to handlers or routed to the client.
 
 - **Agent Layer** (`src/agent/`): Manages the bidirectional streaming connection to Gemini Live API through the Google
-GenAI SDK. Action handlers execute server-side logic (e.g., SET_TIMER) or process client events (e.g.,
-SHARE_LOCATION).
+GenAI SDK. Tool handlers execute server-side logic (e.g., `WEB_SEARCH` for grounding) or send commands to the client
+(e.g., `SET_TIMER`).
 
 - **Persistence Layer** (`src/session/`): Stores session state in Firestore with a device-as-identity pattern. Sessions
 can be resumed within a 10-minute window, allowing users to continue conversations across connection drops.
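The revised Agent Layer text separates tools the server executes (`WEB_SEARCH`) from commands forwarded to the client (`SET_TIMER`). This matches the Client-bound/Model-bound column in the tools table. The routing sketch below relies on several assumptions: `route_tool_call`, the `ACTION` frame shape and the response dictionaries are all hypothetical and are not taken from the project's Tool Router.

```python
from typing import Any, Callable, Dict, Mapping

JSON = Dict[str, Any]

# Client-bound tools, as listed in the README's tools table.
CLIENT_BOUND_TOOLS = frozenset({
    "SET_TIMER", "END_CALL", "OPEN_URL", "FETCH_LOCATION", "SET_REMINDER",
    "REQUEST_BINARY_INPUT", "REQUEST_CAMERA_PREVIEW", "COPY_TO_CLIPBOARD",
})


def route_tool_call(
    name: str,
    args: JSON,
    server_handlers: Mapping[str, Callable[[JSON], JSON]],
    send_to_client: Callable[[JSON], None],
) -> JSON:
    """Return the response to hand back to the model for one function call.

    Hypothetical: the frame and response shapes are illustrative only.
    """
    if name in server_handlers:
        # Model-bound (e.g. WEB_SEARCH): run on the server, return data to the model.
        return {"name": name, "response": server_handlers[name](args)}
    if name in CLIENT_BOUND_TOOLS:
        # Client-bound (e.g. SET_TIMER): forward a command frame to the device.
        send_to_client({"type": "ACTION", "payload": {"name": name, "args": args}})
        return {"name": name, "response": {"status": "sent_to_client"}}
    return {"name": name, "response": {"error": f"unknown tool: {name}"}}
```

Unknown tool names still produce a response, so the model is told the call failed rather than being left waiting on a function call that never returns.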
@@ -138,6 +187,7 @@ The architecture consists of five layers that process frames from the client thr
 - **Action System**: Server-to-client commands (SET_TIMER) and client-to-server events (SHARE_LOCATION) for interactive
 features
 - **Visual Context Support**: JPEG frame processing for multimodal understanding with the live session
+- **Web Search Grounding**: Real-time web data integration via Google Custom Search API for up-to-date information
 - **Graceful Degradation**: Connection state machine with frame queuing and guaranteed cleanup on disconnect
 
 ## Getting Started
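Web Search Grounding relies on the Custom Search JSON API. Its endpoint is `https://www.googleapis.com/customsearch/v1`, and it takes these query parameters:

- `key`: the API key
- `cx`: the search engine ID
- `q`: the query
- `num`: the number of results, from 1 to 10

Results come back in an `items` array whose entries carry `title`, `link` and `snippet`. The sketch below builds the request and condenses the results into text the model could consume. The function names and summary format are assumptions, not the project's implementation.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, engine_id: str, query: str, num: int = 5) -> str:
    """Build a Custom Search JSON API request URL (the API caps num at 10)."""
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{CSE_ENDPOINT}?{urllib.parse.urlencode(params)}"


def summarize_results(payload: Dict[str, Any], limit: int = 3) -> str:
    """Condense the response's `items` into short plain-text lines (hypothetical format)."""
    items: List[Dict[str, Any]] = payload.get("items", [])[:limit]
    if not items:
        return "No web results found."
    return "\n".join(
        f"- {it.get('title', '')}: {it.get('snippet', '')} ({it.get('link', '')})"
        for it in items
    )


def search(api_key: str, engine_id: str, query: str) -> str:
    """Perform the request; needs network access and valid credentials."""
    url = build_search_url(api_key, engine_id, query)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return summarize_results(json.load(resp))
```

Keeping URL construction and result parsing as pure functions means they can be unit-tested without calling the API. Only `search` touches the network.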
@@ -183,10 +233,121 @@ The `.env.example` file contains the minimal configuration for local development
 | `GOOGLE_GENAI_USE_VERTEXAI` | Use Vertex AI instead of API key | `TRUE` (use `FALSE` for API key) |
 | `GOOGLE_CLOUD_PROJECT` | GCP project ID | Required for Vertex AI mode |
 | `GOOGLE_CLOUD_LOCATION` | GCP region | `us-central1` (required for Gemini Live) |
+| `GOOGLE_SEARCH_API_KEY` | Custom Search API key | Optional, for web grounding |
+| `GOOGLE_SEARCH_ENGINE_ID` | Programmable Search Engine ID | Optional, for web grounding |
 
 For local development with an API key, set `GOOGLE_GENAI_USE_VERTEXAI=FALSE` and provide your `GOOGLE_API_KEY`. For
 production deployment on Cloud Run, the service account credentials are used automatically with Vertex AI.
 
+## Testing Instructions for Judges
+
+This section provides a quick way to verify the backend functionality.
+
+### Quick Start (2 minutes)
+
+1. Clone and set up:
+```bash
+git clone https://github.com/oadultradeepfield/jemmie-backend.git
+cd jemmie-backend
+cp .env.example .env
+```
+
+2. Add your Google API key to `.env`:
+```
+GOOGLE_API_KEY=your-api-key-from-aistudio
+GOOGLE_GENAI_USE_VERTEXAI=FALSE
+```
+
+3. Run the server:
+```bash
+make dev
+```
+
+4. Verify the server is running:
+```bash
+curl http://localhost:8080/health
+# Expected: {"status":"healthy"}
+```
+
+### Verify WebSocket Endpoint
+
+The backend exposes a WebSocket endpoint at `ws://localhost:8080/ws/{device_id}`. You can test it with `wscat`:
+
+```bash
+# Install wscat if needed
+npm install -g wscat
+
+# Connect to the WebSocket
+wscat -c ws://localhost:8080/ws/test-device
+
+# Send a text message (triggers Gemini response)
+{"type":"TEXT","payload":{"text":"Hello, what can you do?"}}
+
+# Expected: Audio and text responses from the AI
+```
+
+### Test Web Search (Grounding)
+
+To test the web search feature, you need Google Custom Search API credentials:
+
+1. Enable **Custom Search API**
+in [Google Cloud Console](https://console.cloud.google.com/apis/library/customsearch.googleapis.com)
+2. Create an **API Key** with Custom Search API access
+3. Create a **Programmable Search Engine**
+at [programmablesearchengine.google.com](https://programmablesearchengine.google.com/)
+- Set to "Search the entire web"
+- Copy the Search Engine ID
+4. Add to `.env`:
+```
+GOOGLE_SEARCH_API_KEY=your-api-key
+GOOGLE_SEARCH_ENGINE_ID=your-engine-id
+```
+5. Restart the server and ask a time-sensitive question:
+```json
+{"type":"TEXT","payload":{"text":"What's the latest news about AI today?"}}
+```
+
+### Run the Test Suite
+
+```bash
+make check # Linting + type checking
+make test # Run all tests (118 tests)
+```
+
+Expected output:
+
+```
+======================= 118 passed, 8 skipped in 17.25s =======================
+```
+
+The 8 skipped tests require a Firestore emulator for integration tests. To run them:
+
+```bash
+# Start Firestore emulator
+gcloud emulators firestore start --host-port=localhost:8081 &
+
+# Run tests
+FIRESTORE_EMULATOR_HOST=localhost:8081 make test
+```
+
+### Test with Frontend
+
+The backend is designed to work with the Jemmie mobile app. For full end-to-end testing:
+
+1. Deploy this backend or run locally
+2. Use the [Jemmie iOS app](https://github.com/Spchdt/Jemmie) pointing to your backend URL
+3. Test voice conversation, camera features, and location sharing
+
+### Deployed Instance
+
+The backend is deployed on Google Cloud Run:
+
+```
+https://jemmie-backend-XXXXX-uc.a.run.app
+```
+
+WebSocket endpoint: `wss://jemmie-backend-XXXXX-uc.a.run.app/ws/{device_id}`
+
 ## Deployment
 
 ### Infrastructure Setup
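Where `curl` is unavailable, or the health check has to run from a script, the same probe can be written with the Python standard library. This sketch assumes only what the README states: `GET /health` on port 8080 returns `{"status":"healthy"}`. `is_healthy` is a hypothetical helper.

```python
import json
import urllib.request


def is_healthy(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers 200 with {"status": "healthy"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = json.load(resp)
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers a body that is not valid JSON.
        return False
    return isinstance(body, dict) and body.get("status") == "healthy"
```

To probe a deployed Cloud Run instance instead of the local server, pass its URL as `base_url`.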
@@ -206,6 +367,8 @@ Add these secrets to your GitHub repository to enable automatic deployment on pu
 
 - `GCP_PROJECT_ID`: Your Google Cloud project ID
 - `GCP_SERVICE_ACCOUNT_KEY`: Service account JSON key (from the setup script output)
+- `GOOGLE_SEARCH_API_KEY`: Custom Search API key (optional, for web grounding)
+- `GOOGLE_SEARCH_ENGINE_ID`: Programmable Search Engine ID (optional, for web grounding)
 
 ### GCP Deployment Proof
 