Commit 6e398c2

docs: add AGENTS.md and CLAUDE.md for AI coding assistants
1 parent 3f3a5e1 commit 6e398c2

2 files changed, +203 -0 lines changed


AGENTS.md

Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
# node-voice-agent

Node.js demo app for Deepgram Voice Agent.

## Architecture

- **Backend:** Node.js (JavaScript) on port 8081
- **Frontend:** Vite + vanilla JS on port 8080 (git submodule: `voice-agent-html`)
- **API type:** WebSocket (`WS /api/voice-agent`)
- **Deepgram API:** Agent API (`wss://agent.deepgram.com/v1/agent/converse`)
- **Auth:** JWT session tokens via `/api/session` (WebSocket auth uses `access_token.<jwt>` subprotocol)
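On the server side, the `access_token.<jwt>` subprotocol arrives in the `Sec-WebSocket-Protocol` request header, which browsers send as a comma-separated list when several subprotocols are offered. A minimal sketch of extracting the token, assuming the header value is passed in as a plain string (the helper name `extractJwtFromProtocols` is hypothetical, not taken from `server.js`). Verifying the JWT signature is left to whatever JWT library the backend uses:

```javascript
// Hypothetical helper: find the "access_token.<jwt>" entry in a
// Sec-WebSocket-Protocol header value and split off the JWT.
const TOKEN_PREFIX = "access_token.";

function extractJwtFromProtocols(headerValue) {
  if (typeof headerValue !== "string") return null;
  // Offered subprotocols arrive as "proto1, proto2, ..."
  for (const entry of headerValue.split(",")) {
    const protocol = entry.trim();
    if (protocol.startsWith(TOKEN_PREFIX) && protocol.length > TOKEN_PREFIX.length) {
      // A JWT itself contains dots, so slice off the prefix instead of splitting on "."
      return { protocol, token: protocol.slice(TOKEN_PREFIX.length) };
    }
  }
  return null; // no token offered: the server should reject the upgrade
}

// Usage
const result = extractJwtFromProtocols("access_token.header.payload.signature");
console.log(result.token); // header.payload.signature
```

The sketch returns the full protocol string alongside the token because browsers generally fail the handshake when the server does not echo back one of the subprotocols the client offered.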

## Key Files

| File | Purpose |
|------|---------|
| `server.js` | Main backend: API endpoints and WebSocket proxy |
| `deepgram.toml` | Metadata, lifecycle commands, tags |
| `Makefile` | Standardized build/run targets |
| `sample.env` | Environment variable template |
| `frontend/main.js` | Frontend logic: UI controls, WebSocket connection, audio streaming |
| `frontend/index.html` | HTML structure and UI layout |
| `deploy/Dockerfile` | Production container (Caddy + backend) |
| `deploy/Caddyfile` | Reverse proxy, rate limiting, static serving |

## Quick Start

```bash
# Initialize (clone submodules + install deps)
make init

# Set up environment
test -f .env || cp sample.env .env  # then set DEEPGRAM_API_KEY

# Start both servers
make start
# Backend:  http://localhost:8081
# Frontend: http://localhost:8080
```

## Start / Stop

**Start (recommended):**
```bash
make start
```

**Start separately:**
```bash
# Terminal 1: backend
node server.js

# Terminal 2: frontend
cd frontend && corepack pnpm run dev -- --port 8080 --no-open
```

**Stop all:**
```bash
lsof -ti:8080,8081 | xargs kill -9 2>/dev/null
```

**Clean rebuild:**
```bash
rm -rf node_modules frontend/node_modules frontend/.vite
make init
```

## Dependencies

- **Backend:** `package.json`, installed with `corepack pnpm` (Corepack is Node's built-in tool for pinning the package-manager version)
- **Frontend:** `frontend/package.json` (Vite dev server)
- **Submodules:** `frontend/` (voice-agent-html), `contracts/` (starter-contracts)

- Install: `corepack pnpm install`
- Frontend: `cd frontend && corepack pnpm install`

## API Endpoints

| Endpoint | Method | Auth | Purpose |
|----------|--------|------|---------|
| `/api/session` | GET | None | Issue JWT session token |
| `/api/metadata` | GET | None | Return app metadata (useCase, framework, language) |
| `/api/voice-agent` | WS | JWT | Full-duplex voice conversation with an AI agent |
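A client calls these endpoints in order: fetch a session token, then open the WebSocket with that token in the subprotocol. A minimal sketch, assuming `/api/session` returns JSON with the JWT in a `token` field (a hypothetical field name; check `server.js` for the real one) and a runtime with global `fetch` and `WebSocket` (modern browsers or recent Node.js). Both are injectable so the flow can be exercised without a server, and `connectVoiceAgent` is a hypothetical name:

```javascript
// Hypothetical client helper: GET /api/session, then open WS /api/voice-agent
// with the JWT carried in the "access_token.<jwt>" subprotocol.
async function connectVoiceAgent(baseUrl, options = {}) {
  const fetchImpl = options.fetchImpl ?? globalThis.fetch;
  const WebSocketImpl = options.WebSocketImpl ?? globalThis.WebSocket;

  const res = await fetchImpl(new URL("/api/session", baseUrl).toString());
  if (!res.ok) throw new Error(`GET /api/session failed with status ${res.status}`);
  const { token } = await res.json(); // ASSUMPTION: response body is { "token": "<jwt>" }
  if (!token) throw new Error("session response did not include a token");

  // Same host and port; ws:// or wss:// to match http:// or https://
  const wsUrl = new URL("/api/voice-agent", baseUrl);
  wsUrl.protocol = wsUrl.protocol === "https:" ? "wss:" : "ws:";
  return new WebSocketImpl(wsUrl.toString(), [`access_token.${token}`]);
}

// Usage (browser): const ws = await connectVoiceAgent("http://localhost:8081");
```

Once the returned socket fires its `open` event, the frontend can send its `Settings` message.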

## Customization Guide

### How the Agent Works
The backend is a **pure WebSocket proxy**: it forwards messages between the browser and Deepgram's Agent API. All agent configuration happens via JSON messages from the frontend.
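The forwarding pattern can be sketched independently of any particular WebSocket library. This version assumes socket objects that emit `message`, `open`, and `close` events and expose `send()`, `close()`, and `readyState` (the shape of the `ws` npm package's `WebSocket`; whether `server.js` uses that package is an assumption). It also buffers browser messages that may arrive before the upstream Deepgram connection is open, since the frontend sends `Settings` right after connecting. The function name `bridge` is hypothetical:

```javascript
const OPEN = 1; // WebSocket readyState value for an open connection

// Sketch: relay frames between the browser socket and the upstream Agent API
// socket without inspecting them, buffering early browser frames until the
// upstream connection opens.
function bridge(client, upstream) {
  const pending = []; // [data, isBinary] pairs received before upstream opened

  client.on("message", (data, isBinary) => {
    if (upstream.readyState === OPEN) upstream.send(data, { binary: isBinary });
    else pending.push([data, isBinary]);
  });

  upstream.on("open", () => {
    // splice(0) drains the queue so nothing is sent twice
    for (const [data, isBinary] of pending.splice(0)) {
      upstream.send(data, { binary: isBinary });
    }
  });

  upstream.on("message", (data, isBinary) => {
    if (client.readyState === OPEN) client.send(data, { binary: isBinary });
  });

  // When one side closes, close the other if it is still open
  client.on("close", () => {
    if (upstream.readyState === OPEN) upstream.close();
  });
  upstream.on("close", () => {
    if (client.readyState === OPEN) client.close();
  });
}
```

A production proxy would also cap the buffer and handle `error` events; the sketch leaves those out to keep the forwarding logic visible.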

### Agent Settings (sent from frontend)
The frontend sends a `Settings` message after connecting:

```json
{
  "type": "Settings",
  "audio": {
    "input": { "encoding": "linear16", "sample_rate": 16000 },
    "output": { "encoding": "linear16", "sample_rate": 16000 }
  },
  "agent": {
    "listen": { "provider": { "type": "deepgram", "model": "nova-3" } },
    "speak": { "provider": { "type": "deepgram", "model": "aura-2-thalia-en" } },
    "think": {
      "provider": { "type": "open_ai", "model": "gpt-4o-mini" },
      "prompt": "You are a helpful assistant."
    }
  }
}
```

### Customizable Components

| Component | Field | Options | Effect |
|-----------|-------|---------|--------|
| **Listen** (STT) | `agent.listen.provider.model` | `nova-3`, `nova-2` | Speech recognition model |
| **Speak** (TTS) | `agent.speak.provider.model` | Any `aura-*` voice | Agent's voice |
| **Think** (LLM) | `agent.think.provider.type` | `open_ai`, `anthropic` | LLM provider |
| **Think** (LLM) | `agent.think.provider.model` | `gpt-4o-mini`, `gpt-4o`, etc. | LLM model |
| **Prompt** | `agent.think.prompt` | Any system prompt | Agent personality/behavior |
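Because every field in the table above lives at a fixed path in the `Settings` message, the message can be built from a handful of parameters. A minimal sketch: the function and parameter names are hypothetical, the defaults copy the example JSON, one `sampleRate` is applied to both input and output as in that example, and no option values are validated:

```javascript
// Sketch: build the Settings message from the customizable fields.
function buildSettings({
  listenModel = "nova-3",
  speakModel = "aura-2-thalia-en",
  thinkProvider = "open_ai",
  thinkModel = "gpt-4o-mini",
  prompt = "You are a helpful assistant.",
  sampleRate = 16000,
} = {}) {
  return {
    type: "Settings",
    audio: {
      input: { encoding: "linear16", sample_rate: sampleRate },
      output: { encoding: "linear16", sample_rate: sampleRate },
    },
    agent: {
      listen: { provider: { type: "deepgram", model: listenModel } },
      speak: { provider: { type: "deepgram", model: speakModel } },
      think: { provider: { type: thinkProvider, model: thinkModel }, prompt },
    },
  };
}

// Usage: send as a JSON text frame once the WebSocket is open.
// ws.send(JSON.stringify(buildSettings({ speakModel: "aura-2-luna-en" })));
```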

### Live Updates (no reconnect needed)
The frontend can send these messages mid-conversation:
- `{ "type": "UpdateSpeak", "model": "aura-2-luna-en" }`: change voice
- `{ "type": "UpdatePrompt", "prompt": "New instructions..." }`: change prompt
- `{ "type": "InjectUserMessage", "content": "text" }`: send text as the user
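These updates are JSON text frames on the same WebSocket. A small sketch of helpers for them, with field names copied from the list above (verify them against the Agent API reference for the version you target); the helper names are hypothetical:

```javascript
const OPEN = 1; // WebSocket readyState for an open connection

// Message builders mirroring the list above
const updateSpeak = (model) => ({ type: "UpdateSpeak", model });
const updatePrompt = (prompt) => ({ type: "UpdatePrompt", prompt });
const injectUserMessage = (content) => ({ type: "InjectUserMessage", content });

// Send one control message as a JSON text frame; refuse if the socket is not open
function sendControl(ws, message) {
  if (ws.readyState !== OPEN) throw new Error(`cannot send ${message.type}: socket not open`);
  ws.send(JSON.stringify(message));
}

// Usage (ws is the already-open voice-agent WebSocket):
// sendControl(ws, updateSpeak("aura-2-luna-en"));
// sendControl(ws, injectUserMessage("What's the weather in Paris?"));
```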

### Adding Function Calling
The Agent API supports function calling. Add a `functions` array under `agent.think` in the Settings message, alongside the existing `provider` and `prompt` fields:
```json
{
  "agent": {
    "think": {
      "functions": [
        {
          "name": "get_weather",
          "description": "Get current weather",
          "parameters": { "type": "object", "properties": { "city": { "type": "string" } } }
        }
      ]
    }
  }
}
```
Then handle `FunctionCallRequest` messages in the frontend and respond with `FunctionCallResponse`.
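That round trip can be sketched as a small dispatcher. The message field names used here (`functions`, `id`, `name`, `arguments` as a JSON string, `client_side`, and `content` in the response) are assumptions based on the Agent API's v1 message format and should be checked against the API reference; `handleFunctionCallRequest` is a hypothetical name, and the `get_weather` handler returns stub data in place of a real lookup:

```javascript
// Hypothetical local implementations, keyed by the function names declared in Settings.
const handlers = {
  get_weather: async ({ city }) => ({ city, forecast: "sunny" }), // stub data only
};

// Sketch: turn one FunctionCallRequest into FunctionCallResponse messages.
// ASSUMED request shape: { type: "FunctionCallRequest", functions: [{ id, name, arguments, client_side }] }
// ASSUMED response shape: { type: "FunctionCallResponse", id, name, content }
async function handleFunctionCallRequest(msg) {
  const responses = [];
  for (const call of msg.functions ?? []) {
    if (call.client_side === false) continue; // server-side calls are not the frontend's to answer
    let content;
    try {
      const handler = handlers[call.name];
      if (!handler) throw new Error(`no handler for ${call.name}`);
      const args = call.arguments ? JSON.parse(call.arguments) : {};
      content = JSON.stringify(await handler(args));
    } catch (err) {
      // Report failures back to the agent instead of leaving the call unanswered
      content = JSON.stringify({ error: String(err.message ?? err) });
    }
    responses.push({ type: "FunctionCallResponse", id: call.id, name: call.name, content });
  }
  return responses;
}

// Usage in the frontend (ws is the open voice-agent WebSocket):
// ws.addEventListener("message", async (event) => {
//   if (typeof event.data !== "string") return; // non-string frames carry audio
//   const msg = JSON.parse(event.data);
//   if (msg.type === "FunctionCallRequest") {
//     for (const reply of await handleFunctionCallRequest(msg)) ws.send(JSON.stringify(reply));
//   }
// });
```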

### Frontend UI Controls
The frontend provides:
- Model dropdowns for listen/speak/think (before connecting)
- System prompt textarea (editable before and after connecting)
- Chat input for text messages
- "Update Settings" button for live changes

To add new controls, edit `frontend/main.js` and include the values in the Settings/Update messages.

## Frontend Changes

The frontend is a git submodule from `deepgram-starters/voice-agent-html`. To modify it:

1. **Edit files in `frontend/`**: this is the working copy
2. **Test locally**: changes appear immediately via Vite HMR
3. **Commit in the submodule:** `cd frontend && git add . && git commit -m "feat: description"`
4. **Push the frontend repo** (still inside `frontend/`): `git push origin main`
5. **Update the submodule ref:** `cd .. && git add frontend && git commit -m "chore(deps): update frontend submodule"`

**IMPORTANT:** Always edit `frontend/` inside THIS starter directory. The standalone `voice-agent-html/` directory at the monorepo root is a separate checkout.

### Adding a UI Control for a New Feature
1. Add the HTML element in `frontend/index.html` (input, checkbox, dropdown, etc.)
2. Read the value in `frontend/main.js` when building the `Settings` message or opening the WebSocket
3. For agent settings, include the value in the `Settings` message (or a live update message); the backend forwards these unchanged
4. If the backend itself needs the value, pass it as a query parameter in the WebSocket URL and handle it in `server.js`: read the param and pass it to the Deepgram API
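For the query-parameter route, the server-side read might look like this sketch. It assumes the upgrade request's path and query string are available as `req.url`, as they are on Node's `http` upgrade requests; the `voice` parameter, its allowlist, and both helper names are hypothetical. Checking the value against an allowlist keeps a client from passing arbitrary strings through to the Deepgram API:

```javascript
// Parse one query parameter from a request path such as
// "/api/voice-agent?voice=aura-2-luna-en". The base URL only satisfies the
// URL constructor for relative paths; nothing is fetched.
function readQueryParam(requestUrl, name, fallback = null) {
  const url = new URL(requestUrl, "http://localhost");
  return url.searchParams.get(name) ?? fallback;
}

// Hypothetical allowlist for a "voice" parameter
const ALLOWED_VOICES = new Set(["aura-2-thalia-en", "aura-2-luna-en"]);
const DEFAULT_VOICE = "aura-2-thalia-en";

function pickVoice(requestUrl) {
  const voice = readQueryParam(requestUrl, "voice");
  return ALLOWED_VOICES.has(voice) ? voice : DEFAULT_VOICE;
}

// Usage inside an upgrade handler (req is the incoming http.IncomingMessage):
// const voice = pickVoice(req.url);
```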

## Environment Variables

| Variable | Required | Default | Purpose |
|----------|----------|---------|---------|
| `DEEPGRAM_API_KEY` | Yes | | Deepgram API key |
| `PORT` | No | `8081` | Backend server port |
| `HOST` | No | `0.0.0.0` | Backend bind address |
| `SESSION_SECRET` | No | | JWT signing secret (production) |
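A sketch of how the backend might read these variables with the listed defaults. The fallback for an unset `SESSION_SECRET` (a random per-process secret, which invalidates previously issued tokens on restart) is an assumption, since the table gives no default; that behavior is one reason to set it explicitly in production. `loadConfig` is a hypothetical name:

```javascript
const crypto = require("node:crypto");

// Sketch: read configuration from environment variables with the documented defaults.
function loadConfig(env = process.env) {
  const apiKey = env.DEEPGRAM_API_KEY;
  if (!apiKey) throw new Error("DEEPGRAM_API_KEY is required (copy sample.env to .env and set it)");

  const port = Number.parseInt(env.PORT ?? "8081", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  return {
    apiKey,
    port,
    host: env.HOST ?? "0.0.0.0",
    // ASSUMPTION: with no SESSION_SECRET, fall back to a random per-process secret,
    // so tokens issued before a restart stop verifying.
    sessionSecret: env.SESSION_SECRET ?? crypto.randomBytes(32).toString("hex"),
  };
}
```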

## Conventional Commits

All commits must follow the Conventional Commits format. Never include `Co-Authored-By` lines for Claude.

```
feat(node-voice-agent): add diarization support
fix(node-voice-agent): resolve WebSocket close handling
refactor(node-voice-agent): simplify session endpoint
chore(deps): update frontend submodule
```

## Testing

```bash
# Run conformance tests (requires the app to be running)
make test

# Manual endpoint check
curl -sf http://localhost:8081/api/metadata | python3 -m json.tool
curl -sf http://localhost:8081/api/session | python3 -m json.tool
```

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
@AGENTS.md
