Commit 0c2c2da

docs: add AGENTS.md and CLAUDE.md for AI coding assistants
1 parent a2fee3c commit 0c2c2da

2 files changed: 169 additions, 0 deletions

AGENTS.md

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
# node-text-to-speech

Node.js demo app for Deepgram Text-to-Speech.

## Architecture

- **Backend:** Node.js (JavaScript) on port 8081
- **Frontend:** Vite + vanilla JS on port 8080 (git submodule: `text-to-speech-html`)
- **API type:** REST, `POST /api/text-to-speech`
- **Deepgram API:** Text-to-Speech (`/v1/speak`)
- **Auth:** JWT session tokens via `/api/session` (WebSocket auth uses `access_token.<jwt>` subprotocol)

## Key Files

| File | Purpose |
|------|---------|
| `server.js` | Main backend: API endpoints and request handlers |
| `deepgram.toml` | Metadata, lifecycle commands, tags |
| `Makefile` | Standardized build/run targets |
| `sample.env` | Environment variable template |
| `frontend/main.js` | Frontend logic: UI controls, API calls, result rendering |
| `frontend/index.html` | HTML structure and UI layout |
| `deploy/Dockerfile` | Production container (Caddy + backend) |
| `deploy/Caddyfile` | Reverse proxy, rate limiting, static serving |

## Quick Start

```bash
# Initialize (clone submodules + install deps)
make init

# Set up environment
test -f .env || cp sample.env .env  # then set DEEPGRAM_API_KEY

# Start both servers
make start
# Backend: http://localhost:8081
# Frontend: http://localhost:8080
```

## Start / Stop

**Start (recommended):**
```bash
make start
```

**Start separately:**
```bash
# Terminal 1: backend
node server.js

# Terminal 2: frontend
cd frontend && corepack pnpm run dev -- --port 8080 --no-open
```

**Stop all:**
```bash
# Force-kill whatever is listening on ports 8080 and 8081
lsof -ti:8080,8081 | xargs kill -9 2>/dev/null
```

**Clean rebuild:**
```bash
# Remove installed dependencies and the Vite cache, then reinstall
rm -rf node_modules frontend/node_modules frontend/.vite
make init
```

## Dependencies

- **Backend:** `package.json`, installed with `corepack pnpm` (Corepack is Node's built-in tool for pinning the package manager version)
- **Frontend:** `frontend/package.json`, which provides the Vite dev server
- **Submodules:** `frontend/` (text-to-speech-html), `contracts/` (starter-contracts)

Install backend dependencies: `corepack pnpm install`

Install frontend dependencies: `cd frontend && corepack pnpm install`

## API Endpoints

| Endpoint | Method | Auth | Purpose |
|----------|--------|------|---------|
| `/api/session` | GET | None | Issue JWT session token |
| `/api/metadata` | GET | None | Return app metadata (useCase, framework, language) |
| `/api/text-to-speech` | POST | JWT | Convert text to speech audio using Deepgram's TTS API |

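The table implies a two-step client flow: fetch a session token, then send it with the synthesis request. The sketch below illustrates that flow under two assumptions that this file does not confirm: that `/api/session` returns JSON with a `token` field, and that the backend accepts the JWT as an `Authorization: Bearer` header. `buildTtsRequest` and `synthesize` are hypothetical helper names; check `server.js` for the actual request and response shapes.

```javascript
// Hypothetical helper: builds the fetch arguments for POST /api/text-to-speech.
// Assumption: the backend expects the JWT as a Bearer token.
function buildTtsRequest(baseUrl, token, text) {
  return {
    url: new URL('/api/text-to-speech', baseUrl).toString(),
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      // The frontend sends `{ text }` in the request body.
      body: JSON.stringify({ text }),
    },
  };
}

// Hypothetical end-to-end call (needs Node 18+ for the global fetch).
// Assumption: GET /api/session responds with JSON containing a `token` field.
async function synthesize(baseUrl, text) {
  const sessionRes = await fetch(new URL('/api/session', baseUrl));
  if (!sessionRes.ok) throw new Error(`session failed: ${sessionRes.status}`);
  const { token } = await sessionRes.json();

  const { url, options } = buildTtsRequest(baseUrl, token, text);
  const ttsRes = await fetch(url, options);
  if (!ttsRes.ok) throw new Error(`TTS failed: ${ttsRes.status}`);
  return Buffer.from(await ttsRes.arrayBuffer()); // raw audio bytes
}
```

With the app running, `synthesize('http://localhost:8081', 'Hello')` would resolve to a `Buffer` of audio if both assumptions hold.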
## Customization Guide

### Changing the Default Voice
Find the `DEFAULT_MODEL` or `model` variable in the backend. Deepgram offers many voice options:

**Aura 2 voices** (latest, highest quality):
- `aura-2-thalia-en` (default)
- `aura-2-andromeda-en`
- `aura-2-arcas-en`
- `aura-2-atlas-en`
- `aura-2-luna-en`
- `aura-2-orion-en`
- `aura-2-stella-en`
- `aura-2-zeus-en`

**Legacy Aura voices:** `aura-asteria-en`, `aura-luna-en`, `aura-stella-en`, etc.

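If the voice becomes user-selectable, the backend may want to fall back to the default when it receives an unknown model name. This is a minimal sketch of that idea, not the code in `server.js`: `KNOWN_MODELS` and `resolveModel` are hypothetical names, and the allowlist holds only the Aura 2 voices listed above.

```javascript
const DEFAULT_MODEL = 'aura-2-thalia-en';

// Hypothetical allowlist, limited to the Aura 2 voices named in this guide.
const KNOWN_MODELS = new Set([
  'aura-2-thalia-en',
  'aura-2-andromeda-en',
  'aura-2-arcas-en',
  'aura-2-atlas-en',
  'aura-2-luna-en',
  'aura-2-orion-en',
  'aura-2-stella-en',
  'aura-2-zeus-en',
]);

// Returns the requested model if it is on the allowlist, else the default.
function resolveModel(requested) {
  return KNOWN_MODELS.has(requested) ? requested : DEFAULT_MODEL;
}
```

Falling back silently keeps the demo forgiving. Rejecting unknown names with a 400 response is an equally reasonable choice.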
### Adding Audio Format Options
The TTS API supports different output formats via query parameters:

| Parameter | Default | Options | Effect |
|-----------|---------|---------|--------|
| `model` | `aura-2-thalia-en` | See voice list | Voice selection |
| `encoding` | (varies) | `linear16`, `mp3`, `opus`, `flac`, `alaw`, `mulaw` | Audio encoding |
| `container` | (varies) | `wav`, `mp3`, `ogg`, `none` | Container format |
| `sample_rate` | `24000` | `8000`-`48000` | Output sample rate |
| `bit_rate` | (varies) | `32000`-`320000` | For lossy codecs |

**Backend:** Add these as query params to the Deepgram API call or SDK options.
**Frontend:** Add dropdowns for encoding/format in `frontend/main.js`.

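For the backend half, one option is to build the `/v1/speak` URL from an options object so that unset parameters are simply omitted. The sketch below assumes the backend calls Deepgram's REST endpoint directly with `fetch` rather than through the SDK. `buildSpeakUrl` and `speak` are hypothetical names, and no validation of value ranges is attempted.

```javascript
// Parameters from the table above that may be forwarded to Deepgram.
const ALLOWED_PARAMS = ['model', 'encoding', 'container', 'sample_rate', 'bit_rate'];

// Hypothetical helper: builds the /v1/speak URL, skipping unset options
// and ignoring any key that is not in ALLOWED_PARAMS.
function buildSpeakUrl(options = {}) {
  const url = new URL('https://api.deepgram.com/v1/speak');
  for (const name of ALLOWED_PARAMS) {
    const value = options[name];
    if (value !== undefined && value !== null && value !== '') {
      url.searchParams.set(name, String(value));
    }
  }
  return url.toString();
}

// Hypothetical call site (needs Node 18+ for the global fetch).
// Deepgram's REST API takes the key as `Authorization: Token <key>`.
async function speak(text, options, apiKey) {
  const res = await fetch(buildSpeakUrl(options), {
    method: 'POST',
    headers: {
      Authorization: `Token ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`Deepgram request failed: ${res.status}`);
  return Buffer.from(await res.arrayBuffer()); // raw audio bytes
}
```

Keeping the parameter names in one allowlist means a stray query parameter from the frontend is never forwarded upstream.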
### Customizing the Input
- The frontend sends `{ text }` in the request body
- You could add SSML support by passing SSML-formatted text
- Add a character/word limit by validating in the backend

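The backend validation mentioned in the last bullet could look roughly like this. It is a sketch only: `validateText` is a hypothetical helper, and the 2000-character limit is an arbitrary illustrative value, not a limit taken from Deepgram or from this repository.

```javascript
// Hypothetical limit chosen for illustration only.
const MAX_CHARS = 2000;

// Checks the parsed JSON body of POST /api/text-to-speech.
// Returns { ok: true, text } or { ok: false, status, error }.
function validateText(body, maxChars = MAX_CHARS) {
  const text = body && typeof body.text === 'string' ? body.text.trim() : '';
  if (text.length === 0) {
    return { ok: false, status: 400, error: 'text is required' };
  }
  if (text.length > maxChars) {
    return { ok: false, status: 400, error: `text exceeds ${maxChars} characters` };
  }
  return { ok: true, text };
}
```

A handler would call `validateText(req.body)` before contacting Deepgram and return `status` and `error` to the client when `ok` is false.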
## Frontend Changes

The frontend is a git submodule from `deepgram-starters/text-to-speech-html`. To modify:

1. **Edit files in `frontend/`**, which is the working copy
2. **Test locally:** changes reflect immediately via Vite HMR
3. **Commit in the submodule:** `cd frontend && git add . && git commit -m "feat: description"`
4. **Push the frontend repo** (still inside `frontend/`): `git push origin main`
5. **Update the submodule ref** (back in the starter root): `cd .. && git add frontend && git commit -m "chore(deps): update frontend submodule"`

**IMPORTANT:** Always edit `frontend/` inside THIS starter directory. The standalone `text-to-speech-html/` directory at the monorepo root is a separate checkout.

### Adding a UI Control for a New Feature
1. Add the HTML element in `frontend/index.html` (input, checkbox, dropdown, etc.)
2. Read the value in `frontend/main.js` when making the API call or opening the WebSocket
3. Pass it as a query parameter or request body field
4. Handle it in the backend `server.js`: read the param and pass it to the Deepgram API

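As one illustration of steps 2 to 4, the sketch below carries an `encoding` choice from the frontend to the backend as a query parameter. Both function names are hypothetical, the choice of a query parameter over a body field is an assumption, and the allowlist reuses the encodings from the format table in this guide.

```javascript
// Encodings listed in the audio format table of this guide.
const ENCODINGS = new Set(['linear16', 'mp3', 'opus', 'flac', 'alaw', 'mulaw']);

// Steps 2-3 (frontend): append the control's value as a query parameter.
// In the browser, `encoding` would come from the dropdown's `.value`.
function buildEndpoint(encoding) {
  const params = new URLSearchParams();
  if (encoding) params.set('encoding', encoding);
  const query = params.toString();
  return query ? `/api/text-to-speech?${query}` : '/api/text-to-speech';
}

// Step 4 (backend): read and validate the parameter from the request URL.
// Returns undefined when the parameter is absent.
function readEncoding(requestUrl) {
  // A base is needed because Node request URLs are usually path-only.
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  const encoding = searchParams.get('encoding');
  if (encoding === null) return undefined;
  if (!ENCODINGS.has(encoding)) {
    throw new RangeError(`unsupported encoding: ${encoding}`);
  }
  return encoding;
}
```

The validated value can then be passed on to the Deepgram call alongside `model`.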
## Environment Variables

| Variable | Required | Default | Purpose |
|----------|----------|---------|---------|
| `DEEPGRAM_API_KEY` | Yes | (none) | Deepgram API key |
| `PORT` | No | `8081` | Backend server port |
| `HOST` | No | `0.0.0.0` | Backend bind address |
| `SESSION_SECRET` | No | (none) | JWT signing secret (production) |

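The table can be read as a small configuration contract. A hedged sketch of loading it at startup follows. `loadConfig` is a hypothetical helper, and how `server.js` actually reads these variables, including what it does when `SESSION_SECRET` is unset, is not shown in this file.

```javascript
// Hypothetical helper: reads the variables from the table above and
// applies the documented defaults. Throws if the API key is missing.
function loadConfig(env = process.env) {
  if (!env.DEEPGRAM_API_KEY) {
    throw new Error('DEEPGRAM_API_KEY is required');
  }
  const port = Number.parseInt(env.PORT || '8081', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid PORT: ${env.PORT}`);
  }
  return {
    apiKey: env.DEEPGRAM_API_KEY,
    port,
    host: env.HOST || '0.0.0.0',
    // Assumption: null means "not configured"; the caller picks a fallback.
    sessionSecret: env.SESSION_SECRET || null,
  };
}
```

Failing fast on a missing key surfaces a misconfigured `.env` at startup instead of on the first synthesis request.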
## Conventional Commits

All commits must follow conventional commits format. Never include `Co-Authored-By` lines for Claude.

```
feat(node-text-to-speech): add diarization support
fix(node-text-to-speech): resolve WebSocket close handling
refactor(node-text-to-speech): simplify session endpoint
chore(deps): update frontend submodule
```

## Testing

```bash
# Run conformance tests (requires app to be running)
make test

# Manual endpoint check
curl -sf http://localhost:8081/api/metadata | python3 -m json.tool
curl -sf http://localhost:8081/api/session | python3 -m json.tool
```

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
@AGENTS.md
