
Commit 6bb63c3

ochafik and claude authored
examples: add transcript-server example with live speech transcription (#240)
* feat: add transcript-server example with live speech transcription

  - New example: transcript-server with Web Speech API transcription
  - Start/stop recording with timer display
  - Live transcript with interim results
  - Send button to send transcript as ui/message to host
  - Copy button to copy full transcript
  - Sent messages marked with divider, greyed out
  - Experimental ui/update-model-context for live context updates
  - Transparent background, respects host theme
  - Enable microphone and clipboard in basic-host sandbox iframes
  - Add allow="microphone; clipboard-write" to both outer and inner iframes
  - Required for Web Speech API, audio capture, and clipboard access
  - Add alert popup in basic-host for ui/message from apps

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Opus 4.5 <[email protected]>

* style: format transcript-server files

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Opus 4.5 <[email protected]>

* test: add e2e test and golden for transcript-server

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Opus 4.5 <[email protected]>

* refactor: use @types/dom-speech-recognition instead of hand-rolled types

  Removes ~50 lines of manually defined Web Speech API types in favor of the community-maintained type package.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Opus 4.5 <[email protected]>

* fix: add newlines between transcript chunks when copying to clipboard

* refactor: unify transcript formatting with timestamps

  - Add formatEntry() helper that includes [timestamp] prefix
  - Add formatEntries() helper that joins entries with newlines
  - Use same format for clipboard copy, send message, and model context
  - DRY up duplicate code in send button handler

* fix: log model context update failures instead of silently swallowing

* fix: log model context updates for debugging

* refactor: pass permissions through loadSandboxProxy like csp

  - Add permissions parameter to loadSandboxProxy in implementation.ts
  - Build iframe allow attribute from permissions instead of hardcoding
  - Update sandbox.ts to not hardcode allow attribute (set via notification)
  - Add microphone + clipboardWrite permissions to transcript-server resource
  - Update README to reflect permission configuration via resource _meta.ui

* fix(transcript-server): auto-expand instead of scroll

  - Remove min-height: 100vh and overflow-y: auto
  - Let content grow naturally with autoResize (default)
  - Remove unnecessary scrollTop calls

* test: remove transcript timer mask (no longer dynamic)

* refactor: pass permissions through loadSandboxProxy like csp

* docs: document updateModelContext for transitional speech

* test: update e2e screenshots

* refactor: use app.updateModelContext() instead of raw request

* package-lock.json

* src/generated/schema.json

* test: update e2e screenshots from CI

---------

Co-authored-by: Claude Opus 4.5 <[email protected]>
1 parent 53142ef commit 6bb63c3

22 files changed (+3117, -489 lines)
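The commit message mentions `formatEntry()` and `formatEntries()` helpers that give the clipboard copy, the sent message, and the model context one shared `[timestamp] text` format. The sketch below shows one way such helpers could look. The `TranscriptEntry` shape, the `pad2` helper, and the local-time `HH:MM:SS` stamp are assumptions, not the example's actual code.

```typescript
// Hypothetical entry shape; the commit names formatEntry()/formatEntries()
// but does not show their signatures.
interface TranscriptEntry {
  text: string;
  timestamp: Date;
}

// Zero-pad a clock component to two digits, e.g. 5 -> "05".
function pad2(n: number): string {
  return String(n).padStart(2, "0");
}

// Render one entry as "[HH:MM:SS] text", using the local time of day.
function formatEntry(entry: TranscriptEntry): string {
  const t = entry.timestamp;
  const stamp = `${pad2(t.getHours())}:${pad2(t.getMinutes())}:${pad2(t.getSeconds())}`;
  return `[${stamp}] ${entry.text.trim()}`;
}

// Join entries one per line, so clipboard copy, the ui/message payload,
// and the model context update can all share the same string.
function formatEntries(entries: TranscriptEntry[]): string {
  return entries.map(formatEntry).join("\n");
}

// Usage: two chunks recorded a few seconds apart.
const entries: TranscriptEntry[] = [
  { text: "hello there", timestamp: new Date(2025, 0, 1, 9, 5, 7) },
  { text: " second chunk ", timestamp: new Date(2025, 0, 1, 9, 5, 12) },
];
console.log(formatEntries(entries));
// [09:05:07] hello there
// [09:05:12] second chunk
```

Keeping a single formatter is what lets the copy, send, and model-context paths stay consistent; joining with `"\n"` also covers the separate fix for missing newlines between copied chunks.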
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
````markdown
# Transcript Server

An MCP App Server for live speech transcription using the Web Speech API.

## Features

- **Live Transcription**: Real-time speech-to-text using browser's Web Speech API
- **Transitional Model Context**: Streams interim transcriptions to the model via `ui/update-model-context`, allowing the model to see what the user is saying as they speak
- **Audio Level Indicator**: Visual feedback showing microphone input levels
- **Send to Host**: Button to send completed transcriptions as a `ui/message` to the MCP host
- **Start/Stop Control**: Toggle listening on and off
- **Clear Transcript**: Reset the transcript area

## Setup

### Prerequisites

- Node.js 18+
- Chrome, Edge, or Safari (Web Speech API support)

### Installation

```bash
npm install
```

### Running

```bash
# Development mode (with hot reload)
npm run dev

# Production build and serve
npm run start
```

## Usage

The server exposes a single tool:

### `transcribe`

Opens a live speech transcription interface.

**Parameters:** None

**Example:**

```json
{
  "name": "transcribe",
  "arguments": {}
}
```

## How It Works

1. Click **Start** to begin listening
2. Speak into your microphone
3. Watch your speech appear as text in real-time (interim text is streamed to model context via `ui/update-model-context`)
4. Click **Send** to send the transcript as a `ui/message` to the host (clears the model context)
5. Click **Clear** to reset the transcript

## Architecture

```
transcript-server/
├── server.ts          # MCP server with transcribe tool
├── server-utils.ts    # HTTP transport utilities
├── mcp-app.html       # Transcript UI entry point
├── src/
│   ├── mcp-app.ts     # App logic, Web Speech API integration
│   ├── mcp-app.css    # Transcript UI styles
│   └── global.css     # Base styles
└── dist/              # Built output (single HTML file)
```

## Notes

- **Microphone Permission**: Requires `allow="microphone"` on the sandbox iframe (configured via `permissions: { microphone: {} }` in the resource `_meta.ui`)
- **Browser Support**: Web Speech API is well-supported in Chrome/Edge, with Safari support. Firefox has limited support.
- **Continuous Mode**: Recognition automatically restarts when it ends, for seamless transcription

## Future Enhancements

- Language selection dropdown
- Whisper-based offline transcription (see TRANSCRIPTION.md)
- Export transcript to file
- Timestamps toggle
````
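The README's "How It Works" steps rely on the Web Speech API tagging each recognition result as final or interim. Below is a minimal sketch of the per-event split, assuming `continuous` and `interimResults` are enabled on the recognizer. `ResultLike`, `AlternativeLike`, and `splitResults` are hypothetical stand-ins that let the logic run outside a browser; the app itself would use the types from `@types/dom-speech-recognition`.

```typescript
// Minimal structural stand-ins for SpeechRecognitionResult and its
// alternatives, so the logic can run outside a browser.
interface AlternativeLike {
  readonly transcript: string;
}
interface ResultLike {
  readonly isFinal: boolean;
  readonly [index: number]: AlternativeLike;
}

interface SplitResult {
  finalText: string; // settled text to append to the transcript
  interimText: string; // provisional text that may still be revised
}

// Walk only the results that changed in this event (from resultIndex on),
// take the top alternative of each, and bucket it by isFinal.
function splitResults(
  results: ArrayLike<ResultLike>,
  resultIndex: number,
): SplitResult {
  let finalText = "";
  let interimText = "";
  for (let i = resultIndex; i < results.length; i++) {
    const result = results[i];
    const best = result[0]?.transcript ?? "";
    if (result.isFinal) {
      finalText += best;
    } else {
      interimText += best;
    }
  }
  return { finalText, interimText };
}

// Browser wiring (sketch, not executed here):
//   recognition.continuous = true;
//   recognition.interimResults = true;
//   recognition.onresult = (e) => {
//     const { finalText, interimText } = splitResults(e.results, e.resultIndex);
//     // append finalText, render interimText greyed out, and stream the
//     // interim text to the model context as the README describes
//   };
//   recognition.onend = () => { if (recording) recognition.start(); };

const demo = splitResults(
  [
    { isFinal: true, 0: { transcript: "hello " } },
    { isFinal: false, 0: { transcript: "wor" } },
  ],
  0,
);
console.log(JSON.stringify(demo)); // {"finalText":"hello ","interimText":"wor"}
```

Starting at `resultIndex` avoids re-appending results that an earlier event already finalized. The `onend` restart in the comment mirrors the README's "Continuous Mode" note. `recording` is a hypothetical flag for whether the user has stopped.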
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Live Transcript</title>
</head>
<body>
  <main class="transcript-app">
    <!-- Transcript Area -->
    <section class="transcript-section">
      <div class="transcript" id="transcript">
        <p class="transcript-placeholder">Your speech will appear here...</p>
      </div>
    </section>

    <!-- Controls -->
    <section class="controls">
      <div class="controls-left">
        <button class="btn btn-primary" id="start-btn">
          <svg class="btn-icon" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
            <polygon points="5 3 19 12 5 21 5 3"/>
          </svg>
          Start
        </button>
        <div class="level-bar" id="level-bar">
          <div class="level-fill" id="mic-level"></div>
        </div>
        <span class="timer" id="timer">0:00</span>
      </div>
      <div class="controls-right">
        <button class="btn btn-secondary" id="copy-btn" title="Copy transcript">
          <svg class="btn-icon" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
            <rect x="9" y="9" width="13" height="13" rx="2" ry="2"/>
            <path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/>
          </svg>
        </button>
        <button class="btn btn-secondary" id="clear-btn">Clear</button>
        <button class="btn btn-accent" id="send-btn" disabled>Send</button>
      </div>
    </section>
  </main>
  <script type="module" src="/src/mcp-app.ts"></script>
</body>
</html>
```
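The controls row above has a `#timer` label initialised to `0:00` and a `#mic-level` fill inside `#level-bar`. The helpers below sketch one way to drive them. `formatElapsed`, `rmsLevel`, `levelToWidth`, the gain value, and the use of an `AnalyserNode` for the level are all assumptions, since the app script is not part of this excerpt.

```typescript
// Format elapsed milliseconds as "m:ss", matching the "0:00" placeholder
// in #timer. Minutes are not wrapped, so one hour shows as "60:00".
function formatElapsed(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// Root-mean-square level of 8-bit time-domain samples, as written by
// AnalyserNode.getByteTimeDomainData(): 128 is silence, 0 and 255 are
// full-scale. Returns a value in [0, 1].
function rmsLevel(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const v = (samples[i] - 128) / 128; // centre on zero, scale to about [-1, 1]
    sumSquares += v * v;
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Map a level to a CSS width for #mic-level. The gain is an assumed tuning
// value so ordinary speech visibly moves the bar; the result is clamped to 100%.
function levelToWidth(level: number, gain = 4): string {
  return `${Math.min(100, Math.round(level * gain * 100))}%`;
}

// Browser wiring (sketch, not executed here):
//   const source = audioContext.createMediaStreamSource(stream);
//   const analyser = audioContext.createAnalyser();
//   source.connect(analyser);
//   const buf = new Uint8Array(analyser.fftSize);
//   analyser.getByteTimeDomainData(buf);
//   micLevel.style.width = levelToWidth(rmsLevel(buf));
//   timer.textContent = formatElapsed(Date.now() - startedAt);

console.log(formatElapsed(65_000)); // 1:05
console.log(levelToWidth(rmsLevel(new Uint8Array([128, 128, 128])))); // 0%
```

RMS tracks perceived loudness better than peak amplitude, so the bar moves smoothly instead of jumping on single spikes.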
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
```json
{
  "name": "@modelcontextprotocol/server-transcript",
  "version": "0.1.0",
  "type": "module",
  "description": "MCP App Server for live speech transcription",
  "repository": {
    "type": "git",
    "url": "https://github.com/modelcontextprotocol/ext-apps",
    "directory": "examples/transcript-server"
  },
  "license": "MIT",
  "main": "server.ts",
  "files": [
    "server.ts",
    "server-utils.ts",
    "dist"
  ],
  "scripts": {
    "build": "tsc --noEmit && cross-env INPUT=mcp-app.html vite build",
    "watch": "cross-env INPUT=mcp-app.html vite build --watch",
    "serve": "bun server.ts",
    "start": "cross-env NODE_ENV=development npm run build && npm run serve",
    "dev": "cross-env NODE_ENV=development concurrently 'npm run watch' 'npm run serve'",
    "prepublishOnly": "npm run build"
  },
  "dependencies": {
    "@modelcontextprotocol/ext-apps": "^0.3.1",
    "@modelcontextprotocol/sdk": "^1.24.0",
    "zod": "^3.23.0"
  },
  "devDependencies": {
    "@types/cors": "^2.8.19",
    "@types/dom-speech-recognition": "^0.0.7",
    "@types/express": "^5.0.0",
    "@types/node": "^22.0.0",
    "concurrently": "^9.2.1",
    "cors": "^2.8.5",
    "cross-env": "^10.1.0",
    "express": "^5.1.0",
    "typescript": "^5.9.3",
    "vite": "^6.0.0",
    "vite-plugin-singlefile": "^2.3.0"
  }
}
```
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
```typescript
/**
 * Shared utilities for running MCP servers with Streamable HTTP transport.
 */

import { createMcpExpressApp } from "@modelcontextprotocol/sdk/server/express.js";
import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import cors from "cors";
import type { Request, Response } from "express";

export interface ServerOptions {
  port: number;
  name?: string;
}

/**
 * Starts an MCP server with Streamable HTTP transport in stateless mode.
 */
export async function startServer(
  createServer: () => McpServer,
  options: ServerOptions,
): Promise<void> {
  const { port, name = "MCP Server" } = options;

  const app = createMcpExpressApp({ host: "0.0.0.0" });
  app.use(cors());

  app.all("/mcp", async (req: Request, res: Response) => {
    const server = createServer();
    const transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: undefined,
    });

    res.on("close", () => {
      transport.close().catch(() => {});
      server.close().catch(() => {});
    });

    try {
      await server.connect(transport);
      await transport.handleRequest(req, res, req.body);
    } catch (error) {
      console.error("MCP error:", error);
      if (!res.headersSent) {
        res.status(500).json({
          jsonrpc: "2.0",
          error: { code: -32603, message: "Internal server error" },
          id: null,
        });
      }
    }
  });

  const httpServer = app.listen(port, (err) => {
    if (err) {
      console.error("Failed to start server:", err);
      process.exit(1);
    }
    console.log(`${name} listening on http://localhost:${port}/mcp`);
  });

  const shutdown = () => {
    console.log("\nShutting down...");
    httpServer.close(() => process.exit(0));
  };

  process.on("SIGINT", shutdown);
  process.on("SIGTERM", shutdown);
}
```
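`startServer()` exposes a single `/mcp` endpoint in stateless mode (`sessionIdGenerator: undefined`), creating a fresh server and transport for every HTTP request. To show what a request to that endpoint looks like, the sketch below builds a JSON-RPC 2.0 `tools/call` message for the `transcribe` tool together with `fetch()` options. It is illustrative only. `toolCallRequest` and `postInit` are hypothetical names, and a real client would normally use the SDK's `Client` with `StreamableHTTPClientTransport`, which also performs the `initialize` handshake, rather than hand-built requests.

```typescript
// JSON-RPC 2.0 request envelope, as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Hypothetical helper: a tools/call request for a named tool.
function toolCallRequest(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical helper: fetch() options for POSTing a message to a
// Streamable HTTP endpoint. Clients are expected to accept both a plain
// JSON response and an SSE stream.
function postInit(message: JsonRpcRequest) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(message),
  };
}

// 3109 is the default port the server entry point uses when PORT is unset.
const endpoint = "http://localhost:3109/mcp";
const init = postInit(toolCallRequest(1, "transcribe"));
console.log(`POST ${endpoint}`); // POST http://localhost:3109/mcp
console.log(init.body);
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"transcribe","arguments":{}}}
// With the server running: const res = await fetch(endpoint, init);
```

Because the server is stateless, no `Mcp-Session-Id` header is involved. Each POST is handled by a freshly created server that is closed when the response ends.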
Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import type {
  CallToolResult,
  ReadResourceResult,
} from "@modelcontextprotocol/sdk/types.js";
import fs from "node:fs/promises";
import path from "node:path";
import {
  registerAppTool,
  registerAppResource,
  RESOURCE_MIME_TYPE,
  RESOURCE_URI_META_KEY,
} from "@modelcontextprotocol/ext-apps/server";
import { startServer } from "./server-utils.js";

const DIST_DIR = path.join(import.meta.dirname, "dist");
const RESOURCE_URI = "ui://transcript/mcp-app.html";

/**
 * Creates a new MCP server instance with tools and resources registered.
 */
export function createServer(): McpServer {
  const server = new McpServer({
    name: "Transcript Server",
    version: "1.0.0",
  });

  // Register the transcribe tool - opens a UI for live speech transcription
  registerAppTool(
    server,
    "transcribe",
    {
      title: "Transcribe Speech",
      description:
        "Opens a live speech transcription interface using the Web Speech API.",
      inputSchema: {},
      _meta: { [RESOURCE_URI_META_KEY]: RESOURCE_URI },
    },
    async (): Promise<CallToolResult> => {
      return {
        content: [
          {
            type: "text",
            text: JSON.stringify({
              status: "ready",
              message: "Transcription UI opened. Speak into your microphone.",
            }),
          },
        ],
      };
    },
  );

  // Register the UI resource
  registerAppResource(
    server,
    RESOURCE_URI,
    RESOURCE_URI,
    { mimeType: RESOURCE_MIME_TYPE, description: "Transcript UI" },
    async (): Promise<ReadResourceResult> => {
      const html = await fs.readFile(
        path.join(DIST_DIR, "mcp-app.html"),
        "utf-8",
      );

      return {
        contents: [
          {
            uri: RESOURCE_URI,
            mimeType: RESOURCE_MIME_TYPE,
            text: html,
            _meta: {
              ui: {
                // Request microphone for Web Speech API, clipboard for copy button
                permissions: { microphone: {}, clipboardWrite: {} },
              },
            },
          },
        ],
      };
    },
  );

  return server;
}

async function main() {
  if (process.argv.includes("--stdio")) {
    await createServer().connect(new StdioServerTransport());
  } else {
    const port = parseInt(process.env.PORT ?? "3109", 10);
    await startServer(createServer, { port, name: "Transcript Server" });
  }
}

main().catch((e) => {
  console.error(e);
  process.exit(1);
});
```
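The resource above requests `permissions: { microphone: {}, clipboardWrite: {} }` in `_meta.ui`. Per the commit message, the basic host now builds the sandbox iframe's `allow` attribute from those permissions instead of hardcoding it. A sketch of that mapping follows. `buildAllowAttribute`, the `Permissions` type, and the exact set of supported keys are assumptions, although `camera`, `microphone`, `geolocation`, and `clipboard-write` are standard Permissions Policy feature names.

```typescript
// Assumed permission keys; values are (empty) option objects, mirroring
// `permissions: { microphone: {}, clipboardWrite: {} }` in _meta.ui.
type PermissionKey = "camera" | "microphone" | "geolocation" | "clipboardWrite";
type Permissions = Partial<Record<PermissionKey, object>>;

// Permissions Policy feature name for each key, as used in an iframe `allow`.
const FEATURE_NAMES: Record<PermissionKey, string> = {
  camera: "camera",
  microphone: "microphone",
  geolocation: "geolocation",
  clipboardWrite: "clipboard-write",
};

// Own-property check, so inherited names such as "toString" are not treated as features.
function isPermissionKey(key: string): key is PermissionKey {
  return Object.prototype.hasOwnProperty.call(FEATURE_NAMES, key);
}

// Hypothetical helper: build an iframe `allow` value from requested
// permissions, skipping unknown keys and keeping the request order.
function buildAllowAttribute(permissions?: Permissions): string {
  if (!permissions) return "";
  const requested: Permissions = permissions;
  return Object.keys(requested)
    .filter(isPermissionKey)
    .filter((key) => requested[key] !== undefined)
    .map((key) => FEATURE_NAMES[key])
    .join("; ");
}

const allow = buildAllowAttribute({ microphone: {}, clipboardWrite: {} });
console.log(allow); // microphone; clipboard-write
// The host would then apply it to the sandbox iframe(s), e.g.
// iframe.setAttribute("allow", allow);
```

The output matches the `allow="microphone; clipboard-write"` value that the first commit hardcoded on both the outer and inner iframes. Deriving it from the resource keeps each app's grant limited to the features that app asks for.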
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
```css
* {
  box-sizing: border-box;
}

html, body {
  font-family: system-ui, -apple-system, sans-serif;
  font-size: 1rem;
  margin: 0;
  padding: 0;
  /* No height: 100% - body must grow with content for ResizeObserver to detect changes */
  background: transparent;
}
```
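The stylesheet comment explains why `body` has no `height: 100%`. If the body were pinned to the viewport, a `ResizeObserver` watching it would never see the content grow, and the iframe could not auto-expand. The sketch below shows the reporting side under stated assumptions. `createHeightReporter` and `notifyHost` are hypothetical, and the ext-apps SDK's built-in `autoResize` mentioned in the commit message may work differently.

```typescript
// Hypothetical helper: wrap a size-reporting callback so the host is only
// notified when the rounded-up content height actually changes.
function createHeightReporter(
  report: (height: number) => void,
): (height: number) => void {
  let last: number | undefined;
  return (height: number) => {
    const rounded = Math.ceil(height); // avoid clipping fractional pixels
    if (rounded === last) return; // unchanged: skip a redundant message
    last = rounded;
    report(rounded);
  };
}

// Browser wiring (sketch, not executed here). This only works if <body>
// grows with its content, which is why the stylesheet omits height: 100%.
//   const onBodyResize = createHeightReporter((h) => notifyHost(h));
//   new ResizeObserver(() => onBodyResize(document.documentElement.scrollHeight))
//     .observe(document.body);

const sent: number[] = [];
const onResize = createHeightReporter((h) => sent.push(h));
[120, 120, 120.4, 180, 180].forEach((h) => onResize(h));
console.log(JSON.stringify(sent)); // [120,121,180]
```

Deduplicating matters because `ResizeObserver` can fire repeatedly during layout. Forwarding only real changes keeps the host from resizing the iframe in a loop.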
