Commit c9e3720

ochafik and claude authored
tests: add Playwright E2E tests with screenshot golden testing (+ fix examples session handling) (#115)
* Add Playwright E2E tests with screenshot golden testing
  - Add E2E tests for all 8 MCP server examples
  - Screenshot golden images for visual regression testing
  - CI workflow for running E2E tests
  - npm scripts: test:e2e, test:e2e:update, test:e2e:ui

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude <[email protected]>

* Add default demo code for Three.js example
  When no code is provided, show a rotating green cube demo.

* Add E2E testing documentation to CONTRIBUTING.md

* Update Three.js golden screenshot with 3D cube

* Add explicit permissions to CI workflow
  Set minimal read-only permissions for security best practices.

* Fix test.setTimeout() to be inside describe blocks
  Playwright requires setTimeout to be called within a test or describe block.

* Simplify E2E tests - remove excessive timeouts
  - Remove explicit setTimeout calls (default 30s is sufficient)
  - Replace waitForTimeout(5000/6000) with proper waitForAppLoad()
  - Wait for inner iframe visibility instead of fixed delays
  - Keep only 500ms stabilization for screenshot animations

* Auto-generate missing snapshots in CI
  Use updateSnapshots: 'missing' in CI to handle cross-platform screenshot differences (macOS vs Linux).

* Use platform-agnostic golden screenshots
  - Remove -chromium-darwin suffix from snapshot filenames
  - Configure snapshotPathTemplate for cross-platform compatibility
  - Increase tolerance to 5% for rendering differences between platforms

* Refactor tests to use forEach instead of for-of loop
  May help with Playwright test discovery issues in CI.

* Use npm ci and list reporter in CI
  - Use npm ci for exact package versions
  - Use --reporter=list instead of html to avoid potential re-evaluation issues
  - Only upload test-results on failure

* Exclude e2e tests from bun test
  Run bun test only on src/ directory to avoid running Playwright tests with Bun's test runner.

* Fix prettier formatting

* Add interaction tests for basic server apps
  Test that clicking buttons in the MCP App triggers the corresponding host callbacks:
  - Send Message → host logs "Message from MCP App"
  - Send Log → host logs "Log message from MCP App"
  - Open Link → host logs "Open link request"

  Tests both React and Vanilla JS implementations.

* Mask dynamic content in E2E screenshot tests
  Add Playwright mask option to handle servers with dynamic/random content:
  - basic-react/vanillajs: mask server time display
  - system-monitor: mask CPU chart, memory stats, uptime
  - cohort-heatmap: mask heatmap grid (random data)
  - customer-segmentation: mask scatter chart (random data)

  This addresses PR feedback about handling examples with non-deterministic output. Masking replaces the previous 10% tolerance with proper exclusion of dynamic elements, allowing tighter 1% tolerance for the rest of the UI.

* Add wiki-explorer to E2E tests with default URL param
  - Add default URL "https://en.wikipedia.org/wiki/Model_Context_Protocol" to wiki-explorer-server's get-first-degree-links tool inputSchema
  - Update basic-host to automatically populate input field with tool defaults extracted from inputSchema.properties
  - Add wiki-explorer-server to E2E test suite with dynamic masking for the force-directed graph

* Use .default() for threejs tool schema instead of .optional()
  This exposes the default code and height values in the JSON Schema, allowing basic-host to auto-populate the input field with defaults.

* Enable parallel E2E tests with timeouts and canvas masking
  - Enable fullyParallel with 4 workers locally (2 in CI)
  - Add 30s timeout per test
  - Mask threejs canvas for stable screenshots
  - Update snapshots to reflect default values in input fields

* Add pre-commit check for private registry URLs in package-lock.json
  Same check as in CI to catch issues before push.

* Increase screenshot diff tolerance to 6% for cross-platform rendering
  Font rendering differs between macOS and Linux, causing ~5% pixel differences.

* Revert pre-commit artifactory check (moved to separate PR #133)

* Format threejs server.ts

* Update CONTRIBUTING.md to reflect platform-agnostic screenshots

* fix(e2e): use factory pattern for MCP servers to support parallel connections
  McpServer only supports one transport at a time. When multiple browser contexts connected in parallel, calling server.connect(transport) for each session overwrote the previous transport's callbacks, causing connection corruption and test timeouts.

  Changes:
  - server-utils.ts: accept factory function instead of single instance
  - All 9 example servers: wrap server creation in createServer() function
  - playwright.config.ts: re-enable parallel execution (4 workers)
  - servers.spec.ts: add toBeEnabled wait for server connection
  - Update screenshot baselines for basic-vanillajs and cohort-heatmap

  Co-Authored-By: Claude Opus 4.5 <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
1 parent cfd1d1d commit c9e3720

29 files changed: +3045 -1457 lines changed

.github/workflows/ci.yml

Lines changed: 32 additions & 0 deletions
@@ -6,6 +6,9 @@ on:
   pull_request:
     branches: [main]

+permissions:
+  contents: read
+
 jobs:
   build:
     runs-on: ubuntu-latest
@@ -35,3 +38,32 @@ jobs:
       - run: npm test

       - run: npm run prettier
+
+  e2e:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: latest
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+
+      - run: npm ci
+
+      - name: Install Playwright browsers
+        run: npx playwright install --with-deps chromium
+
+      - name: Run E2E tests
+        run: npx playwright test --reporter=list
+
+      - name: Upload test results
+        uses: actions/upload-artifact@v4
+        if: failure()
+        with:
+          name: test-results
+          path: test-results/
+          retention-days: 7

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -6,3 +6,8 @@ bun.lockb
 .vscode/
 docs/api/
 tmp/
+intermediate-findings/
+
+# Playwright
+playwright-report/
+test-results/

CONTRIBUTING.md

Lines changed: 39 additions & 0 deletions
@@ -40,6 +40,45 @@ Or build and run examples:
 npm run examples:start
 ```

+## Testing
+
+### Unit Tests
+
+Run unit tests with Bun:
+
+```bash
+npm test
+```
+
+### E2E Tests
+
+E2E tests use Playwright to verify all example servers work correctly with screenshot comparisons.
+
+```bash
+# Run all E2E tests
+npm run test:e2e
+
+# Run a specific server's tests
+npm run test:e2e -- --grep "Budget Allocator"
+
+# Run tests in interactive UI mode
+npm run test:e2e:ui
+```
+
+### Updating Golden Screenshots
+
+When UI changes are intentional, update the golden screenshots:
+
+```bash
+# Update all screenshots
+npm run test:e2e:update
+
+# Update screenshots for a specific server
+npm run test:e2e:update -- --grep "Three.js"
+```
+
+**Note**: Golden screenshots are platform-agnostic. Tests use canvas masking and tolerance thresholds to handle minor cross-platform rendering differences.
+
 ## Code of Conduct

 This project follows our [Code of Conduct](CODE_OF_CONDUCT.md). Please review it before contributing.
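
The commit message describes several Playwright settings (parallel workers, a 30s timeout, a platform-agnostic snapshot path, a 6% diff tolerance) without showing the config file in this excerpt. The following is a minimal sketch of how those settings might be combined, not the repository's actual `playwright.config.ts`; the `testDir` value and the exact `snapshotPathTemplate` string are assumptions chosen for illustration.

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Assumed location of the E2E specs (hypothetical path).
  testDir: "./tests/e2e",

  // Run tests in parallel: 4 workers locally, 2 in CI, as the commit message states.
  fullyParallel: true,
  workers: process.env.CI ? 2 : 4,

  // 30s timeout per test.
  timeout: 30_000,

  // Write a baseline only when one is missing; never overwrite existing ones.
  updateSnapshots: "missing",

  // Omit the browser and platform suffix so one golden image serves macOS and Linux.
  // The exact template is an assumption; only the tokens are standard Playwright.
  snapshotPathTemplate: "{testDir}/{testFileName}-snapshots/{arg}{ext}",

  expect: {
    toHaveScreenshot: {
      // Allow up to 6% of pixels to differ (font rendering varies by platform).
      maxDiffPixelRatio: 0.06,
    },
  },
});
```

Dynamic regions (clocks, charts, canvases) are better excluded with the `mask` option of `toHaveScreenshot` than absorbed by a looser tolerance, which is the trade-off the commit message describes.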

examples/basic-host/src/index.tsx

Lines changed: 30 additions & 1 deletion
@@ -1,9 +1,30 @@
+import type { Tool } from "@modelcontextprotocol/sdk/types.js";
 import { Component, type ErrorInfo, type ReactNode, StrictMode, Suspense, use, useEffect, useMemo, useRef, useState } from "react";
 import { createRoot } from "react-dom/client";
 import { callTool, connectToServer, hasAppHtml, initializeApp, loadSandboxProxy, log, newAppBridge, type ServerInfo, type ToolCallInfo } from "./implementation";
 import styles from "./index.module.css";


+/**
+ * Extract default values from a tool's JSON Schema inputSchema.
+ * Returns a formatted JSON string with defaults, or "{}" if none found.
+ */
+function getToolDefaults(tool: Tool | undefined): string {
+  if (!tool?.inputSchema?.properties) return "{}";
+
+  const defaults: Record<string, unknown> = {};
+  for (const [key, prop] of Object.entries(tool.inputSchema.properties)) {
+    if (prop && typeof prop === "object" && "default" in prop) {
+      defaults[key] = prop.default;
+    }
+  }
+
+  return Object.keys(defaults).length > 0
+    ? JSON.stringify(defaults, null, 2)
+    : "{}";
+}
+
+
 // Host passes serversPromise to CallToolPanel
 interface HostProps {
   serversPromise: Promise<ServerInfo[]>;
@@ -74,6 +95,14 @@ function CallToolPanel({ serversPromise, addToolCall }: CallToolPanelProps) {
     setSelectedServer(server);
     const [firstTool] = server.tools.keys();
     setSelectedTool(firstTool ?? "");
+    // Set input JSON to tool defaults (if any)
+    setInputJson(getToolDefaults(server.tools.get(firstTool ?? "")));
+  };
+
+  const handleToolSelect = (toolName: string) => {
+    setSelectedTool(toolName);
+    // Set input JSON to tool defaults (if any)
+    setInputJson(getToolDefaults(selectedServer?.tools.get(toolName)));
   };

   const handleSubmit = () => {
@@ -96,7 +125,7 @@ function CallToolPanel({ serversPromise, addToolCall }: CallToolPanelProps) {
           <select
             className={styles.toolSelect}
             value={selectedTool}
-            onChange={(e) => setSelectedTool(e.target.value)}
+            onChange={(e) => handleToolSelect(e.target.value)}
           >
             {selectedServer && toolNames.map((name) => (
               <option key={name} value={name}>{name}</option>
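
To make the behavior of `getToolDefaults` in the diff above concrete, here is a small standalone sketch. It copies the function's logic but swaps the SDK's `Tool` type for a local `ToolLike` stand-in so it runs without dependencies; `ToolLike`, the `url` property name, and the `depth` property are assumptions for illustration only.

```typescript
// Local stand-in for the SDK's Tool type (assumption: only these fields matter here).
interface ToolLike {
  name: string;
  inputSchema?: { properties?: Record<string, unknown> };
}

// Same logic as getToolDefaults in the diff: collect every property that
// declares a JSON Schema "default" and return them as pretty-printed JSON.
function getToolDefaults(tool: ToolLike | undefined): string {
  if (!tool?.inputSchema?.properties) return "{}";

  const defaults: Record<string, unknown> = {};
  for (const [key, prop] of Object.entries(tool.inputSchema.properties)) {
    if (prop && typeof prop === "object" && "default" in prop) {
      defaults[key] = (prop as { default: unknown }).default;
    }
  }

  return Object.keys(defaults).length > 0
    ? JSON.stringify(defaults, null, 2)
    : "{}";
}

// Hypothetical schema modelled on the wiki-explorer tool from the commit message.
const wikiTool: ToolLike = {
  name: "get-first-degree-links",
  inputSchema: {
    properties: {
      url: {
        type: "string",
        default: "https://en.wikipedia.org/wiki/Model_Context_Protocol",
      },
      depth: { type: "number" }, // hypothetical property with no default: skipped
    },
  },
};

const withDefaults = getToolDefaults(wikiTool); // JSON containing only "url"
const withoutTool = getToolDefaults(undefined); // "{}"
console.log(withDefaults);
console.log(withoutTool);
```

Only properties that carry a `default` key appear in the result, which is why the commit switches the Three.js schema from `.optional()` to `.default()`: an optional field with no default would leave the input box showing `{}`.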

examples/basic-server-react/server.ts

Lines changed: 21 additions & 16 deletions
@@ -7,25 +7,28 @@ import { RESOURCE_MIME_TYPE, RESOURCE_URI_META_KEY } from "../../dist/src/app";
 import { startServer } from "../shared/server-utils.js";

 const DIST_DIR = path.join(import.meta.dirname, "dist");
+const RESOURCE_URI = "ui://get-time/mcp-app.html";

-const server = new McpServer({
-  name: "Basic MCP App Server (React-based)",
-  version: "1.0.0",
-});
-
-// MCP Apps require two-part registration: a tool (what the LLM calls) and a
-// resource (the UI it renders). The `_meta` field on the tool links to the
-// resource URI, telling hosts which UI to display when the tool executes.
-{
-  const resourceUri = "ui://get-time/mcp-app.html";
+/**
+ * Creates a new MCP server instance with tools and resources registered.
+ * Each HTTP session needs its own server instance because McpServer only supports one transport.
+ */
+function createServer(): McpServer {
+  const server = new McpServer({
+    name: "Basic MCP App Server (React-based)",
+    version: "1.0.0",
+  });

+  // MCP Apps require two-part registration: a tool (what the LLM calls) and a
+  // resource (the UI it renders). The `_meta` field on the tool links to the
+  // resource URI, telling hosts which UI to display when the tool executes.
   server.registerTool(
     "get-time",
     {
       title: "Get Time",
       description: "Returns the current server time as an ISO 8601 string.",
       inputSchema: {},
-      _meta: { [RESOURCE_URI_META_KEY]: resourceUri },
+      _meta: { [RESOURCE_URI_META_KEY]: RESOURCE_URI },
     },
     async (): Promise<CallToolResult> => {
       const time = new Date().toISOString();
@@ -36,8 +39,8 @@ const server = new McpServer({
   );

   server.registerResource(
-    resourceUri,
-    resourceUri,
+    RESOURCE_URI,
+    RESOURCE_URI,
     {},
     async (): Promise<ReadResourceResult> => {
       const html = await fs.readFile(path.join(DIST_DIR, "mcp-app.html"), "utf-8");
@@ -46,19 +49,21 @@ const server = new McpServer({
         contents: [
           // Per the MCP App specification, "text/html;profile=mcp-app" signals
           // to the Host that this resource is indeed for an MCP App UI.
-          { uri: resourceUri, mimeType: RESOURCE_MIME_TYPE, text: html },
+          { uri: RESOURCE_URI, mimeType: RESOURCE_MIME_TYPE, text: html },
         ],
       };
     },
   );
+
+  return server;
 }

 async function main() {
   if (process.argv.includes("--stdio")) {
-    await server.connect(new StdioServerTransport());
+    await createServer().connect(new StdioServerTransport());
   } else {
     const port = parseInt(process.env.PORT ?? "3101", 10);
-    await startServer(server, { port, name: "Basic MCP App Server (React-based)" });
+    await startServer(createServer, { port, name: "Basic MCP App Server (React-based)" });
   }
 }

examples/basic-server-vanillajs/server.ts

Lines changed: 21 additions & 16 deletions
@@ -7,25 +7,28 @@ import { RESOURCE_MIME_TYPE, RESOURCE_URI_META_KEY } from "../../dist/src/app";
 import { startServer } from "../shared/server-utils.js";

 const DIST_DIR = path.join(import.meta.dirname, "dist");
+const RESOURCE_URI = "ui://get-time/mcp-app.html";

-const server = new McpServer({
-  name: "Basic MCP App Server (Vanilla JS)",
-  version: "1.0.0",
-});
-
-// MCP Apps require two-part registration: a tool (what the LLM calls) and a
-// resource (the UI it renders). The `_meta` field on the tool links to the
-// resource URI, telling hosts which UI to display when the tool executes.
-{
-  const resourceUri = "ui://get-time/mcp-app.html";
+/**
+ * Creates a new MCP server instance with tools and resources registered.
+ * Each HTTP session needs its own server instance because McpServer only supports one transport.
+ */
+function createServer(): McpServer {
+  const server = new McpServer({
+    name: "Basic MCP App Server (Vanilla JS)",
+    version: "1.0.0",
+  });

+  // MCP Apps require two-part registration: a tool (what the LLM calls) and a
+  // resource (the UI it renders). The `_meta` field on the tool links to the
+  // resource URI, telling hosts which UI to display when the tool executes.
   server.registerTool(
     "get-time",
     {
       title: "Get Time",
       description: "Returns the current server time as an ISO 8601 string.",
       inputSchema: {},
-      _meta: { [RESOURCE_URI_META_KEY]: resourceUri },
+      _meta: { [RESOURCE_URI_META_KEY]: RESOURCE_URI },
     },
     async (): Promise<CallToolResult> => {
       const time = new Date().toISOString();
@@ -36,8 +39,8 @@ const server = new McpServer({
   );

   server.registerResource(
-    resourceUri,
-    resourceUri,
+    RESOURCE_URI,
+    RESOURCE_URI,
     {},
     async (): Promise<ReadResourceResult> => {
       const html = await fs.readFile(path.join(DIST_DIR, "mcp-app.html"), "utf-8");
@@ -46,19 +49,21 @@ const server = new McpServer({
         contents: [
           // Per the MCP App specification, "text/html;profile=mcp-app" signals
           // to the Host that this resource is indeed for an MCP App UI.
-          { uri: resourceUri, mimeType: RESOURCE_MIME_TYPE, text: html },
+          { uri: RESOURCE_URI, mimeType: RESOURCE_MIME_TYPE, text: html },
         ],
       };
     },
   );
+
+  return server;
 }

 async function main() {
   if (process.argv.includes("--stdio")) {
-    await server.connect(new StdioServerTransport());
+    await createServer().connect(new StdioServerTransport());
   } else {
     const port = parseInt(process.env.PORT ?? "3102", 10);
-    await startServer(server, { port, name: "Basic MCP App Server (Vanilla JS)" });
+    await startServer(createServer, { port, name: "Basic MCP App Server (Vanilla JS)" });
   }
 }
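
Both server diffs above replace a shared `server` instance with a `createServer` factory. The sketch below illustrates why, using a toy class rather than the real SDK: `SingleTransportServer` and `Transport` are hypothetical stand-ins that only model the "one transport at a time" constraint described in the commit message, and the session ids are made up.

```typescript
// Hypothetical stand-ins; these are NOT the MCP SDK's classes.
interface Transport {
  id: string;
}

class SingleTransportServer {
  private transport?: Transport;

  // Like the behavior described in the commit message: a second connect()
  // replaces the first transport instead of adding another one.
  connect(transport: Transport): void {
    this.transport = transport;
  }

  // Replies go to whichever transport was connected last.
  send(msg: string): string {
    if (!this.transport) throw new Error("not connected");
    return `${this.transport.id}: ${msg}`;
  }
}

// Problem: one shared instance, two sessions. Session A loses its connection.
const shared = new SingleTransportServer();
shared.connect({ id: "session-a" });
shared.connect({ id: "session-b" });
const sharedReply = shared.send("hello"); // always routed to session-b

// Fix: a factory produces a fresh server per session, so each keeps its own transport.
function createServer(): SingleTransportServer {
  return new SingleTransportServer();
}

const sessions = new Map<string, SingleTransportServer>();
for (const id of ["session-a", "session-b"]) {
  const server = createServer();
  server.connect({ id });
  sessions.set(id, server);
}

const replies = [...sessions.values()].map((s) => s.send("hello"));
console.log(sharedReply);
console.log(replies);
```

Passing the factory itself (`startServer(createServer, ...)`) rather than its result lets the HTTP layer decide when a new session starts and build a server for it on demand, which is presumably what the updated `server-utils.ts` does.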
