
Commit d4f391a

rubenmarcus and claude committed
feat: add SEO & AEO toolkit templates
Add new "seo" category with two framework-agnostic templates:

- AEO Toolkit: llms.txt, AI crawler management, structured data, citation tracking
- SEO Toolkit: metadata, sitemaps, Core Web Vitals, audit dashboard

Both use a monorepo architecture with adapters for Next.js, Nuxt, Astro, Remix, and Express.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent f6478a8 commit d4f391a


4 files changed: +717 -0 lines changed


README.md

Lines changed: 6 additions & 0 deletions
@@ -49,6 +49,12 @@ Ralph reads these specs and builds the entire project autonomously.
 |----------|-------------|------------|
 | [react-native-app](specs/mobile/react-native-app.md) | Cross-platform mobile app | Intermediate |
 
+### SEO & AEO
+| Template | Description | Difficulty |
+|----------|-------------|------------|
+| [aeo-toolkit](specs/seo/aeo-toolkit.md) | Answer Engine Optimization with llms.txt, AI crawlers, citations | Advanced |
+| [seo-toolkit](specs/seo/seo-toolkit.md) | Technical SEO with metadata, sitemaps, Core Web Vitals | Intermediate |
+
 ### Tools
 | Template | Description | Difficulty |
 |----------|-------------|------------|

specs/seo/aeo-toolkit.md

Lines changed: 337 additions & 0 deletions
@@ -0,0 +1,337 @@
# AEO Toolkit — Answer Engine Optimization

Build a framework-agnostic Answer Engine Optimization toolkit that makes any website fully discoverable by AI crawlers (GPTBot, ClaudeBot, PerplexityBot) and optimized for citation in AI-generated answers.

## Overview

A production-ready AEO (Answer Engine Optimization) toolkit shipped as a standalone Node.js package with framework adapters for Next.js, Nuxt, Astro, Remix, and Express. As AI-powered answer engines like ChatGPT, Perplexity, and Claude increasingly replace traditional search, websites need a new layer of optimization beyond classic SEO. This toolkit generates `llms.txt` / `llms-full.txt`, configures `robots.txt` with AI crawler directives, produces structured data (JSON-LD) tuned for answer extraction, ensures all critical content is server-rendered in HTML (the major AI crawlers don't execute JavaScript), and provides a monitoring dashboard to track AI bot traffic and citation performance.

The core library is framework-agnostic: it generates plain text, markdown, XML, and JSON-LD strings. Framework adapters wire these generators into each framework's routing and middleware system. Inspired by the proposed `llms.txt` standard (llmstxt.org), Firecrawl's crawling patterns, and Conductor's AI crawlability research.

## Features

- `llms.txt` and `llms-full.txt` auto-generation from site content map
- `robots.txt` with granular AI crawler directives (GPTBot, ChatGPT-User, ClaudeBot, Claude-Web, PerplexityBot, Google-Extended, CCBot)
- AI-optimized XML sitemap with priority scoring and `lastmod` timestamps
- Server-side rendered (SSR) content layer — ensures critical content is in raw HTML, not behind JavaScript
- Structured data engine: FAQ, Article, HowTo, Product, Organization, and BreadcrumbList schemas (JSON-LD)
- Answer-first content formatting helpers (concise answer block above the fold, then detail)
- Entity consistency checker — validates that brand names, product names, and key entities are consistent across pages
- Authorship and freshness signals (author schema, datePublished, dateModified)
- Markdown endpoint generator — serves `.md` versions of pages for LLM consumption (per llms.txt spec)
- AI bot traffic monitoring dashboard — parses server/access logs to track GPTBot, ClaudeBot, PerplexityBot visits
- Citation tracker — monitors when your content appears in AI answer engine results
- Core Web Vitals monitoring integration
- Framework adapters: Next.js, Nuxt, Astro, Remix, Express
- CLI commands for auditing, generating, and validating all AEO artifacts

## Tasks

### Task 1: Core Library Setup

- [ ] Initialize TypeScript monorepo with `packages/core`, `packages/cli`, and `packages/adapters/*`
- [ ] Configure Turborepo or npm workspaces for package management
- [ ] Set up `packages/core` with zero framework dependencies
- [ ] Create `aeo.config.ts` schema with Zod (site metadata, crawler policies, content map)
- [ ] Implement config loader that reads from project root
- [ ] Install core dependencies (unified/remark for markdown, xml2js for XML)
- [ ] Set up ESLint, Prettier, and Vitest
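
A minimal sketch of what `packages/core/src/config.ts` could export. It assumes `defineAEOConfig` is a typed identity helper, the same pattern Vite's `defineConfig` uses. The interface mirrors the `aeo.config.ts` example later in this spec. `validateAEOConfig` is a hypothetical, dependency-free stand-in for the Zod schema this task calls for.

```typescript
// Hypothetical config types mirroring the aeo.config.ts example in this spec.
// The real package would infer these from a Zod schema (z.infer) instead.
export type RobotsPolicy = "allow-all" | "block-training" | "selective";

export interface RobotsRule {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

export interface AEOConfig {
  site: { name: string; url: string; description: string };
  robots?: { policy: RobotsPolicy; customRules?: RobotsRule[] };
  llmsTxt?: { sections: { title: string; pages: string[] }[]; optional?: string[] };
  entities?: Record<string, string[]>; // canonical form -> variants to flag
  sitemap?: { priorities?: Record<string, number> }; // path pattern -> 0.0..1.0
}

// Identity helper: returns the config unchanged; it exists only for editor type checking.
export function defineAEOConfig(config: AEOConfig): AEOConfig {
  return config;
}

// Stand-in for the Zod schema: collects readable problems instead of throwing.
export function validateAEOConfig(config: AEOConfig): string[] {
  const errors: string[] = [];
  if (config.site.name.trim() === "") errors.push("site.name must not be empty");
  if (!/^https?:\/\/[^\s\/]+/.test(config.site.url)) {
    errors.push("site.url must be an absolute http(s) URL");
  }
  const priorities = config.sitemap?.priorities ?? {};
  for (const pattern of Object.keys(priorities)) {
    const value = priorities[pattern];
    if (value < 0 || value > 1) {
      errors.push(`sitemap.priorities["${pattern}"] must be between 0.0 and 1.0`);
    }
  }
  return errors;
}
```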

### Task 2: robots.txt Generator

- [ ] Create `generateRobotsTxt(config)` function in `packages/core/src/robots.ts`
- [ ] Implement AI crawler directive builder with per-bot Allow/Disallow rules
- [ ] Support the configurable policies accepted by `robots.policy` in `aeo.config.ts`: `allow-all`, `block-training`, `selective`
- [ ] Add known AI bot user-agents: GPTBot, ChatGPT-User, ClaudeBot, Claude-Web, PerplexityBot, Google-Extended, CCBot, Meta-ExternalAgent, Bytespider, Applebot-Extended
- [ ] Include Sitemap directives pointing to both standard and AI sitemaps
- [ ] Add Crawl-delay directives for aggressive bots (Crawl-delay is non-standard: some crawlers honor it, but Googlebot ignores it)
- [ ] Write unit tests for all policy modes
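
Here is one way `generateRobotsTxt` could turn the three policies into robots.txt groups. The sketch rests on three assumptions. First, which bots count as "training" crawlers is a judgment call, so check each vendor's current documentation. Second, `selective` is read as "block every known AI bot unless a custom rule names it". Third, the `RobotsConfig` shape and its `sitemaps` field are hypothetical.

```typescript
type RobotsPolicy = "allow-all" | "block-training" | "selective";
interface RobotsRule { userAgent: string; allow?: string[]; disallow?: string[] }
// Hypothetical input shape for this sketch.
interface RobotsConfig {
  siteUrl: string;
  policy: RobotsPolicy;
  customRules?: RobotsRule[];
  sitemaps?: string[]; // paths relative to siteUrl
}

const AI_BOTS = [
  "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web", "PerplexityBot",
  "Google-Extended", "CCBot", "Meta-ExternalAgent", "Bytespider", "Applebot-Extended",
];
// Assumed split: agents mainly associated with model training rather than live answers.
const TRAINING_BOTS = new Set([
  "GPTBot", "ClaudeBot", "Google-Extended", "CCBot",
  "Meta-ExternalAgent", "Bytespider", "Applebot-Extended",
]);

function renderGroup(rule: RobotsRule): string {
  const lines = [`User-agent: ${rule.userAgent}`];
  for (const path of rule.allow ?? []) lines.push(`Allow: ${path}`);
  for (const path of rule.disallow ?? []) lines.push(`Disallow: ${path}`);
  // An empty Disallow value means nothing is disallowed for this group.
  if (lines.length === 1) lines.push("Disallow:");
  return lines.join("\n");
}

export function generateRobotsTxt(config: RobotsConfig): string {
  const custom = config.customRules ?? [];
  const named = new Set(custom.map((rule) => rule.userAgent));
  const blocked = AI_BOTS.filter((bot) => {
    if (named.has(bot)) return false; // an explicit custom rule wins
    if (config.policy === "block-training") return TRAINING_BOTS.has(bot);
    return config.policy === "selective"; // allow-all blocks nothing
  });

  const groups: RobotsRule[] = [
    ...blocked.map((bot) => ({ userAgent: bot, disallow: ["/"] })),
    ...custom,
    { userAgent: "*" }, // all other crawlers keep full access
  ];
  const base = config.siteUrl.replace(/\/+$/, "");
  const sitemapLines = (config.sitemaps ?? ["/sitemap.xml"]).map((p) => `Sitemap: ${base}${p}`);
  return [...groups.map(renderGroup), sitemapLines.join("\n")].join("\n\n") + "\n";
}
```

Under RFC 9309 a crawler obeys only the most specific group that names it, so GPTBot with a custom rule follows that rule rather than a blanket block. Skipping the generated block for named bots avoids emitting two competing groups.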

### Task 3: llms.txt and llms-full.txt Generator

- [ ] Create `generateLlmsTxt(config)` function in `packages/core/src/llms-txt.ts`
- [ ] Implement auto-generation from config site content map
- [ ] Structure output per llmstxt.org spec: H1 title, blockquote summary, H2 sections with link lists
- [ ] Create `generateLlmsFullTxt(config)` with comprehensive site documentation
- [ ] Include company overview, product descriptions, target audience, competitive advantages
- [ ] Add "Optional" section for secondary/supplementary pages
- [ ] Add `lastUpdated` timestamp to both files
- [ ] Write unit tests for output format validation
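
Below is a minimal formatter for the llms.txt layout described at llmstxt.org. That layout is an H1 name, a blockquote summary, H2 link sections, and an optional `## Optional` section. The sketch assumes the config's content map has already been resolved into page titles and absolute URLs; the `LlmsTxtInput` shape is hypothetical.

```typescript
// Hypothetical input: the config's page paths already resolved to titles and URLs.
interface LlmsLink { title: string; url: string; notes?: string }
interface LlmsTxtInput {
  name: string;
  summary: string;
  sections: { title: string; links: LlmsLink[] }[];
  optional?: LlmsLink[];
}

// Renders "- [Title](url): notes" lines, omitting the ": notes" suffix when absent.
function renderLinks(links: LlmsLink[]): string {
  return links
    .map((link) => `- [${link.title}](${link.url})${link.notes ? `: ${link.notes}` : ""}`)
    .join("\n");
}

export function generateLlmsTxt(input: LlmsTxtInput): string {
  // Every summary line gets a "> " prefix so a multi-line summary stays one blockquote.
  const summary = input.summary.split("\n").map((line) => `> ${line}`).join("\n");
  const parts = [`# ${input.name}`, summary];
  for (const section of input.sections) {
    parts.push(`## ${section.title}\n\n${renderLinks(section.links)}`);
  }
  // In the llms.txt proposal, links under "Optional" may be skipped when context is short.
  if (input.optional && input.optional.length > 0) {
    parts.push(`## Optional\n\n${renderLinks(input.optional)}`);
  }
  return parts.join("\n\n") + "\n";
}
```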

### Task 4: AI-Optimized Sitemap

- [ ] Create `generateSitemap(pages, config)` function in `packages/core/src/sitemap.ts`
- [ ] Create `generateAISitemap(pages, config)` with AI-specific priority scoring
- [ ] Implement priority calculation based on content type (landing pages > blog > archives)
- [ ] Add `lastmod` timestamps from content metadata or git history
- [ ] Add `changefreq` hints based on content volatility (Google ignores `changefreq` and `priority`; other crawlers may still read them)
- [ ] Reference both sitemaps in robots.txt output
- [ ] Write tests for XML output validity
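
This is a dependency-free sketch of `generateSitemap(pages, config)`. It assumes the glob-style keys from the `sitemap.priorities` config example. It builds XML with template strings instead of xml2js so it stays self-contained. The `SitemapPage` and `SitemapConfig` shapes and the longest-pattern-wins rule are assumptions.

```typescript
interface SitemapPage {
  path: string;
  lastmod?: Date;
  changefreq?: "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never";
}
// Hypothetical config subset for this sketch.
interface SitemapConfig { siteUrl: string; priorities?: Record<string, number> }

// Escapes the five XML special characters so URLs with query strings stay valid.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// "/blog/*" matches anything under "/blog/"; other keys must match exactly.
// The longest matching pattern wins; unmatched pages get 0.5, the protocol default.
export function resolvePriority(path: string, priorities: Record<string, number>): number {
  let bestLength = -1;
  let best = 0.5;
  for (const pattern of Object.keys(priorities)) {
    const matches = pattern.endsWith("/*")
      ? path.startsWith(pattern.slice(0, -1))
      : path === pattern;
    if (matches && pattern.length > bestLength) {
      bestLength = pattern.length;
      best = priorities[pattern];
    }
  }
  return best;
}

export function generateSitemap(pages: SitemapPage[], config: SitemapConfig): string {
  const base = config.siteUrl.replace(/\/+$/, "");
  const entries = pages.map((page) => {
    const fields = [`    <loc>${escapeXml(base + page.path)}</loc>`];
    if (page.lastmod) fields.push(`    <lastmod>${page.lastmod.toISOString()}</lastmod>`);
    if (page.changefreq) fields.push(`    <changefreq>${page.changefreq}</changefreq>`);
    const priority = resolvePriority(page.path, config.priorities ?? {});
    fields.push(`    <priority>${priority.toFixed(1)}</priority>`);
    return `  <url>\n${fields.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n") + "\n";
}
```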

### Task 5: Structured Data Engine (JSON-LD)

- [ ] Create `generateJsonLd(type, data)` function in `packages/core/src/structured-data.ts`
- [ ] Implement Article schema (headline, author, datePublished, dateModified, publisher)
- [ ] Implement FAQPage schema (Question/Answer pairs for direct answer extraction)
- [ ] Implement HowTo schema (step-by-step content)
- [ ] Implement Product schema (name, description, price, availability, reviews)
- [ ] Implement Organization schema (name, logo, sameAs social profiles, contactPoint)
- [ ] Implement BreadcrumbList schema for navigation hierarchy
- [ ] Add schema validator that checks output against schema.org specs
- [ ] Export framework-agnostic `<script type="application/ld+json">` string builder
- [ ] Write tests for each schema type
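
The sketch below covers the FAQPage builder and the `<script type="application/ld+json">` string builder from this task. The helper names `faqPageJsonLd` and `jsonLdScript` are hypothetical. Escaping `<` is a common precaution: it stops a value containing `</script>` from ending the tag early.

```typescript
interface FaqItem { question: string; answer: string }

// schema.org FAQPage: each Question carries its answer as acceptedAnswer.
export function faqPageJsonLd(items: FaqItem[]): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
}

// Serializes JSON-LD into a script tag. Writing "<" as the JSON escape \u003c means a value
// such as "</script>" cannot close the tag early; JSON parsers read it back as "<".
export function jsonLdScript(data: Record<string, unknown>): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```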

### Task 6: Bot Detection and SSR Middleware

- [ ] Create `isAIBot(userAgent)` detector in `packages/core/src/bot-detector.ts`
- [ ] Maintain user-agent pattern list for all known AI crawlers
- [ ] Create `aeoMiddleware(config)` factory that returns a generic request handler
- [ ] Middleware detects AI bots and sets `x-aeo-bot` header for downstream use
- [ ] Implement HTML meta tag helper: `<meta name="robots" content="max-snippet:-1, max-image-preview:large">`
- [ ] Add canonical URL generation utility
- [ ] Write tests for bot detection accuracy
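
Here is a minimal sketch of `isAIBot` plus a framework-neutral step that sets the `x-aeo-bot` header. It matches substrings that appear in these crawlers' user-agent strings. Google-Extended and Applebot-Extended are left out because they are robots.txt product tokens, not separate crawler user agents. `detectAIBot` and `tagAIBotRequest` are hypothetical names. The plain header map stands in for each framework's request object.

```typescript
// Substrings expected in each crawler's User-Agent header (e.g. "... GPTBot/1.1 ...").
const AI_BOT_PATTERNS: Array<[string, RegExp]> = [
  ["GPTBot", /GPTBot/i],
  ["ChatGPT-User", /ChatGPT-User/i],
  ["ClaudeBot", /ClaudeBot/i],
  ["Claude-Web", /Claude-Web/i],
  ["PerplexityBot", /PerplexityBot/i],
  ["CCBot", /CCBot/i],
  ["Meta-ExternalAgent", /meta-externalagent/i],
  ["Bytespider", /Bytespider/i],
];

// Returns the matched bot name, or null for browsers and other crawlers.
export function detectAIBot(userAgent: string | null | undefined): string | null {
  if (!userAgent) return null;
  for (const [name, pattern] of AI_BOT_PATTERNS) {
    if (pattern.test(userAgent)) return name;
  }
  return null;
}

export function isAIBot(userAgent: string | null | undefined): boolean {
  return detectAIBot(userAgent) !== null;
}

// Middleware step on a plain, lower-cased header map (hypothetical shape).
export function tagAIBotRequest(headers: Record<string, string | undefined>): void {
  const bot = detectAIBot(headers["user-agent"]);
  if (bot !== null) headers["x-aeo-bot"] = bot;
}
```

User agents can be spoofed. A production detector would likely also check the IP ranges some vendors publish before trusting the header.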

### Task 7: Markdown Endpoint Generator

- [ ] Create `htmlToMarkdown(html)` converter in `packages/core/src/markdown-endpoints.ts`
- [ ] Strip navigation, footer, ads, and boilerplate — serve only core content
- [ ] Add front-matter metadata (title, description, author, date) to markdown output
- [ ] Create `markdownMiddleware(config)` that intercepts `.md` requests
- [ ] Cache generated markdown with configurable TTL
- [ ] Write tests for markdown output quality
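
HTML-to-markdown conversion itself is best left to a library such as the rehype/remark stack in the Tech Stack. This sketch covers only the plumbing around it:

- mapping a `.md` request back to its page, assuming the llms.txt convention of appending `.md` (with `index.html.md` for directory URLs)
- prepending front matter
- a TTL cache

`markdownSourcePath`, `withFrontMatter`, and `TtlCache` are hypothetical names.

```typescript
// "/docs/intro.md" -> "/docs/intro"; "/docs/index.html.md" -> "/docs/"; other paths -> null.
export function markdownSourcePath(requestPath: string): string | null {
  if (!requestPath.endsWith(".md")) return null;
  if (requestPath.endsWith("/index.html.md")) {
    return requestPath.slice(0, -"index.html.md".length);
  }
  return requestPath.slice(0, -".md".length);
}

type PageMeta = { title: string; description?: string; author?: string; date?: string };

// JSON.stringify yields double-quoted strings, which YAML also accepts,
// so titles containing ":" or quotes remain valid front matter.
export function withFrontMatter(meta: PageMeta, markdownBody: string): string {
  const lines = Object.entries(meta)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`);
  return `---\n${lines.join("\n")}\n---\n\n${markdownBody.trim()}\n`;
}

// Small in-memory cache; `now` is injectable so expiry can be tested without waiting.
export class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict stale markdown
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```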

### Task 8: Answer-First Content Helpers

- [ ] Create `formatAnswerBlock(answer, detail)` in `packages/core/src/content-helpers.ts`
- [ ] Create `formatKeyFacts(facts)` — structured key-value pairs for easy extraction
- [ ] Create `formatFAQ(items)` — FAQ with built-in JSON-LD generation
- [ ] Implement `formatForExtraction(content)` — structures content with clear heading hierarchy
- [ ] Export HTML string builders (framework-agnostic) for all helpers
- [ ] Write tests for output format
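
The following sketches two of these HTML string builders. It assumes `detail` is a list of plain-text paragraphs and `facts` is a label-to-value map. The `aeo-*` class names are hypothetical styling hooks. All text is HTML-escaped because the helpers assemble markup from raw strings.

```typescript
// Escapes text for safe use inside HTML element content and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Answer-first layout: the direct answer comes first, supporting paragraphs follow.
export function formatAnswerBlock(answer: string, detail: string[]): string {
  return [
    '<section class="aeo-answer">',
    `  <p class="aeo-answer__summary"><strong>${escapeHtml(answer)}</strong></p>`,
    ...detail.map((paragraph) => `  <p>${escapeHtml(paragraph)}</p>`),
    "</section>",
  ].join("\n");
}

// Key facts as a definition list, so each label/value pair is explicit in the markup.
export function formatKeyFacts(facts: Record<string, string>): string {
  const rows = Object.keys(facts).map(
    (label) => `  <dt>${escapeHtml(label)}</dt><dd>${escapeHtml(facts[label])}</dd>`,
  );
  return ['<dl class="aeo-key-facts">', ...rows, "</dl>"].join("\n");
}
```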

### Task 9: Entity Consistency Checker

- [ ] Create entity registry schema in config (brand names, product names, key terms with canonical forms)
- [ ] Build `checkEntities(files, config)` scanner in `packages/core/src/entity-checker.ts`
- [ ] Flag variations (e.g., "Next.js" vs "NextJS" vs "Next JS")
- [ ] Generate report with file locations and suggested fixes
- [ ] Add pre-commit hook integration option
- [ ] Write tests for detection accuracy
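
Below is a minimal version of `checkEntities`. It assumes files arrive as `{ path, content }` pairs and that the entity map uses the `entities` shape from `aeo.config.ts`, where each canonical form maps to its variants. Matching is case-sensitive, so "Typescript" is flagged while "TypeScript" is not. Matches glued to adjacent letters or digits are skipped. The `SourceFile` and `EntityIssue` shapes are hypothetical.

```typescript
interface SourceFile { path: string; content: string }
interface EntityIssue { file: string; line: number; column: number; found: string; suggestion: string }

// Escapes regex metacharacters so variants like "Next.js" match literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

export function checkEntities(files: SourceFile[], entities: Record<string, string[]>): EntityIssue[] {
  const issues: EntityIssue[] = [];
  for (const canonical of Object.keys(entities)) {
    for (const variant of entities[canonical]) {
      // Case-sensitive, and the match may not touch letters/digits on either side.
      const pattern = new RegExp(`(?<![A-Za-z0-9_])${escapeRegExp(variant)}(?![A-Za-z0-9_])`, "g");
      for (const file of files) {
        file.content.split("\n").forEach((text, index) => {
          pattern.lastIndex = 0;
          let match: RegExpExecArray | null;
          while ((match = pattern.exec(text)) !== null) {
            issues.push({
              file: file.path,
              line: index + 1,
              column: match.index + 1,
              found: variant,
              suggestion: canonical,
            });
          }
        });
      }
    }
  }
  // Report order: by file, then line, then column.
  return issues.sort((a, b) => a.file.localeCompare(b.file) || a.line - b.line || a.column - b.column);
}
```

Short variants such as "TS" in the config example will also flag legitimate abbreviations, so the registry may need per-entity exceptions in practice.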

### Task 10: AI Bot Traffic Monitor

- [ ] Create log parser in `packages/core/src/bot-traffic.ts` that identifies AI bot user-agents
- [ ] Support common log formats (Apache, Nginx, JSON logs)
- [ ] Calculate bot visit frequency breakdown (GPTBot, ClaudeBot, PerplexityBot, etc.)
- [ ] Track pages most frequently crawled by AI bots
- [ ] Track crawl frequency trends over time (daily/weekly)
- [ ] Add comparison: AI bot traffic vs Googlebot traffic
- [ ] Implement alerting for crawl anomalies (sudden drops may indicate blocking issues)
- [ ] Write tests for log parsing
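
This sketch handles only the Nginx/Apache "combined" log format; JSON logs would need their own reader. It recognizes bots by user-agent substring and counts Googlebot alongside them for the comparison metric. It also ranks pages by AI-bot hits. `parseBotHits`, `summarizeBotHits`, and the token list are assumptions.

```typescript
interface BotHit { bot: string; path: string; status: number; time: string }

// Nginx/Apache "combined" format:
// $remote_addr - $remote_user [$time_local] "$request" $status $bytes "$referer" "$user_agent"
const COMBINED_LOG = /^\S+ \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/;

// Checked in order; Googlebot is included only as the comparison baseline.
const BOT_TOKENS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "CCBot", "Googlebot"];

export function parseBotHits(log: string): BotHit[] {
  const hits: BotHit[] = [];
  for (const line of log.split("\n")) {
    const match = COMBINED_LOG.exec(line);
    if (match === null) continue; // malformed or non-combined line
    const [, time, path, status, userAgent] = match;
    const bot = BOT_TOKENS.find((token) => userAgent.includes(token));
    if (bot !== undefined) hits.push({ bot, path, status: Number(status), time });
  }
  return hits;
}

export function summarizeBotHits(
  hits: BotHit[],
  topN = 5,
): { byBot: Record<string, number>; topPages: Array<[string, number]> } {
  const byBot: Record<string, number> = {};
  const byPage = new Map<string, number>();
  for (const hit of hits) {
    byBot[hit.bot] = (byBot[hit.bot] ?? 0) + 1;
    // Top pages count AI crawler hits only, excluding the Googlebot baseline.
    if (hit.bot !== "Googlebot") byPage.set(hit.path, (byPage.get(hit.path) ?? 0) + 1);
  }
  const topPages = Array.from(byPage.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN);
  return { byBot, topPages };
}
```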
131+
132+
### Task 11: Citation Tracker
133+
134+
- [ ] Create citation service in `packages/core/src/citation-tracker.ts`
135+
- [ ] Query Perplexity, ChatGPT search, and Google AI Overviews for brand mentions
136+
- [ ] Track which pages are cited and in response to which queries
137+
- [ ] Calculate "AI Share of Voice" metric
138+
- [ ] Show citation sentiment analysis (positive/neutral/negative)
139+
- [ ] Display citation trends over time
140+
- [ ] Write tests for citation parsing
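
The spec does not define "AI Share of Voice", so this sketch assumes one definition. The metric is the fraction of all citations in tracked answers that point at your domain, subdomains included. How answers are fetched from each engine is out of scope here. `AnswerRecord` and `aiShareOfVoice` are hypothetical.

```typescript
// Hypothetical record of one tracked answer: which URLs the engine cited for a query.
interface AnswerRecord { query: string; engine: string; citedUrls: string[] }

// Extracts a lower-cased host without a leading "www."; returns null for non-http(s) URLs.
function hostOf(url: string): string | null {
  const match = /^https?:\/\/([^\/?#:]+)/i.exec(url);
  return match ? match[1].toLowerCase().replace(/^www\./, "") : null;
}

// Assumed definition: share = citations to `domain` (or its subdomains) / all citations.
export function aiShareOfVoice(
  answers: AnswerRecord[],
  domain: string,
): { share: number; citedPages: Record<string, number> } {
  const target = domain.toLowerCase().replace(/^www\./, "");
  let total = 0;
  let ours = 0;
  const citedPages: Record<string, number> = {};
  for (const answer of answers) {
    for (const url of answer.citedUrls) {
      const host = hostOf(url);
      if (host === null) continue;
      total += 1;
      if (host === target || host.endsWith(`.${target}`)) {
        ours += 1;
        citedPages[url] = (citedPages[url] ?? 0) + 1;
      }
    }
  }
  return { share: total === 0 ? 0 : ours / total, citedPages };
}
```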

### Task 12: Framework Adapters

- [ ] Create Next.js adapter in `packages/adapters/nextjs/` (route handlers for robots.txt, llms.txt, sitemaps; React components for JSON-LD; middleware)
- [ ] Create Nuxt adapter in `packages/adapters/nuxt/` (server routes, Vue components, middleware)
- [ ] Create Astro adapter in `packages/adapters/astro/` (API endpoints, Astro components, middleware)
- [ ] Create Remix adapter in `packages/adapters/remix/` (resource routes, React components, middleware)
- [ ] Create Express adapter in `packages/adapters/express/` (Express middleware and route handlers)
- [ ] Each adapter wires core generators into the framework's routing system
- [ ] Write integration tests for each adapter

### Task 13: Dashboard UI

- [ ] Build standalone dashboard as a single-page app in `packages/dashboard/`
- [ ] Bot traffic overview page with visit frequency charts
- [ ] Citation tracking page with AI Share of Voice
- [ ] AEO health score page (robots.txt, llms.txt, structured data, SSR status)
- [ ] Use Recharts for charts and Tailwind CSS for styling
- [ ] Export as embeddable component and standalone server
- [ ] Write tests for dashboard data rendering

### Task 14: AEO Audit CLI

- [ ] Create CLI entry point in `packages/cli/` with Commander.js
- [ ] Implement `npx aeo audit <url>` — runs full AEO health check against any URL
- [ ] Check robots.txt accessibility and AI bot directives
- [ ] Validate llms.txt format against llmstxt.org spec
- [ ] Verify structured data on all pages (JSON-LD validity)
- [ ] Check that critical content is in static HTML (not JS-dependent)
- [ ] Verify markdown endpoints are functional
- [ ] Score site 0-100 with actionable recommendations
- [ ] Output results as terminal table and optional JSON report
- [ ] Write tests for audit logic
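
This sketches two pieces of the audit: a rough static-HTML check and the 0-100 score. The regex-based text extraction is a simplification standing in for the Cheerio parsing listed in the Tech Stack, and it does not decode HTML entities. The check IDs and weights are hypothetical.

```typescript
interface AuditCheck {
  id: string;
  weight: number; // relative importance; hypothetical values chosen by the CLI
  passed: boolean;
  recommendation: string;
}

// True if `phrase` is visible in the raw HTML a non-JavaScript crawler receives.
// Script and style bodies are dropped first so JS-injected text does not count.
export function phraseInStaticHtml(html: string, phrase: string): boolean {
  const text = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .toLowerCase();
  return text.includes(phrase.toLowerCase().replace(/\s+/g, " "));
}

// Weighted pass rate scaled to 0-100; failed checks become recommendations, heaviest first.
export function scoreAudit(checks: AuditCheck[]): { score: number; recommendations: string[] } {
  const total = checks.reduce((sum, check) => sum + check.weight, 0);
  if (total === 0) return { score: 0, recommendations: [] };
  const earned = checks.filter((c) => c.passed).reduce((sum, c) => sum + c.weight, 0);
  const recommendations = checks
    .filter((c) => !c.passed)
    .sort((a, b) => b.weight - a.weight)
    .map((c) => c.recommendation);
  return { score: Math.round((earned / total) * 100), recommendations };
}
```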

## Tech Stack

- TypeScript
- Node.js 18+
- Turborepo (monorepo management)
- Zod (config validation)
- Unified / Remark / Rehype (Markdown processing)
- gray-matter (front-matter parsing)
- xml2js (XML generation)
- Commander.js (CLI)
- Cheerio (HTML parsing for audits)
- Recharts (dashboard charts)
- Tailwind CSS (dashboard styling)
- Vitest (testing)
- Playwright (integration tests for SSR verification)

### Framework Adapters Support

- Next.js 14+ (App Router)
- Nuxt 3+
- Astro 4+
- Remix 2+
- Express 4+

## Files to Create

- `aeo.config.ts`
- `packages/core/src/index.ts`
- `packages/core/src/robots.ts`
- `packages/core/src/llms-txt.ts`
- `packages/core/src/sitemap.ts`
- `packages/core/src/structured-data.ts`
- `packages/core/src/bot-detector.ts`
- `packages/core/src/markdown-endpoints.ts`
- `packages/core/src/content-helpers.ts`
- `packages/core/src/entity-checker.ts`
- `packages/core/src/bot-traffic.ts`
- `packages/core/src/citation-tracker.ts`
- `packages/core/src/config.ts`
- `packages/core/package.json`
- `packages/cli/src/index.ts`
- `packages/cli/src/audit.ts`
- `packages/cli/package.json`
- `packages/adapters/nextjs/src/index.ts`
- `packages/adapters/nextjs/src/routes.ts`
- `packages/adapters/nextjs/src/components.tsx`
- `packages/adapters/nextjs/src/middleware.ts`
- `packages/adapters/nextjs/package.json`
- `packages/adapters/nuxt/src/index.ts`
- `packages/adapters/nuxt/src/module.ts`
- `packages/adapters/nuxt/package.json`
- `packages/adapters/astro/src/index.ts`
- `packages/adapters/astro/src/integration.ts`
- `packages/adapters/astro/package.json`
- `packages/adapters/remix/src/index.ts`
- `packages/adapters/remix/src/routes.ts`
- `packages/adapters/remix/package.json`
- `packages/adapters/express/src/index.ts`
- `packages/adapters/express/src/middleware.ts`
- `packages/adapters/express/package.json`
- `packages/dashboard/src/App.tsx`
- `packages/dashboard/src/pages/traffic.tsx`
- `packages/dashboard/src/pages/citations.tsx`
- `packages/dashboard/src/pages/health.tsx`
- `packages/dashboard/package.json`
- `tests/core/robots.test.ts`
- `tests/core/llms-txt.test.ts`
- `tests/core/sitemap.test.ts`
- `tests/core/structured-data.test.ts`
- `tests/core/entity-checker.test.ts`
- `tests/core/markdown-endpoints.test.ts`
- `tests/core/bot-detector.test.ts`
- `tests/cli/audit.test.ts`
- `tests/adapters/nextjs.test.ts`
- `tests/adapters/express.test.ts`
- `turbo.json`
- `package.json`
- `.env.example`

## Configuration

### Environment Variables

- `SITE_URL` — Canonical site URL (e.g., `https://example.com`)
- `SITE_NAME` — Brand name used in structured data and llms.txt
- `AEO_LOG_PATH` — Path to server access logs for bot traffic parsing
- `PERPLEXITY_API_KEY` — (Optional) For citation tracking via Perplexity API
- `OPENAI_API_KEY` — (Optional) For citation tracking via ChatGPT search

### aeo.config.ts Example

```typescript
import { defineAEOConfig } from '@aeo-toolkit/core';

export default defineAEOConfig({
  site: {
    name: 'My Company',
    url: 'https://example.com',
    description: 'Short description for llms.txt blockquote',
  },
  robots: {
    policy: 'allow-all', // 'allow-all' | 'block-training' | 'selective'
    customRules: [
      { userAgent: 'GPTBot', disallow: ['/admin/', '/api/'] },
    ],
  },
  llmsTxt: {
    sections: [
      { title: 'Documentation', pages: ['/docs', '/guides', '/api-reference'] },
      { title: 'Blog', pages: ['/blog'] },
    ],
    optional: ['/changelog', '/careers'],
  },
  // Canonical entity name -> variants the entity checker should flag
  entities: {
    'Next.js': ['NextJS', 'Next JS', 'Nextjs'],
    'TypeScript': ['Typescript', 'TS'],
  },
  sitemap: {
    // Path patterns -> sitemap <priority> values (0.0 to 1.0)
    priorities: {
      '/': 1.0,
      '/products/*': 0.9,
      '/blog/*': 0.7,
      '/docs/*': 0.8,
    },
  },
});
```

### Framework Integration Examples

```typescript
// Next.js — app/robots.txt/route.ts
import { createRobotsHandler } from '@aeo-toolkit/nextjs';
export const GET = createRobotsHandler();

// Express
import express from 'express';
import { aeoMiddleware } from '@aeo-toolkit/express';

const app = express();
app.use(aeoMiddleware()); // serves /robots.txt, /llms.txt, /sitemap.xml

// Astro — src/pages/robots.txt.ts
import { createRobotsEndpoint } from '@aeo-toolkit/astro';
export const GET = createRobotsEndpoint();
```

## Usage

1. Install: `npm install @aeo-toolkit/core @aeo-toolkit/<framework>`
2. Create `aeo.config.ts` in your project root
3. Wire the framework adapter into your routes (see the examples above)
4. Run `npx aeo audit https://yoursite.com` for a full AEO health check
5. Visit `/admin/aeo` for the monitoring dashboard (optional)

## Notes

- AI crawlers such as GPTBot, ClaudeBot, and PerplexityBot do NOT execute JavaScript, so all critical content must be in the initial HTML response
- `llms.txt` is an emerging standard; no major AI company has formally adopted it yet, but implementing it is low-risk and helps future-proof your site
- `robots.txt` directives remain the primary and most reliable mechanism for AI crawler control, although compliance is voluntary on the crawler's side
- According to Conductor's research, ChatGPT crawls ~8x more frequently than Googlebot, and Perplexity ~3x more frequently
- First impressions matter: AI bots offer no manual reindexing, so content must be correct on the first crawl
- The entity consistency checker helps avoid confusing AI models with inconsistent naming
- Citation tracking requires API access to answer engines and may be subject to rate limits
- The core package has zero framework dependencies; adapters are optional
- Requires Node.js 18+
