| 1 | +# AEO Toolkit — Answer Engine Optimization |
| 2 | + |
| 3 | +Build a framework-agnostic Answer Engine Optimization toolkit that makes any website discoverable by AI crawlers (GPTBot, ClaudeBot, PerplexityBot) and well structured for citation in AI-generated answers. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +A production-ready AEO (Answer Engine Optimization) toolkit shipped as a standalone Node.js package with framework adapters for Next.js, Nuxt, Astro, Remix, and Express. As AI-powered answer engines like ChatGPT, Perplexity, and Claude take over a growing share of queries that once went to traditional search, websites need a new layer of optimization beyond classic SEO. This toolkit generates `llms.txt` / `llms-full.txt`, configures `robots.txt` with AI crawler directives, produces structured data (JSON-LD) tuned for answer extraction, provides helpers and audits that keep critical content server-rendered in HTML (the major AI crawlers don't execute JavaScript), and provides a monitoring dashboard to track AI bot traffic and citation performance. |
| 8 | + |
| 9 | +The core library is framework-agnostic: it generates plain text, markdown, XML, and JSON-LD strings. Framework adapters wire these generators into each framework's routing and middleware system. Inspired by the `llms.txt` proposal (llmstxt.org), Firecrawl's crawling patterns, and Conductor's AI crawlability research. |
| 10 | + |
| 11 | +## Features |
| 12 | + |
| 13 | +- `llms.txt` and `llms-full.txt` auto-generation from site content map |
| 14 | +- `robots.txt` with granular AI crawler directives (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-Web, PerplexityBot, Google-Extended, CCBot) |
| 15 | +- AI-optimized XML sitemap with priority scoring and `lastmod` timestamps |
| 16 | +- Server-side rendered (SSR) content layer — ensures critical content is in raw HTML, not behind JavaScript |
| 17 | +- Structured data engine: FAQ, Article, HowTo, Product, Organization, and BreadcrumbList schemas (JSON-LD) |
| 18 | +- Answer-first content formatting helpers (concise answer block above the fold, then detail) |
| 19 | +- Entity consistency checker — validates that brand names, product names, and key entities are consistent across pages |
| 20 | +- Authorship and freshness signals (author schema, datePublished, dateModified) |
| 21 | +- Markdown endpoint generator — serves `.md` versions of pages for LLM consumption (per llms.txt spec) |
| 22 | +- AI bot traffic monitoring dashboard — parses server/access logs to track GPTBot, ClaudeBot, PerplexityBot visits |
| 23 | +- Citation tracker — monitors when your content appears in AI answer engine results |
| 24 | +- Core Web Vitals monitoring integration |
| 25 | +- Framework adapters: Next.js, Nuxt, Astro, Remix, Express |
| 26 | +- CLI commands for auditing, generating, and validating all AEO artifacts |
| 27 | + |
| 28 | +## Tasks |
| 29 | + |
| 30 | +### Task 1: Core Library Setup |
| 31 | + |
| 32 | +- [ ] Initialize TypeScript monorepo with `packages/core`, `packages/cli`, and `packages/adapters/*` |
| 33 | +- [ ] Configure npm workspaces for package management, with Turborepo for task orchestration |
| 34 | +- [ ] Set up `packages/core` with zero framework dependencies |
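| 35 | +- [ ] Define the `aeo.config.ts` schema with Zod in `packages/core/src/config.ts` (site metadata, crawler policies, content map) and export a `defineAEOConfig()` helper |
| 36 | +- [ ] Implement a config loader that reads `aeo.config.ts` from the project root (loading TypeScript at runtime requires a transpiling loader such as jiti or esbuild) |
| 37 | +- [ ] Install core dependencies (unified/remark/rehype for Markdown, xml2js for XML) |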
| 38 | +- [ ] Set up ESLint, Prettier, and Vitest |
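
A minimal sketch of the config types and the `defineAEOConfig` helper used in the `aeo.config.ts` example, assuming the field names shown there. To stay dependency-free it replaces the Zod schema with a few hand-written checks in a hypothetical `validateAEOConfig`; with Zod, the same rules would live in the schema and `safeParse` would report them.

```typescript
// Sketch of packages/core/src/config.ts. Field names follow the aeo.config.ts
// example; validateAEOConfig is a hypothetical stand-in for the Zod schema.
type RobotsPolicy = "allow-all" | "block-training" | "selective";

interface AEOConfig {
  site: { name: string; url: string; description?: string };
  robots?: {
    policy: RobotsPolicy;
    customRules?: { userAgent: string; allow?: string[]; disallow?: string[] }[];
  };
  llmsTxt?: { sections: { title: string; pages: string[] }[]; optional?: string[] };
  entities?: Record<string, string[]>; // canonical form -> variants to flag
  sitemap?: { priorities?: Record<string, number> }; // path pattern -> 0.0..1.0
}

// Identity helper: exists only so editors infer types inside aeo.config.ts.
function defineAEOConfig(config: AEOConfig): AEOConfig {
  return config;
}

// Returns human-readable problems; an empty array means the config is usable.
function validateAEOConfig(config: AEOConfig): string[] {
  const errors: string[] = [];
  if (!config.site.name.trim()) errors.push("site.name must not be empty");
  if (!/^https?:\/\/[^/\s]+/.test(config.site.url)) {
    errors.push("site.url must be an absolute http(s) URL");
  }
  const priorities: Record<string, number> = config.sitemap?.priorities ?? {};
  for (const [pattern, priority] of Object.entries(priorities)) {
    if (priority < 0 || priority > 1) {
      errors.push(`sitemap.priorities["${pattern}"] must be between 0.0 and 1.0`);
    }
  }
  return errors;
}
```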
| 39 | + |
| 40 | +### Task 2: robots.txt Generator |
| 41 | + |
| 42 | +- [ ] Create `generateRobotsTxt(config)` function in `packages/core/src/robots.ts` |
| 43 | +- [ ] Implement AI crawler directive builder with per-bot Allow/Disallow rules |
| 44 | +- [ ] Support configurable policies matching the config schema: `allow-all`, `block-training`, `selective` |
| 45 | +- [ ] Add known AI crawler tokens: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-Web, PerplexityBot, Google-Extended, CCBot, Meta-ExternalAgent, Bytespider, Applebot-Extended |
| 46 | +- [ ] Include Sitemap directives pointing to both standard and AI sitemaps |
| 47 | +- [ ] Add Crawl-delay directives for aggressive bots (Crawl-delay is non-standard, is not part of RFC 9309, and is ignored by some crawlers, including Googlebot) |
| 48 | +- [ ] Write unit tests for all policy modes |
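
A sketch of `generateRobotsTxt` covering the three policy modes. Two interpretations here are assumptions rather than anything the task list settles: which tokens count as training crawlers under `block-training`, and `selective` meaning "block every known AI crawler that has no custom rule". Rules are merged into one group per user-agent, which matches how RFC 9309 tells crawlers to combine duplicate groups.

```typescript
// Sketch of generateRobotsTxt for packages/core/src/robots.ts.
type RobotsPolicyMode = "allow-all" | "block-training" | "selective";

interface RobotsRule {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

interface RobotsOptions {
  siteUrl: string; // absolute origin, no trailing slash
  policy: RobotsPolicyMode;
  customRules?: RobotsRule[];
  sitemaps?: string[]; // site-relative paths, e.g. "/sitemap.xml"
}

// robots.txt product tokens for AI crawlers (not an exhaustive list).
const AI_CRAWLER_TOKENS = [
  "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
  "Google-Extended", "CCBot", "Meta-ExternalAgent", "Bytespider", "Applebot-Extended",
];

// ASSUMPTION: tokens treated as model-training crawlers under "block-training".
const TRAINING_TOKENS = new Set([
  "GPTBot", "ClaudeBot", "Google-Extended", "CCBot",
  "Meta-ExternalAgent", "Bytespider", "Applebot-Extended",
]);

function generateRobotsTxt(opts: RobotsOptions): string {
  // One merged group per user-agent token (Map keeps insertion order).
  const groups = new Map<string, { allow: string[]; disallow: string[] }>();
  const addRule = (rule: RobotsRule): void => {
    const group = groups.get(rule.userAgent) ?? { allow: [], disallow: [] };
    group.allow.push(...(rule.allow ?? []));
    group.disallow.push(...(rule.disallow ?? []));
    groups.set(rule.userAgent, group);
  };

  const custom = opts.customRules ?? [];
  const customAgents = new Set(custom.map((rule) => rule.userAgent));
  for (const token of AI_CRAWLER_TOKENS) {
    const blockTraining = opts.policy === "block-training" && TRAINING_TOKENS.has(token);
    // ASSUMPTION: "selective" blocks every known AI crawler without a custom rule.
    const blockUnlisted = opts.policy === "selective" && !customAgents.has(token);
    if (blockTraining || blockUnlisted) addRule({ userAgent: token, disallow: ["/"] });
  }
  custom.forEach((rule) => addRule(rule));
  if (!groups.has("*")) addRule({ userAgent: "*", allow: ["/"] });

  const blocks: string[] = [];
  for (const [agent, group] of groups) {
    const lines = [`User-agent: ${agent}`];
    group.allow.forEach((path) => lines.push(`Allow: ${path}`));
    group.disallow.forEach((path) => lines.push(`Disallow: ${path}`));
    if (lines.length === 1) lines.push("Disallow:"); // empty Disallow allows everything
    blocks.push(lines.join("\n"));
  }
  const sitemapLines = (opts.sitemaps ?? ["/sitemap.xml"]).map(
    (path) => `Sitemap: ${opts.siteUrl}${path}`,
  );
  return [...blocks, sitemapLines.join("\n")].join("\n\n") + "\n";
}
```

With the `aeo.config.ts` example (`allow-all` plus a GPTBot rule), this produces a GPTBot group disallowing `/admin/` and `/api/`, a catch-all `Allow: /` group, and a `Sitemap:` line.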
| 49 | + |
| 50 | +### Task 3: llms.txt and llms-full.txt Generator |
| 51 | + |
| 52 | +- [ ] Create `generateLlmsTxt(config)` function in `packages/core/src/llms-txt.ts` |
| 53 | +- [ ] Implement auto-generation from config site content map |
| 54 | +- [ ] Structure output per llmstxt.org spec: H1 title, blockquote summary, H2 sections with link lists |
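| 55 | +- [ ] Create `generateLlmsFullTxt(config)` with comprehensive site documentation (`llms-full.txt` is a widely used convention, not part of the llmstxt.org proposal itself) |
| 56 | +- [ ] Include company overview, product descriptions, target audience, competitive advantages |
| 57 | +- [ ] Add an `## Optional` section for secondary/supplementary pages (the proposal lets consumers skip it when a shorter context is needed) |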
| 58 | +- [ ] Add `lastUpdated` timestamp to both files |
| 59 | +- [ ] Write unit tests for output format validation |
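
A sketch of `generateLlmsTxt` following the llmstxt.org layout: H1 name, blockquote summary, H2 sections of links, and an optional `## Optional` section. The `PageEntry` and `LlmsTxtInput` shapes are hypothetical. The config lists only paths, so titles and descriptions are assumed to come from the site content map. Where to put `lastUpdated` is also a guess, since the proposal does not define it.

```typescript
// Sketch of generateLlmsTxt for packages/core/src/llms-txt.ts.
// PageEntry and LlmsTxtInput are hypothetical input shapes.
interface PageEntry {
  path: string;
  title: string;
  description?: string;
}

interface LlmsTxtInput {
  siteName: string;
  siteUrl: string; // absolute origin, no trailing slash
  summary: string;
  sections: { title: string; pages: PageEntry[] }[];
  optional?: PageEntry[];
  lastUpdated?: Date;
}

// "- [Title](url): description" is the link-list form the proposal uses.
function linkLine(siteUrl: string, page: PageEntry): string {
  const link = `- [${page.title}](${siteUrl}${page.path})`;
  return page.description ? `${link}: ${page.description}` : link;
}

function generateLlmsTxt(input: LlmsTxtInput): string {
  const out: string[] = [`# ${input.siteName}`, "", `> ${input.summary}`, ""];
  if (input.lastUpdated) {
    // Not defined by the proposal; placed in the free-text area after the summary.
    out.push(`Last updated: ${input.lastUpdated.toISOString().slice(0, 10)}`, "");
  }
  const sections = [...input.sections];
  // "Optional" has a defined meaning: consumers may skip it for a shorter context.
  if (input.optional?.length) sections.push({ title: "Optional", pages: input.optional });
  for (const section of sections) {
    out.push(`## ${section.title}`, "");
    section.pages.forEach((page) => out.push(linkLine(input.siteUrl, page)));
    out.push("");
  }
  return out.join("\n");
}
```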
| 60 | + |
| 61 | +### Task 4: AI-Optimized Sitemap |
| 62 | + |
| 63 | +- [ ] Create `generateSitemap(pages, config)` function in `packages/core/src/sitemap.ts` |
| 64 | +- [ ] Create `generateAISitemap(pages, config)` with AI-specific priority scoring |
| 65 | +- [ ] Implement priority calculation based on content type (landing pages > blog > archives) |
| 66 | +- [ ] Add `lastmod` timestamps from content metadata or git history |
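| 67 | +- [ ] Add `changefreq` hints based on content volatility (Google ignores `changefreq` and `priority` and relies on accurate `lastmod` values, so keep `lastmod` trustworthy) |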
| 68 | +- [ ] Reference both sitemaps in robots.txt output |
| 69 | +- [ ] Write tests for XML output validity |
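
A dependency-free sketch of `generateSitemap`. It resolves priorities from the `aeo.config.ts` pattern style (`/` is an exact match, `/blog/*` is a prefix match, and the longest match wins) and writes the XML by hand instead of through xml2js. The `SitemapPage` shape and the 0.5 fallback (the sitemaps.org default priority) are choices made for this sketch.

```typescript
// Sketch of generateSitemap for packages/core/src/sitemap.ts.
// SitemapPage is a hypothetical input shape.
interface SitemapPage {
  path: string;
  lastModified?: Date;
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Longest matching pattern wins; 0.5 is the sitemaps.org default priority.
function priorityFor(path: string, priorities: Record<string, number>): number {
  let best = { length: -1, priority: 0.5 };
  for (const [pattern, priority] of Object.entries(priorities)) {
    const matches = pattern.endsWith("/*")
      ? path.startsWith(pattern.slice(0, -1)) // "/blog/*" matches "/blog/..."
      : path === pattern;
    if (matches && pattern.length > best.length) best = { length: pattern.length, priority };
  }
  return best.priority;
}

function generateSitemap(
  siteUrl: string,
  pages: SitemapPage[],
  priorities: Record<string, number> = {},
): string {
  const urls = pages.map((page) => {
    const parts = [`    <loc>${escapeXml(siteUrl + page.path)}</loc>`];
    if (page.lastModified) {
      // W3C date format (YYYY-MM-DD) is valid for <lastmod>.
      parts.push(`    <lastmod>${page.lastModified.toISOString().slice(0, 10)}</lastmod>`);
    }
    parts.push(`    <priority>${priorityFor(page.path, priorities).toFixed(1)}</priority>`);
    return `  <url>\n${parts.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
    "",
  ].join("\n");
}
```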
| 70 | + |
| 71 | +### Task 5: Structured Data Engine (JSON-LD) |
| 72 | + |
| 73 | +- [ ] Create `generateJsonLd(type, data)` function in `packages/core/src/structured-data.ts` |
| 74 | +- [ ] Implement Article schema (headline, author, datePublished, dateModified, publisher) |
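| 75 | +- [ ] Implement FAQPage schema (Question/Answer pairs; Google now shows FAQ rich results only for well-known government and health sites, so the main value is machine-readable Q&A for answer engines) |
| 76 | +- [ ] Implement HowTo schema (step-by-step content; Google has retired HowTo rich results, but the markup remains valid schema.org) |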
| 77 | +- [ ] Implement Product schema (name, description, price, availability, reviews) |
| 78 | +- [ ] Implement Organization schema (name, logo, sameAs social profiles, contactPoint) |
| 79 | +- [ ] Implement BreadcrumbList schema for navigation hierarchy |
| 80 | +- [ ] Add schema validator that checks output against schema.org specs |
| 81 | +- [ ] Export framework-agnostic `<script type="application/ld+json">` string builder |
| 82 | +- [ ] Write tests for each schema type |
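
A sketch of two schema builders and the framework-agnostic `<script>` string builder. Only FAQPage and BreadcrumbList are shown, and the function names are hypothetical; the other types would follow the same pattern. Escaping `<` as `\u003c` prevents text such as `</script>` inside an answer from terminating the script element. `JSON.parse` still reads it back as the original character.

```typescript
// Sketch for packages/core/src/structured-data.ts (hypothetical helper names).
type JsonLd = Record<string, unknown>;

function faqPage(items: { question: string; answer: string }[]): JsonLd {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
}

function breadcrumbList(crumbs: { name: string; url: string }[]): JsonLd {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((crumb, index) => ({
      "@type": "ListItem",
      position: index + 1, // schema.org list positions are 1-based
      name: crumb.name,
      item: crumb.url,
    })),
  };
}

// Escapes "<" so user-supplied text cannot close the <script> element early.
function jsonLdScript(data: JsonLd): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```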
| 83 | + |
| 84 | +### Task 6: Bot Detection and SSR Middleware |
| 85 | + |
| 86 | +- [ ] Create `isAIBot(userAgent)` detector in `packages/core/src/bot-detector.ts` |
| 87 | +- [ ] Maintain user-agent pattern list for all known AI crawlers |
| 88 | +- [ ] Create `aeoMiddleware(config)` factory that returns a generic request handler |
| 89 | +- [ ] Middleware detects AI bots and sets `x-aeo-bot` header for downstream use |
| 90 | +- [ ] Implement HTML meta tag helper: `<meta name="robots" content="max-snippet:-1, max-image-preview:large">` |
| 91 | +- [ ] Add canonical URL generation utility |
| 92 | +- [ ] Write tests for bot detection accuracy |
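
A sketch of `isAIBot` using case-insensitive substring patterns. The pattern list and the companion `detectAIBot` helper are assumptions for illustration. User-Agent headers are easy to spoof, so a production detector would likely also check the IP ranges some vendors publish (OpenAI does for GPTBot).

```typescript
// Sketch for packages/core/src/bot-detector.ts. Pattern list is illustrative.
// Google-Extended and Applebot-Extended are robots.txt-only tokens, so they are not listed.
const AI_BOT_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "GPTBot", pattern: /GPTBot/i },
  { name: "OAI-SearchBot", pattern: /OAI-SearchBot/i },
  { name: "ChatGPT-User", pattern: /ChatGPT-User/i },
  { name: "ClaudeBot", pattern: /ClaudeBot/i },
  { name: "Claude-Web", pattern: /Claude-Web/i },
  { name: "PerplexityBot", pattern: /PerplexityBot/i },
  { name: "CCBot", pattern: /CCBot/i },
  { name: "Meta-ExternalAgent", pattern: /meta-externalagent/i },
  { name: "Bytespider", pattern: /Bytespider/i },
];

// Returns the matched bot name, or null for browsers and unknown agents.
function detectAIBot(userAgent: string | null | undefined): string | null {
  if (!userAgent) return null;
  const hit = AI_BOT_PATTERNS.find((bot) => bot.pattern.test(userAgent));
  return hit ? hit.name : null;
}

function isAIBot(userAgent: string | null | undefined): boolean {
  return detectAIBot(userAgent) !== null;
}
```

Returning the bot name (not just a boolean) gives the middleware a value for the `x-aeo-bot` header.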
| 93 | + |
| 94 | +### Task 7: Markdown Endpoint Generator |
| 95 | + |
| 96 | +- [ ] Create `htmlToMarkdown(html)` converter in `packages/core/src/markdown-endpoints.ts` |
| 97 | +- [ ] Strip navigation, footer, ads, and boilerplate — serve only core content |
| 98 | +- [ ] Add front-matter metadata (title, description, author, date) to markdown output |
| 99 | +- [ ] Create `markdownMiddleware(config)` that intercepts `.md` requests (per the llms.txt proposal, `/page` maps to `/page.md` and directory URLs to `index.html.md`) |
| 100 | +- [ ] Cache generated markdown with configurable TTL |
| 101 | +- [ ] Write tests for markdown output quality |
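
A sketch of the pieces around `htmlToMarkdown` rather than the conversion itself, which the tech stack assigns to unified/remark. It covers mapping between page URLs and `.md` URLs as the llms.txt proposal describes, adding front matter, and a TTL cache. `markdownPathFor`, `pagePathFor`, `withFrontMatter`, and `TtlCache` are hypothetical names.

```typescript
// Sketch of helpers for packages/core/src/markdown-endpoints.ts (hypothetical names).

// llms.txt proposal: "/page" -> "/page.md"; directory URLs -> ".../index.html.md".
function markdownPathFor(pagePath: string): string {
  return pagePath.endsWith("/") ? `${pagePath}index.html.md` : `${pagePath}.md`;
}

// Maps a request path back to the page it mirrors, or null if it is not a .md request.
function pagePathFor(requestPath: string): string | null {
  if (requestPath.endsWith("/index.html.md")) return requestPath.slice(0, -"index.html.md".length);
  if (requestPath.endsWith(".md")) return requestPath.slice(0, -".md".length);
  return null;
}

interface MarkdownMeta { title: string; description?: string; author?: string; date?: string }

function withFrontMatter(meta: MarkdownMeta, body: string): string {
  const lines = Object.entries(meta)
    .filter(([, value]) => value !== undefined)
    // JSON strings are valid YAML double-quoted scalars, so ":" and quotes are safe.
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`);
  return `---\n${lines.join("\n")}\n---\n\n${body.trim()}\n`;
}

// Minimal in-memory cache; entries expire ttlMs after they are set.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it so the page is regenerated
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```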
| 102 | + |
| 103 | +### Task 8: Answer-First Content Helpers |
| 104 | + |
| 105 | +- [ ] Create `formatAnswerBlock(answer, detail)` in `packages/core/src/content-helpers.ts` |
| 106 | +- [ ] Create `formatKeyFacts(facts)` — structured key-value pairs for easy extraction |
| 107 | +- [ ] Create `formatFAQ(items)` — FAQ with built-in JSON-LD generation |
| 108 | +- [ ] Implement `formatForExtraction(content)` — structures content with clear heading hierarchy |
| 109 | +- [ ] Export HTML string builders (framework-agnostic) for all helpers |
| 110 | +- [ ] Write tests for output format |
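
A sketch of `formatAnswerBlock` and `formatKeyFacts` as plain HTML string builders. The markup and class names are hypothetical. The sketch shows the short answer appearing first, in server-rendered HTML, with user text escaped.

```typescript
// Sketch for packages/core/src/content-helpers.ts (hypothetical markup).
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Concise answer first, then the longer explanation.
function formatAnswerBlock(answer: string, detail: string): string {
  return [
    '<section class="aeo-answer">',
    `  <p class="aeo-answer__summary"><strong>${escapeHtml(answer)}</strong></p>`,
    `  <div class="aeo-answer__detail">${detail}</div>`, // detail is trusted HTML
    "</section>",
  ].join("\n");
}

// A <dl> gives extractors unambiguous term/value pairs.
function formatKeyFacts(facts: Record<string, string>): string {
  const rows = Object.entries(facts).map(
    ([term, value]) => `  <dt>${escapeHtml(term)}</dt><dd>${escapeHtml(value)}</dd>`,
  );
  return ["<dl>", ...rows, "</dl>"].join("\n");
}
```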
| 111 | + |
| 112 | +### Task 9: Entity Consistency Checker |
| 113 | + |
| 114 | +- [ ] Create entity registry schema in config (brand names, product names, key terms with canonical forms) |
| 115 | +- [ ] Build `checkEntities(files, config)` scanner in `packages/core/src/entity-checker.ts` |
| 116 | +- [ ] Flag variations (e.g., "Next.js" vs "NextJS" vs "Next JS") |
| 117 | +- [ ] Generate report with file locations and suggested fixes |
| 118 | +- [ ] Add pre-commit hook integration option |
| 119 | +- [ ] Write tests for detection accuracy |
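
A sketch of `checkEntities` that scans in-memory file contents (no file I/O) using the `entities` map from `aeo.config.ts`. Case-sensitive, whole-word matching is an assumption of this sketch: it avoids flagging the canonical `TypeScript` when `Typescript` is listed, and keeps `TS` from matching inside `TSX`. The `EntityIssue` shape is hypothetical.

```typescript
// Sketch for packages/core/src/entity-checker.ts (hypothetical result shape).
interface EntityIssue {
  file: string;
  line: number;   // 1-based
  column: number; // 1-based
  found: string;
  suggestion: string;
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function checkEntities(
  files: { path: string; content: string }[],
  entities: Record<string, string[]>,
): EntityIssue[] {
  // Case-sensitive, whole-word pattern for every listed variant.
  const matchers = Object.entries(entities).flatMap(([canonical, variants]) =>
    variants.map((variant) => ({
      canonical,
      pattern: new RegExp(`(?<!\\w)${escapeRegExp(variant)}(?!\\w)`, "g"),
    })),
  );
  const issues: EntityIssue[] = [];
  for (const file of files) {
    file.content.split("\n").forEach((text, index) => {
      for (const { canonical, pattern } of matchers) {
        for (const match of text.matchAll(pattern)) {
          issues.push({
            file: file.path,
            line: index + 1,
            column: (match.index ?? 0) + 1,
            found: match[0],
            suggestion: canonical,
          });
        }
      }
    });
  }
  return issues;
}
```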
| 120 | + |
| 121 | +### Task 10: AI Bot Traffic Monitor |
| 122 | + |
| 123 | +- [ ] Create log parser in `packages/core/src/bot-traffic.ts` that identifies AI bot user-agents |
| 124 | +- [ ] Support common log formats (Apache, Nginx, JSON logs) |
| 125 | +- [ ] Calculate bot visit frequency breakdown (GPTBot, ClaudeBot, PerplexityBot, etc.) |
| 126 | +- [ ] Track pages most frequently crawled by AI bots |
| 127 | +- [ ] Track crawl frequency trends over time (daily/weekly) |
| 128 | +- [ ] Add comparison: AI bot traffic vs Googlebot traffic |
| 129 | +- [ ] Implement alerting for crawl anomalies (sudden drops may indicate blocking issues) |
| 130 | +- [ ] Write tests for log parsing |
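
A sketch of the log parser for the Apache/Nginx combined log format, counting AI bot hits per bot and per page. `BOT_TOKENS` is a hypothetical short list (Googlebot is included for the AI-vs-Googlebot comparison), and JSON logs would need their own field mapping.

```typescript
// Sketch for packages/core/src/bot-traffic.ts: combined log format only.
// Groups: 1 client IP, 2 timestamp, 3 method, 4 path, 5 status, 6 user agent.
const COMBINED_LOG =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/;

// Hypothetical subset of the detector's patterns.
const BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Googlebot"];

interface BotHit { bot: string; path: string; status: number; time: string }

function parseLogLine(line: string): BotHit | null {
  const m = COMBINED_LOG.exec(line);
  if (!m) return null; // not a combined-format line
  const [, , time, , path, status, userAgent] = m;
  const bot = BOT_TOKENS.find((token) => userAgent.includes(token));
  return bot ? { bot, path, status: Number(status), time } : null;
}

function summarize(lines: string[]): { byBot: Map<string, number>; byPage: Map<string, number> } {
  const byBot = new Map<string, number>();
  const byPage = new Map<string, number>();
  for (const line of lines) {
    const hit = parseLogLine(line);
    if (!hit) continue; // human traffic or unparseable line
    byBot.set(hit.bot, (byBot.get(hit.bot) ?? 0) + 1);
    byPage.set(hit.path, (byPage.get(hit.path) ?? 0) + 1);
  }
  return { byBot, byPage };
}
```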
| 131 | + |
| 132 | +### Task 11: Citation Tracker |
| 133 | + |
| 134 | +- [ ] Create citation service in `packages/core/src/citation-tracker.ts` |
| 135 | +- [ ] Query Perplexity, ChatGPT search, and Google AI Overviews for brand mentions |
| 136 | +- [ ] Track which pages are cited and in response to which queries |
| 137 | +- [ ] Calculate "AI Share of Voice" metric |
| 138 | +- [ ] Show citation sentiment analysis (positive/neutral/negative) |
| 139 | +- [ ] Display citation trends over time |
| 140 | +- [ ] Write tests for citation parsing |
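
A sketch of one possible "AI Share of Voice" calculation. The definition is an assumption: a domain's share is the fraction of all cited URLs, across the tracked answers, that point at that domain or its subdomains. Collecting the answers from each engine is out of scope, and `TrackedAnswer` is a hypothetical shape.

```typescript
// Sketch for packages/core/src/citation-tracker.ts.
// ASSUMPTION: share of voice = citations to a domain / all citations collected.
interface TrackedAnswer {
  engine: string; // e.g. "perplexity"
  query: string;
  citedUrls: string[];
}

// Lower-cased host without a leading "www.", or null if the string is not an http(s) URL.
function hostOf(url: string): string | null {
  const m = /^https?:\/\/([^/:?#]+)/i.exec(url);
  return m ? m[1].toLowerCase().replace(/^www\./, "") : null;
}

function shareOfVoice(answers: TrackedAnswer[], domains: string[]): Map<string, number> {
  const counts = new Map<string, number>(domains.map((d): [string, number] => [d, 0]));
  let total = 0;
  for (const answer of answers) {
    for (const url of answer.citedUrls) {
      const host = hostOf(url);
      if (!host) continue;
      total += 1;
      // Subdomains (docs.example.com) count toward the parent domain.
      const owner = domains.find((d) => host === d || host.endsWith(`.${d}`));
      if (owner) counts.set(owner, (counts.get(owner) ?? 0) + 1);
    }
  }
  const shares = new Map<string, number>();
  for (const [domain, count] of counts) shares.set(domain, total === 0 ? 0 : count / total);
  return shares;
}
```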
| 141 | + |
| 142 | +### Task 12: Framework Adapters |
| 143 | + |
| 144 | +- [ ] Create Next.js adapter in `packages/adapters/nextjs/` (route handlers for robots.txt, llms.txt, sitemaps; React components for JSON-LD; middleware) |
| 145 | +- [ ] Create Nuxt adapter in `packages/adapters/nuxt/` (server routes, Vue components, middleware) |
| 146 | +- [ ] Create Astro adapter in `packages/adapters/astro/` (API endpoints, Astro components, middleware) |
| 147 | +- [ ] Create Remix adapter in `packages/adapters/remix/` (resource routes, React components, middleware) |
| 148 | +- [ ] Create Express adapter in `packages/adapters/express/` (Express middleware and route handlers) |
| 149 | +- [ ] Each adapter wires core generators into the framework's routing system |
| 150 | +- [ ] Write integration tests for each adapter |
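
A sketch of a shared pattern the adapters could build on. A core generator returns a string, and a small factory wraps it in a Web-standard `Response`. Next.js App Router route handlers, Astro endpoints, and Remix loaders can all return a `Response`, so one handler shape can serve all three; the Express adapter would write to `res` instead. `createTextHandler`, `robotsHandler`, and the `Cache-Control` default are hypothetical.

```typescript
// Hypothetical factory: wraps a string generator in a Fetch API Response handler.
function createTextHandler(
  generate: () => string,
  contentType: string,
  maxAgeSeconds = 3600,
): () => Response {
  return () =>
    new Response(generate(), {
      status: 200,
      headers: {
        "Content-Type": `${contentType}; charset=utf-8`,
        "Cache-Control": `public, max-age=${maxAgeSeconds}`,
      },
    });
}

// Stand-in generator; a real adapter would call the core robots.txt generator
// with the loaded aeo.config.ts.
const robotsHandler = createTextHandler(() => "User-agent: *\nAllow: /\n", "text/plain");
```

In a Next.js route file this could be exported as `export const GET = robotsHandler;`, since extra request arguments are simply ignored.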
| 151 | + |
| 152 | +### Task 13: Dashboard UI |
| 153 | + |
| 154 | +- [ ] Build standalone dashboard as a single-page app in `packages/dashboard/` |
| 155 | +- [ ] Bot traffic overview page with visit frequency charts |
| 156 | +- [ ] Citation tracking page with AI Share of Voice |
| 157 | +- [ ] AEO health score page (robots.txt, llms.txt, structured data, SSR status) |
| 158 | +- [ ] Use Recharts for charts and Tailwind CSS for styling |
| 159 | +- [ ] Export as embeddable component and standalone server |
| 160 | +- [ ] Write tests for dashboard data rendering |
| 161 | + |
| 162 | +### Task 14: AEO Audit CLI |
| 163 | + |
| 164 | +- [ ] Create CLI entry point in `packages/cli/` with Commander.js |
| 165 | +- [ ] Implement `npx aeo audit <url>` — runs full AEO health check against any URL |
| 166 | +- [ ] Check robots.txt accessibility and AI bot directives |
| 167 | +- [ ] Validate llms.txt format against llmstxt.org spec |
| 168 | +- [ ] Verify structured data on all pages (JSON-LD validity) |
| 169 | +- [ ] Check that critical content is in static HTML (not JS-dependent) |
| 170 | +- [ ] Verify markdown endpoints are functional |
| 171 | +- [ ] Score site 0-100 with actionable recommendations |
| 172 | +- [ ] Output results as terminal table and optional JSON report |
| 173 | +- [ ] Write tests for audit logic |
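
A sketch of the audit's final step: turning check results into a 0-100 score and a ranked list of recommendations, plus a rough static-HTML check for key phrases. The weighting scheme and names are hypothetical. The HTML check uses regular expressions as a stand-in for the Cheerio/Playwright comparison in the tech stack, and it does not decode HTML entities.

```typescript
// Sketch for packages/cli/src/audit.ts (hypothetical names and weights).
interface CheckResult {
  id: string;
  passed: boolean;
  weight: number;         // relative importance
  recommendation: string; // shown when the check fails
}

function scoreAudit(results: CheckResult[]): { score: number; recommendations: string[] } {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  const earned = results.filter((r) => r.passed).reduce((sum, r) => sum + r.weight, 0);
  const score = totalWeight === 0 ? 0 : Math.round((earned / totalWeight) * 100);
  // Failed checks, most important first.
  const recommendations = results
    .filter((r) => !r.passed)
    .sort((a, b) => b.weight - a.weight)
    .map((r) => r.recommendation);
  return { score, recommendations };
}

// Returns the phrases NOT found in the raw HTML once <script>/<style> contents and
// tags are removed; text that only appears after client-side rendering is missing.
function phrasesInStaticHtml(rawHtml: string, phrases: string[]): string[] {
  const text = rawHtml
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ");
  return phrases.filter((phrase) => !text.includes(phrase));
}
```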
| 174 | + |
| 175 | +## Tech Stack |
| 176 | + |
| 177 | +- TypeScript |
| 178 | +- Node.js 18+ |
| 179 | +- Turborepo (monorepo management) |
| 180 | +- Zod (config validation) |
| 181 | +- Unified / Remark / Rehype (Markdown processing) |
| 182 | +- gray-matter (front-matter parsing) |
| 183 | +- xml2js (XML generation) |
| 184 | +- Commander.js (CLI) |
| 185 | +- Cheerio (HTML parsing for audits) |
| 186 | +- Recharts (dashboard charts) |
| 187 | +- Tailwind CSS (dashboard styling) |
| 188 | +- Vitest (testing) |
| 189 | +- Playwright (integration tests for SSR verification) |
| 190 | + |
| 191 | +### Framework Adapters Support |
| 192 | + |
| 193 | +- Next.js 14+ (App Router) |
| 194 | +- Nuxt 3+ |
| 195 | +- Astro 4+ |
| 196 | +- Remix 2+ |
| 197 | +- Express 4+ |
| 198 | + |
| 199 | +## Files to Create |
| 200 | + |
| 201 | +- `aeo.config.ts` |
| 202 | +- `packages/core/src/index.ts` |
| 203 | +- `packages/core/src/robots.ts` |
| 204 | +- `packages/core/src/llms-txt.ts` |
| 205 | +- `packages/core/src/sitemap.ts` |
| 206 | +- `packages/core/src/structured-data.ts` |
| 207 | +- `packages/core/src/bot-detector.ts` |
| 208 | +- `packages/core/src/markdown-endpoints.ts` |
| 209 | +- `packages/core/src/content-helpers.ts` |
| 210 | +- `packages/core/src/entity-checker.ts` |
| 211 | +- `packages/core/src/bot-traffic.ts` |
| 212 | +- `packages/core/src/citation-tracker.ts` |
| 213 | +- `packages/core/src/config.ts` |
| 214 | +- `packages/core/package.json` |
| 215 | +- `packages/cli/src/index.ts` |
| 216 | +- `packages/cli/src/audit.ts` |
| 217 | +- `packages/cli/package.json` |
| 218 | +- `packages/adapters/nextjs/src/index.ts` |
| 219 | +- `packages/adapters/nextjs/src/routes.ts` |
| 220 | +- `packages/adapters/nextjs/src/components.tsx` |
| 221 | +- `packages/adapters/nextjs/src/middleware.ts` |
| 222 | +- `packages/adapters/nextjs/package.json` |
| 223 | +- `packages/adapters/nuxt/src/index.ts` |
| 224 | +- `packages/adapters/nuxt/src/module.ts` |
| 225 | +- `packages/adapters/nuxt/package.json` |
| 226 | +- `packages/adapters/astro/src/index.ts` |
| 227 | +- `packages/adapters/astro/src/integration.ts` |
| 228 | +- `packages/adapters/astro/package.json` |
| 229 | +- `packages/adapters/remix/src/index.ts` |
| 230 | +- `packages/adapters/remix/src/routes.ts` |
| 231 | +- `packages/adapters/remix/package.json` |
| 232 | +- `packages/adapters/express/src/index.ts` |
| 233 | +- `packages/adapters/express/src/middleware.ts` |
| 234 | +- `packages/adapters/express/package.json` |
| 235 | +- `packages/dashboard/src/App.tsx` |
| 236 | +- `packages/dashboard/src/pages/traffic.tsx` |
| 237 | +- `packages/dashboard/src/pages/citations.tsx` |
| 238 | +- `packages/dashboard/src/pages/health.tsx` |
| 239 | +- `packages/dashboard/package.json` |
| 240 | +- `tests/core/robots.test.ts` |
| 241 | +- `tests/core/llms-txt.test.ts` |
| 242 | +- `tests/core/sitemap.test.ts` |
| 243 | +- `tests/core/structured-data.test.ts` |
| 244 | +- `tests/core/entity-checker.test.ts` |
| 245 | +- `tests/core/markdown-endpoints.test.ts` |
| 246 | +- `tests/core/bot-detector.test.ts` |
| 247 | +- `tests/cli/audit.test.ts` |
| 248 | +- `tests/adapters/nextjs.test.ts` |
| 249 | +- `tests/adapters/express.test.ts` |
| 250 | +- `turbo.json` |
| 251 | +- `package.json` |
| 252 | +- `.env.example` |
| 253 | + |
| 254 | +## Configuration |
| 255 | + |
| 256 | +### Environment Variables |
| 257 | + |
| 258 | +- `SITE_URL` — Canonical site URL (e.g., `https://example.com`) |
| 259 | +- `SITE_NAME` — Brand name used in structured data and llms.txt |
| 260 | +- `AEO_LOG_PATH` — Path to server access logs for bot traffic parsing |
| 261 | +- `PERPLEXITY_API_KEY` — (Optional) For citation tracking via Perplexity API |
| 262 | +- `OPENAI_API_KEY` — (Optional) For citation tracking via ChatGPT search |
| 263 | + |
| 264 | +### aeo.config.ts Example |
| 265 | + |
| 266 | +```typescript |
| 267 | +import { defineAEOConfig } from '@aeo-toolkit/core'; |
| 268 | + |
| 269 | +export default defineAEOConfig({ |
| 270 | + site: { |
| 271 | + name: 'My Company', |
| 272 | + url: 'https://example.com', |
| 273 | + description: 'Short description for llms.txt blockquote', |
| 274 | + }, |
| 275 | + robots: { |
| 276 | + policy: 'allow-all', // 'allow-all' | 'block-training' | 'selective' |
| 277 | + customRules: [ |
| 278 | + { userAgent: 'GPTBot', disallow: ['/admin/', '/api/'] }, |
| 279 | + ], |
| 280 | + }, |
| 281 | + llmsTxt: { |
| 282 | + sections: [ |
| 283 | + { title: 'Documentation', pages: ['/docs', '/guides', '/api-reference'] }, |
| 284 | + { title: 'Blog', pages: ['/blog'] }, |
| 285 | + ], |
| 286 | + optional: ['/changelog', '/careers'], |
| 287 | + }, |
| 288 | + entities: { |
| 289 | + 'Next.js': ['NextJS', 'Next JS', 'Nextjs'], |
| 290 | + 'TypeScript': ['Typescript', 'TS'], |
| 291 | + }, |
| 292 | + sitemap: { |
| 293 | + priorities: { |
| 294 | + '/': 1.0, |
| 295 | + '/products/*': 0.9, |
| 296 | + '/blog/*': 0.7, |
| 297 | + '/docs/*': 0.8, |
| 298 | + }, |
| 299 | + }, |
| 300 | +}); |
| 301 | +``` |
| 302 | + |
| 303 | +### Framework Integration Examples |
| 304 | + |
| 305 | +```typescript |
| 306 | +// Next.js — app/robots.txt/route.ts |
| 307 | +import { createRobotsHandler } from '@aeo-toolkit/nextjs'; |
| 308 | +export const GET = createRobotsHandler(); |
| 309 | + |
| 310 | +// Express |
| 311 | +import { aeoMiddleware } from '@aeo-toolkit/express'; |
| 312 | +app.use(aeoMiddleware()); // serves /robots.txt, /llms.txt, /sitemap.xml |
| 313 | + |
| 314 | +// Astro — src/pages/robots.txt.ts |
| 315 | +import { createRobotsEndpoint } from '@aeo-toolkit/astro'; |
| 316 | +export const GET = createRobotsEndpoint(); |
| 317 | +``` |
| 318 | + |
| 319 | +## Usage |
| 320 | + |
| 321 | +1. Install: `npm install @aeo-toolkit/core @aeo-toolkit/<framework>` |
| 322 | +2. Create `aeo.config.ts` in your project root |
| 323 | +3. Wire framework adapter into your routes (see examples above) |
| 324 | +4. Run `npx aeo audit https://yoursite.com` for a full AEO health check |
| 325 | +5. Visit `/admin/aeo` for the monitoring dashboard (optional) |
| 326 | + |
| 327 | +## Notes |
| 328 | + |
| 329 | +- AI crawlers (GPTBot, ClaudeBot, PerplexityBot) do NOT execute JavaScript — all critical content must be in the initial HTML response |
| 330 | +- `llms.txt` is an emerging standard; no major AI company has formally adopted it yet, but implementing it is low-risk and future-proofs your site |
| 331 | +- `robots.txt` directives remain the primary and most reliable mechanism for AI crawler control |
| 332 | +- ChatGPT crawls ~8x more frequently than Googlebot; Perplexity ~3x more frequently (Conductor research) |
| 333 | +- First impression matters: AI bots lack manual reindexing — content must be correct on first crawl |
| 334 | +- The entity consistency checker helps avoid confusing AI models with inconsistent naming |
| 335 | +- Citation tracking requires API access to answer engines and may have rate limits |
| 336 | +- The core package has zero framework dependencies — adapters are optional |
| 337 | +- Requires Node.js 18+ |