feat: LLMs.txt exports, automation, and expanded router#405
gregnazario merged 6 commits into main
- Add curated ids (llms-curated-ids), llms-small/full endpoints, html→md pipeline
- Override starlight-llms-txt routes via llmsTxtIndex; trim unused plugin options
- Per-page .md exports; exclude drafts; strict llms.txt section resolution
- Vitest: curated id validation, dist smoke, html sanitize fidelity
- Document .well-known/llms.txt in en/es/zh; agent guidelines (CLAUDE.md)
- Link validator excludes llms-small/full; robots.txt LLM route comments

Made-with: Cursor

- Drop src/content/docs/es/ and SpanishBetaBanner; site locales stay en/zh only
- Redirect /es and /es/* to English on Vercel
- Trim es from Head metadata, SearchFallback, LanguageSelect cookie logic
- Update CLAUDE.md: zh translations required; no es/ pages; Spanish PDF ok on white paper page
- Refresh fix-i18n-links copy and mermaid test filters

Made-with: Cursor

- List aptos-spec.json and /rest-api in llms-index alongside corpus exports
- Document in llms-txt and build/ai (en/zh); extend dist smoke test

Made-with: Cursor

- Add Agent tooling and canonical sources block (npm MCP, Agent Skills, AI hub .md, GitHub, Explorer, AIPs, Indexer GraphQL .md)
- Document in llms-txt (en/zh) and build/ai (en/zh); extend dist smoke assertions

Made-with: Cursor
Pull request overview
Adds and curates machine-readable documentation outputs for LLMs/coding agents (llms.txt index + small/full corpora + per-page .md exports), expands the router with key API/tooling links, and updates i18n to drop Spanish docs (redirect /es → English), with new tests to keep exports stable.
Changes:
- Introduces curated `/llms.txt`, `/llms-small.txt`, `/llms-full.txt` generation + shared rendering/sanitization utilities.
- Adds per-page rendered Markdown exports via `[...slug].md` and tests/smoke checks to validate curated IDs and built outputs.
- Removes Spanish-docs UI bits and updates i18n tooling/config/docs/redirects to reflect only `en` + `zh`.
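
To make the per-page export behavior concrete, here is a minimal sketch of how draft pages could be kept out of the static paths for a `[...slug].md` route. The `DocEntry` and `StaticPath` shapes and the `markdownStaticPaths` name are assumptions for illustration; they are not taken from `src/pages/[...slug].md.ts`.

```typescript
// Hypothetical shape of a docs collection entry; field names are assumptions.
interface DocEntry {
  id: string; // e.g. "build/ai"
  data: { draft?: boolean };
}

// Hypothetical return shape, modeled on what a static-path function yields.
interface StaticPath {
  params: { slug: string };
  props: { entry: DocEntry };
}

// Build the static paths for a `[...slug].md` route, skipping draft pages
// so that no `.md` export is generated for them.
function markdownStaticPaths(entries: DocEntry[]): StaticPath[] {
  return entries
    .filter((entry) => entry.data.draft !== true)
    .map((entry) => ({ params: { slug: entry.id }, props: { entry } }));
}

const paths = markdownStaticPaths([
  { id: "build/ai", data: {} },
  { id: "wip/page", data: { draft: true } },
]);
console.log(paths.map((p) => p.params.slug)); // only "build/ai" remains
```

Filtering at path-generation time means a draft never gets a URL at all, which is simpler than rendering it and returning an error later.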
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| vercel.json | Adds /es and /es/* permanent redirects to English; adds /.well-known/llms.txt redirect. |
| tests/mermaid-rendering.test.ts | Updates Mermaid test filtering to reflect removal of Spanish docs. |
| tests/llms-html-sanitize.test.ts | Adds unit tests for HTML sanitization + HTML→Markdown conversion. |
| tests/llms-dist-smoke.test.ts | Adds post-build smoke checks for generated llms outputs and server route modules. |
| tests/llms-curated-ids.test.ts | Adds tests to ensure curated doc IDs exist, are English-only, and not draft. |
| src/starlight-overrides/PageFrame.astro | Removes Spanish beta banner injection. |
| src/starlight-overrides/LanguageSelect.astro | Updates locale cookie detection to only recognize zh vs en. |
| src/starlight-overrides/Head.astro | Removes Spanish keyword/breadcrumb entries; adds rel="llms-txt" discovery link. |
| src/pages/llms-index.ts | Replaces auto-indexing with curated section index using shared llms utilities. |
| src/pages/[...slug].md.ts | Serves per-page rendered Markdown; excludes draft pages from static paths. |
| src/lib/llms.ts | Adds shared helpers for doc filtering/ordering, rendering-to-Markdown, and cache headers. |
| src/lib/llms-html-sanitize.ts | Adds HTML stripping + Turndown conversion for Markdown export/minify. |
| src/lib/llms-curated-ids.ts | Defines curated doc ID sets and English-doc inclusion rules. |
| src/integrations/llms-txt-index.ts | Overrides starlight-llms-txt injected routes to point to local handlers. |
| src/endpoints/llms-small.txt.ts | Implements curated, minified low-token corpus export. |
| src/endpoints/llms-full.txt.ts | Implements full rendered documentation corpus export with priority ordering. |
| src/content/docs/zh/llms-txt.mdx | Updates Chinese LLMs.txt documentation to match new routing/exports. |
| src/content/docs/zh/build/ai.mdx | Updates Chinese AI tools hub with well-known redirect + API/tooling links. |
| src/content/docs/llms-txt.mdx | Updates English LLMs.txt documentation to match new routing/exports. |
| src/content/docs/build/ai.mdx | Updates English AI tools hub with well-known redirect + API/tooling links. |
| src/components/SpanishBetaBanner.astro | Removes Spanish beta banner component (deleted). |
| src/components/SearchFallback.astro | Removes Spanish search fallback mapping. |
| scripts/fix-i18n-links/src/main.rs | Updates locale-discovery messaging to reflect only zh docs tree. |
| scripts/fix-i18n-links/README.md | Updates README examples to reflect only zh localization. |
| public/robots.txt | Adds explicit allow rules for common AI crawlers; references llms-small.txt. |
| astro.config.mjs | Excludes llms-small/full from link validation; reduces starlightLlmsTxt config to minimal and documents override behavior. |
| CLAUDE.md | Updates agent guidance: only en + zh, adds LLMs routing info, and documents “no Spanish docs tree” policy. |
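
The curated-ID checks listed for `tests/llms-curated-ids.test.ts` (IDs exist, are English-only, and are not drafts) could be sketched roughly as follows. The `Doc` shape, the `NON_ENGLISH_PREFIXES` list, and the `validateCuratedIds` helper are hypothetical; the real test file may be organized quite differently.

```typescript
// Hypothetical doc shape; real content collection entries will differ.
interface Doc {
  id: string;
  draft: boolean;
}

// Locale prefixes treated as non-English (assumption: only `zh/` remains).
const NON_ENGLISH_PREFIXES = ["zh/"];

// Return one human-readable problem per curated id that fails a check.
function validateCuratedIds(curated: string[], docs: Doc[]): string[] {
  const byId = new Map(docs.map((d): [string, Doc] => [d.id, d]));
  const problems: string[] = [];
  for (const id of curated) {
    const doc = byId.get(id);
    if (NON_ENGLISH_PREFIXES.some((prefix) => id.startsWith(prefix))) {
      problems.push(`${id}: not an English doc`);
    } else if (doc === undefined) {
      problems.push(`${id}: no such doc`);
    } else if (doc.draft) {
      problems.push(`${id}: draft page`);
    }
  }
  return problems;
}

const docs: Doc[] = [
  { id: "build/ai", draft: false },
  { id: "wip/page", draft: true },
];
const problems = validateCuratedIds(
  ["build/ai", "wip/page", "zh/build/ai", "missing/page"],
  docs,
);
console.log(problems);
```

A test can then assert that the returned list is empty, and a strict index builder could throw when it is not.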
Copilot review: replacing every `\s+` run collapsed newlines inside fenced code and broke Markdown structure. Collapse spaces/tabs only; extend tests.

Made-with: Cursor

- Add project Cursor skill for LLM exports and SEO checklists
- Align CLAUDE.md with agent guidelines (zh localization; no es maintenance)
- Link skill from Resources

Made-with: Cursor
Review follow-up (Copilot inline on `llms-html-sanitize.ts`): Already fixed on this branch in `14fe1b6`. Minify uses `/[ \t]+/g` only, so newlines and Markdown structure (fenced code, lists, headings) are preserved; `tests/llms-html-sanitize.test.ts` asserts the behavior. Resolved the review thread.
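
For readers following the thread, the difference between the two patterns can be illustrated with a small sketch. The function names below are hypothetical and are not the ones in `llms-html-sanitize.ts`; only the two regular expressions come from the discussion above.

```typescript
// Collapse runs of spaces and tabs to a single space, leaving newlines alone.
function collapseInlineWhitespace(markdown: string): string {
  return markdown.replace(/[ \t]+/g, " ");
}

// The over-broad variant: `\s` also matches "\n", so line breaks disappear.
function collapseAllWhitespace(markdown: string): string {
  return markdown.replace(/\s+/g, " ");
}

const sample = "# Title\n\n- item   one\n- item\ttwo\n";

console.log(JSON.stringify(collapseInlineWhitespace(sample)));
// "# Title\n\n- item one\n- item two\n"

console.log(JSON.stringify(collapseAllWhitespace(sample)));
// "# Title - item one - item two "
```

With the narrower character class the heading and list items stay on their own lines, whereas the `\s+` version flattens the whole document into one line.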
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated no new comments.
Summary
Machine-readable docs for LLMs and coding agents: curated `/llms.txt`, `llms-small.txt`, `llms-full.txt`, per-page `.md` exports, tests, and an expanded router (OpenAPI, REST, MCP, Agent Skills, Explorer, standards, Indexer GraphQL).

Key changes
- `llms-index`, `llms-small` / `llms-full` endpoints, `[...slug].md` for Markdown; `starlight-llms-txt` + `llmsTxtIndex` override
- `src/lib/llms-curated-ids.ts`, strict `llms-index` resolution (throws on missing ids), draft pages excluded from `.md` static paths
- `llms-html-sanitize.ts` + Vitest for HTML→MD sanitization
- `tests/llms-curated-ids.test.ts`, `llms-dist-smoke.test.ts` (expects fresh `pnpm build`), `llms-html-sanitize.test.ts`
- `starlightLlmsTxt`, link validator excludes `llms-small` / `llms-full` links; `Head.astro` `rel=llms-txt`; `/.well-known/llms.txt` redirect
- `/aptos-spec.json`, `/rest-api`, npm MCP, Agent Skills repo, AI hub `.md`, GitHub org, Explorer, AIPs, Indexer GraphQL `.md`
- `llms-txt`, `build/ai` (en/zh); CLAUDE.md machine-readable section + translation policy (no `es/` docs; Spanish white paper PDF link allowed)
- `es/llms-txt`); Vercel `/es` → English redirects; remove Spanish beta banner and related UI bits

Test plan
- `pnpm build && pnpm test && pnpm lint` on CI
- `https://deploy-preview/llms.txt` for new sections

Made with Cursor