Make any website AI-agent-readable. Generates /llms.txt + clean markdown for every page.
`robots.txt` told search engines what to crawl. `llms.txt` tells AI agents what to read. This tool generates both automatically.
```sh
npx site-to-md https://mysite.com
```

That's it. Zero config. You'll get:
```
site-to-md-output/
├── llms.txt          # Index file per llmstxt.org spec
├── llms-ctx.txt      # All content inline (for single-prompt ingestion)
├── index.html.md     # Homepage as markdown
├── docs/
│   ├── getting-started.html.md
│   └── api-reference.html.md
└── blog/
    ├── hello-world.html.md
    └── release-notes.html.md
```
- **Crawls your site** — follows links, or uses `sitemap.xml` if available
- **Extracts content** — strips nav, footer, ads, and scripts using Mozilla Readability (the same library behind Firefox Reader View)
- **Converts to markdown** — clean, structured markdown via Turndown
- **Generates `/llms.txt`** — per the llmstxt.org spec
- **Generates per-page `.html.md` files** — per the spec's convention
- **Generates `/llms-ctx.txt`** — all content inline for single-prompt ingestion
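The crawl step above can be pictured as a breadth-first walk with a depth limit. This is a hypothetical illustration, not site-to-md's implementation: `crawl` and the in-memory `links` map stand in for real page fetching.

```javascript
// Breadth-first crawl sketch with a depth cap (illustrative only).
// `links` maps each path to the paths it links to, standing in for fetched pages.
function crawl(start, links, maxDepth = 3) {
  const visited = new Set([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const page of frontier) {
      for (const target of links[page] ?? []) {
        if (!visited.has(target)) {
          visited.add(target);
          next.push(target); // queue newly discovered pages for the next level
        }
      }
    }
    frontier = next;
  }
  return [...visited];
}

const links = {
  '/': ['/docs/', '/blog/'],
  '/docs/': ['/docs/getting-started', '/docs/api-reference'],
  '/blog/': ['/blog/hello-world'],
};
crawl('/', links, 1); // depth 1 reaches only '/', '/docs/', '/blog/'
```

With the default depth of 3, the walk above discovers all six pages; lowering `--max-depth` simply cuts the walk off earlier.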
```sh
# Use directly with npx (no install needed)
npx site-to-md https://mysite.com

# Or install globally
npm install -g site-to-md

# Or as a project dependency
npm install site-to-md
```

```sh
# Crawl a live website
site-to-md https://docs.mysite.com

# Process local build output
site-to-md ./dist

# Customize output
site-to-md https://mysite.com \
  --out ./public \
  --title "My Product" \
  --desc "Developer documentation for My Product"

# Filter pages
site-to-md https://mysite.com \
  --include "/docs/**" \
  --include "/blog/**" \
  --exclude "/admin/**"

# Skip context file
site-to-md https://mysite.com --no-ctx
```

| Flag | Description | Default |
|---|---|---|
| `--out <dir>` | Output directory | `./site-to-md-output` |
| `--title <name>` | Site title for llms.txt | Auto-detected |
| `--desc <text>` | Site description | Auto-detected |
| `--include <glob>` | Include only matching paths (repeatable) | All |
| `--exclude <glob>` | Exclude matching paths (repeatable) | None |
| `--no-ctx` | Skip generating llms-ctx.txt | — |
| `--no-sitemap` | Don't use sitemap.xml for crawling | — |
| `--max-depth <n>` | Max crawl depth | `3` |
| `--concurrency <n>` | Parallel requests | `5` |
| `--strip <selector>` | CSS selectors to strip (repeatable) | — |
| `--config <path>` | Config file path | Auto-detect |
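The `--include`/`--exclude` globs might filter paths roughly like this sketch. The semantics here are assumed (`**` crosses path segments, `*` stays within one), and `globToRegExp`/`shouldKeep` are illustrative names, not part of site-to-md's API:

```javascript
// Convert a glob like '/docs/**' into a RegExp (assumed semantics).
function globToRegExp(glob) {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*\*/g, '\u0000')           // placeholder so ** survives the next step
    .replace(/\*/g, '[^/]*')              // * matches within one path segment
    .replace(/\u0000/g, '.*');            // ** matches across segments
  return new RegExp(`^${escaped}$`);
}

// Exclude wins over include; no include patterns means "keep everything".
function shouldKeep(path, include = [], exclude = []) {
  const match = (globs) => globs.some((g) => globToRegExp(g).test(path));
  if (exclude.length && match(exclude)) return false;
  return include.length === 0 || match(include);
}

shouldKeep('/docs/getting-started', ['/docs/**'], ['/admin/**']); // → true
shouldKeep('/admin/users', ['/docs/**'], ['/admin/**']);          // → false
```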
```js
import { agentReady } from 'site-to-md';

const result = await agentReady({
  url: 'https://mysite.com',
  outDir: './public',
  title: 'My Product',
  description: 'Developer documentation',
  include: ['/docs/**'],
});

console.log(`Generated ${result.pages.length} pages`);
console.log(result.llmsTxt); // Contents of llms.txt
```

Create `site-to-md.config.js` in your project root:
```js
export default {
  url: 'https://mysite.com',
  outDir: './public',
  title: 'My Product',
  description: 'A brief description for agents',
  include: ['/docs/**', '/blog/**'],
  exclude: ['/admin/**'],
  sections: {
    'Documentation': '/docs/**',
    'Blog': '/blog/**',
    'API Reference': '/api-docs/**',
  },
  maxDepth: 3,
  concurrency: 5,
  stripSelectors: ['.cookie-banner', '.ad-wrapper'],
};
```

To run it as part of your build, add it to your `package.json` scripts:

```json
{
  "scripts": {
    "build": "next build && site-to-md ./out --out ./out"
  }
}
```

Per the llmstxt.org spec:
```md
# My Product

> Developer documentation for building with My Product

## Documentation

- [Getting Started](/docs/getting-started.html.md): Quick start guide
- [API Reference](/docs/api-reference.html.md): Complete API docs

## Blog

- [Hello World](/blog/hello-world.html.md): Our launch announcement
```

Clean markdown extracted from each page — no nav, footer, ads, or scripts.
All page content concatenated in a single file for one-shot ingestion by AI agents.
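Building that context file is conceptually a simple join over the per-page markdown. A minimal sketch, assuming a `buildCtx` helper and a page shape that are illustrative, not site-to-md's actual layout:

```javascript
// Sketch: concatenate every page's markdown under a heading (illustrative).
function buildCtx(site, pages) {
  const parts = [`# ${site.title}`, '', `> ${site.description}`, ''];
  for (const page of pages) {
    parts.push(`## ${page.title}`, '', page.markdown.trim(), '');
  }
  return parts.join('\n');
}

const ctx = buildCtx(
  { title: 'My Product', description: 'Developer documentation' },
  [{ title: 'Getting Started', markdown: 'Install with `npm i`.' }],
);
```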
llms.txt is a proposed standard (by Jeremy Howard) for making websites readable by AI agents. Think of it like robots.txt but for LLMs:
- `/llms.txt` — A markdown index file listing your site's key pages with descriptions. AI agents read this first to understand what's on your site.
- `*.html.md` — Clean markdown versions of each page (same URL + `.md`). No nav, no footer, no JavaScript — just the content.
- `/llms-ctx.txt` — All content concatenated in one file for single-prompt ingestion.
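Rendering the index file from crawled pages is a small formatting step, sketched here with hypothetical names and data shapes (`renderLlmsTxt` is not part of site-to-md's API):

```javascript
// Sketch: render an llms.txt index from section → pages data (illustrative).
function renderLlmsTxt(title, description, sections) {
  const lines = [`# ${title}`, '', `> ${description}`];
  for (const [heading, pages] of Object.entries(sections)) {
    lines.push('', `## ${heading}`);
    for (const p of pages) {
      lines.push(`- [${p.title}](${p.path}): ${p.summary}`);
    }
  }
  return lines.join('\n') + '\n';
}

const txt = renderLlmsTxt('My Product', 'Developer documentation', {
  Documentation: [
    { title: 'Getting Started', path: '/docs/getting-started.html.md', summary: 'Quick start guide' },
  ],
});
```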
Sites like Anthropic, Cloudflare, and Stripe already have /llms.txt files. site-to-md generates yours automatically.
MIT © Stratus Labs