Self-hosted Markdown for AI agents — serve Markdown instead of HTML when AI agents request it

JakubKontra/next-markdown-mirror

next-markdown-mirror

Self-hosted Markdown for AI agents: serve clean Markdown instead of HTML when AI agents request your pages. Full documentation is available on the docs site.

The Problem

AI agents waste tokens parsing your nav bars, footers, cookie banners, and ad scripts as "content." HTML boilerplate can inflate the token count by 2-5x compared with clean Markdown, and AI tools that cite your site produce lower-quality responses because of the noise.

Before / After

[Image: demo]

Why not Cloudflare?

Cloudflare offers automatic Markdown conversion — but it requires their Pro plan at $20/month per domain ($240/year). For 5 domains, that's $1,200/year.

next-markdown-mirror is free and open source:

| | next-markdown-mirror | Cloudflare Pro |
| --- | --- | --- |
| 1 domain | $0 | $240/year |
| 5 domains | $0 | $1,200/year |
| 10 domains | $0 | $2,400/year |
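
The yearly figures follow directly from the $20/month per-domain price quoted above. A minimal sketch of that arithmetic, where the function name `annualCostUsd` is purely illustrative:

```typescript
// Per-domain monthly price quoted in this README for the Cloudflare Pro plan.
const PRO_MONTHLY_USD = 20;

/** Yearly cost in USD for a plan billed per domain, per month. */
function annualCostUsd(domains: number, monthlyPerDomain: number = PRO_MONTHLY_USD): number {
  return domains * monthlyPerDomain * 12;
}
```

Calling `annualCostUsd(5)` gives the 1,200 shown for five domains.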

Plus: self-hosted (deploy anywhere), full control over filtering and frontmatter, built-in JSON-LD extraction and llms.txt support.

Quick Start

Install:

pnpm add next-markdown-mirror
# or: yarn add next-markdown-mirror
# or: npm install next-markdown-mirror
# or: bun add next-markdown-mirror

Next.js 16 setup (3 files)

1. Proxy — intercepts markdown requests and rewrites to the handler:

// proxy.ts
import { withMarkdownMirror } from 'next-markdown-mirror/nextjs';
export const proxy = withMarkdownMirror();

2. Route handler — fetches your HTML internally and converts to Markdown:

// app/md-mirror/[...path]/route.ts
import { createMarkdownHandler } from 'next-markdown-mirror/nextjs';

export const GET = createMarkdownHandler({
  baseUrl: process.env.NEXT_PUBLIC_SITE_URL!,
});

3. llms.txt — AI discovery file:

// app/llms.txt/route.ts
import { createLlmsTxtHandler } from 'next-markdown-mirror/nextjs';

export const GET = createLlmsTxtHandler({
  siteName: 'My Site',
  baseUrl: process.env.NEXT_PUBLIC_SITE_URL!,
  pages: [
    { url: '/', title: 'Home', description: 'Welcome page' },
    { url: '/about', title: 'About' },
  ],
});
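
To make the shape of the generated file concrete, here is a rough sketch of how a page list like the one above could be rendered as llms.txt-style text. It is not the library's implementation: `renderLlmsTxt`, the `LlmsPage` interface, and the "Pages" section title are assumptions made for illustration, and the real handler's output may differ.

```typescript
// Hypothetical shape mirroring the `pages` entries in the config above.
interface LlmsPage {
  url: string;
  title: string;
  description?: string;
}

/** Renders a site name and page list as llms.txt-style Markdown (illustrative only). */
function renderLlmsTxt(siteName: string, baseUrl: string, pages: LlmsPage[]): string {
  const origin = baseUrl.replace(/\/+$/, ''); // drop trailing slashes before joining
  const lines: string[] = [`# ${siteName}`, '', '## Pages', ''];
  for (const page of pages) {
    const link = `- [${page.title}](${origin}${page.url})`;
    // The description is optional; append it after a colon when present.
    lines.push(page.description ? `${link}: ${page.description}` : link);
  }
  return lines.join('\n') + '\n';
}
```

With the two pages from the example config and a base URL of `https://example.com`, this produces a heading for the site followed by one link line per page.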

How it works

                    ┌──────────────────────────────┐
                    │        Your Next.js App       │
                    │                               │
  Accept: text/md   │  ┌─────────┐                  │
  ─────────────────►│  │  proxy   │  rewrite         │
        or ?v=md    │  │  .ts     │──────────┐       │
                    │  └─────────┘           │       │
                    │                        ▼       │
                    │              ┌──────────────┐  │
                    │              │ /md-mirror/  │  │
                    │              │ [...path]    │  │  text/markdown
                    │              │  route.ts    │──┼──────────────►
                    │              └──────┬───────┘  │  + YAML frontmatter
                    │                     │          │  + token count
                    │           fetch     │          │
                    │          (internal) │          │
                    │                     ▼          │
                    │              ┌──────────────┐  │
                    │              │  Your HTML   │  │
                    │              │    page      │  │
                    │              └──────────────┘  │
                    └──────────────────────────────┘
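
The routing decision in the diagram can be sketched roughly as follows. This is an illustration of the idea, not the code behind `withMarkdownMirror`: the function names are hypothetical, and it assumes the abbreviated `text/md` in the diagram stands for the `text/markdown` media type.

```typescript
// Default internal route prefix, as listed under "Key options".
const ROUTE_PREFIX = '/md-mirror';

/** True when the request asks for Markdown via the Accept header or ?v=md. */
function wantsMarkdown(acceptHeader: string | null, url: URL): boolean {
  const accept = (acceptHeader ?? '').toLowerCase();
  return accept.includes('text/markdown') || url.searchParams.get('v') === 'md';
}

/** Returns the internal path to rewrite to, or null to serve HTML as usual. */
function mirrorTarget(acceptHeader: string | null, requestUrl: string): string | null {
  const url = new URL(requestUrl);
  // Requests already aimed at the mirror route are left alone to avoid rewrite loops.
  if (url.pathname.startsWith(ROUTE_PREFIX + '/')) return null;
  if (!wantsMarkdown(acceptHeader, url)) return null;
  return ROUTE_PREFIX + url.pathname;
}
```

A request for `/about` with `Accept: text/markdown` (or with `?v=md`) would map to `/md-mirror/about`, where the route handler fetches the HTML page and converts it.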

Features

  • JSON-LD → YAML frontmatter — structured data extracted and prepended automatically
  • llms.txt protocol — built-in AI discovery file generation
  • GFM + extended Markdown — tables, task lists, definition lists, <details>, <mark>, and more
  • Token counting: x-markdown-tokens response header, with custom counter support
  • Intelligent content filtering — strips nav, footer, scripts, ads, cookie banners, and icons
  • Content-Signal header — tell AI agents how they may use your content
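
As an illustration of the JSON-LD to YAML frontmatter feature, the sketch below pulls the first JSON-LD script out of an HTML string and emits its top-level scalar fields as frontmatter. It is a simplified stand-in, not the library's extractor: `extractJsonLd` and `toFrontmatter` are hypothetical names, nested objects and arrays are skipped, and the regex-based extraction is only suitable for a demonstration.

```typescript
/** Finds the first JSON-LD script block and parses it; returns null if absent or invalid. */
function extractJsonLd(html: string): Record<string, unknown> | null {
  const match = html.match(
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/i,
  );
  if (!match) return null;
  try {
    const parsed: unknown = JSON.parse(match[1] ?? '');
    const isPlainObject = typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed);
    return isPlainObject ? (parsed as Record<string, unknown>) : null;
  } catch {
    return null; // malformed JSON is treated as "no structured data"
  }
}

/** Emits top-level string, number, and boolean fields as a YAML frontmatter block. */
function toFrontmatter(data: Record<string, unknown>): string {
  const lines: string[] = ['---'];
  for (const [key, value] of Object.entries(data)) {
    if (!['string', 'number', 'boolean'].includes(typeof value)) continue;
    // Keys such as "@type" are not valid plain YAML scalars, so they are quoted.
    const yamlKey = /^[A-Za-z_][\w-]*$/.test(key) ? key : JSON.stringify(key);
    // JSON string syntax is also valid YAML, which keeps escaping simple.
    lines.push(`${yamlKey}: ${JSON.stringify(value)}`);
  }
  lines.push('---');
  return lines.join('\n') + '\n';
}
```

The resulting block would be prepended to the converted Markdown, so an agent sees the structured metadata before the page body.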

Core API

The converter works standalone without Next.js:

import { HtmlToMarkdown } from 'next-markdown-mirror';

const converter = new HtmlToMarkdown({
  baseUrl: 'https://example.com',
  extractJsonLd: true,
  contentSignal: 'ai-input',
});

const result = converter.convert(html);
// result.markdown    — converted Markdown with YAML frontmatter
// result.tokenCount  — estimated token count
// result.jsonLd      — extracted JSON-LD data
// result.title       — page title
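
Since `result.tokenCount` is described as an estimate, it may help to see what a simple counter looks like. The sketch below uses the common rule of thumb of about four characters per token for English text. The `TokenCounter` type and the heuristic are assumptions for illustration; how the library computes its estimate, and how a custom counter is passed in, is covered in the docs rather than here.

```typescript
// Assumed shape for a counter: takes text, returns an estimated token count.
type TokenCounter = (text: string) => number;

/** Rough estimate: about four characters per token, rounded up. */
const approximateTokens: TokenCounter = (text) => Math.ceil(text.length / 4);
```

For exact numbers, a counter backed by the tokenizer of the target model would replace this heuristic.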

Configuration

See the full configuration reference on the docs site.

Key options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| contentSelectors | string[] | ['main', 'article', '[role="main"]'] | CSS selectors for main content |
| excludeSelectors | string[] | [] | Additional CSS selectors to exclude |
| extractJsonLd | boolean | true | Extract JSON-LD as YAML frontmatter |
| baseUrl | string | | Base URL for resolving relative URLs |
| contentSignal | ContentSignal | | Content-Signal header value |
| routePrefix | string | '/md-mirror' | Internal route prefix (proxy config) |
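
The defaults in the table can be read as a base object that user-supplied options override. The sketch below models that merge. `MirrorOptions`, `DEFAULT_OPTIONS`, and `resolveOptions` are hypothetical names, and `contentSignal` is typed as a plain string because the `ContentSignal` type is not spelled out here.

```typescript
// Illustrative option shape based on the "Key options" table.
interface MirrorOptions {
  contentSelectors: string[];
  excludeSelectors: string[];
  extractJsonLd: boolean;
  baseUrl?: string;       // no default listed
  contentSignal?: string; // stand-in for the library's ContentSignal type
  routePrefix: string;
}

// Defaults copied from the table; options without a listed default are omitted.
const DEFAULT_OPTIONS: MirrorOptions = {
  contentSelectors: ['main', 'article', '[role="main"]'],
  excludeSelectors: [],
  extractJsonLd: true,
  routePrefix: '/md-mirror',
};

/** Shallow-merges overrides on top of the defaults; later keys win. */
function resolveOptions(overrides: Partial<MirrorOptions> = {}): MirrorOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}
```

Because the merge is shallow, passing `contentSelectors` replaces the default list rather than extending it.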

Contributing

git clone https://github.com/jakubkontra/next-markdown-mirror.git
cd next-markdown-mirror
npm install
npm test          # run tests
npm run typecheck # type-check
npm run lint      # lint
npm run build     # build
cd test-app && npm install && npm run dev  # run test app on :3099

Built with

Built with the help of Claude Code.

License

MIT
