Add /llm.txt endpoint for LLM consumption of MDX documentation #29

Merged
hkdeman merged 2 commits into main from claude/issue-28-20250710_132444
Jul 10, 2025

Conversation

@hkdeman (Contributor) commented Jul 10, 2025

This PR adds a /llm.txt endpoint that aggregates all MDX file content for LLM consumption, as requested in #28.

Changes

  • Added Vite plugin for build-time generation of llm.txt file
  • Added development server middleware for /llm.txt endpoint
  • Processes all 7 MDX files in /docs directory
  • Extracts frontmatter and cleans content for LLM consumption
  • Optimized for build-time generation with no dynamic loading

Technical Details

  • src/plugins/llm-txt-generator.ts - Build-time generation plugin
  • src/plugins/llm-txt-dev-server.ts - Development server plugin
  • Updated vite.config.ts to include both plugins
  • Added test scripts for validation
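
As a rough illustration of how the two plugins listed above could be wired, here is a minimal sketch. It is not the code from this PR: `buildLlmText` is a hypothetical placeholder for the aggregation step, and the small interfaces stand in for Vite's own types so the snippet is self-contained. It assumes the standard Vite/Rollup hooks `generateBundle` (with `this.emitFile`) and `configureServer` (with `server.middlewares.use`).

```typescript
// Minimal structural types standing in for Vite's own (assumption: a real
// project would use `import type { Plugin } from 'vite'` instead).
type Next = () => void;
interface Res {
  setHeader(name: string, value: string): void;
  end(body: string): void;
}
interface DevServer {
  middlewares: {
    use(path: string, handler: (req: unknown, res: Res, next: Next) => void): void;
  };
}
interface EmitContext {
  emitFile(file: { type: 'asset'; fileName: string; source: string }): string;
}

// Hypothetical placeholder for the step that aggregates the MDX files.
function buildLlmText(): string {
  return '# Documentation\n\n(aggregated MDX content goes here)\n';
}

// Build-time plugin: emits llm.txt as an asset in the output bundle.
function llmTxtGenerator() {
  return {
    name: 'llm-txt-generator',
    apply: 'build' as const,
    generateBundle(this: EmitContext) {
      this.emitFile({ type: 'asset', fileName: 'llm.txt', source: buildLlmText() });
    },
  };
}

// Dev-server plugin: serves the same text at /llm.txt on each request.
function llmTxtDevServer() {
  return {
    name: 'llm-txt-dev-server',
    apply: 'serve' as const,
    configureServer(server: DevServer) {
      server.middlewares.use('/llm.txt', (_req, res, _next) => {
        res.setHeader('Content-Type', 'text/plain; charset=utf-8');
        res.end(buildLlmText());
      });
    },
  };
}
```

Under these assumptions, both factories would be added to the `plugins` array in vite.config.ts; `apply` keeps each one active only in its own mode.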

Fixes #28

Generated with Claude Code

This commit introduces a new /llm.txt endpoint that aggregates all MDX file content into a single text file optimized for LLM consumption. The implementation includes:

- Build-time generation via Vite plugin for production builds
- Development server middleware for real-time content serving
- Standalone script for manual generation
- Frontmatter parsing and content cleaning
- Support for both development and production environments

The endpoint processes all MDX files in the docs directory, extracting titles and descriptions from frontmatter, cleaning JSX components, and formatting the content for optimal LLM readability.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: hkdeman <hkdeman@users.noreply.github.com>

jazzberry-ai (bot) commented Jul 10, 2025

Bug Report

| Name | Severity | Example test case | Description |
| --- | --- | --- | --- |
| Incomplete JSX Removal | Medium | Create an MDX file with a custom JSX component. | The regular expressions used to remove JSX components are not robust enough and leave behind artifacts like extra greater-than signs and attributes. |
| Incorrect Frontmatter Parsing | High | Create an MDX file with frontmatter and content including colon(s). | The frontmatter parsing logic incorrectly includes content body in the parsed data when a colon exists in the content body. |
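
One way to avoid the frontmatter issue reported above is to parse only the block fenced by `---` at the very start of the file, so colons in the body are never inspected. The following is a minimal sketch under that assumption, not the PR's implementation: `parseFrontmatter` and `ParsedMdx` are hypothetical names, and it handles flat `key: value` pairs only (a YAML library would be more robust for nested values).

```typescript
interface ParsedMdx {
  data: Record<string, string>;
  body: string;
}

function parseFrontmatter(source: string): ParsedMdx {
  // Frontmatter must open the file and be fenced by lines containing only `---`.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source);
  if (!match) return { data: {}, body: source };

  const data: Record<string, string> = {};
  for (const line of (match[1] ?? '').split(/\r?\n/)) {
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim();
    // Split on the first colon only, then drop one pair of surrounding quotes.
    const value = line.slice(colon + 1).trim().replace(/^(['"])(.*)\1$/, '$2');
    if (key) data[key] = value;
  }
  // Everything after the closing fence is body text and is returned untouched.
  return { data, body: source.slice(match[0].length) };
}
```

With this shape, a body line such as `Note: body text` stays in `body` and never becomes a frontmatter key.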

Comments? Email us.

Comment on lines +49 to +56
let cleaned = content
  // Remove JSX opening/closing tags but keep content
  .replace(/<(\w+)([^>]*?)>/g, '')
  .replace(/<\/\w+>/g, '')
  // Remove code block language specifiers
  .replace(/```(\w+)/g, '```')
  // Remove HTML comments
  .replace(/<!--[\s\S]*?-->/g, '')

Check failure

Code scanning / CodeQL

Incomplete multi-character sanitization High

This string may still contain <!--, which may cause an HTML element injection vulnerability.
Comment on lines +44 to +51
let cleaned = content
  // Remove JSX opening/closing tags but keep content
  .replace(/<(\w+)([^>]*?)>/g, '')
  .replace(/<\/\w+>/g, '')
  // Remove code block language specifiers
  .replace(/```(\w+)/g, '```')
  // Remove HTML comments
  .replace(/<!--[\s\S]*?-->/g, '')

Check failure

Code scanning / CodeQL

Incomplete multi-character sanitization High

This string may still contain <!--, which may cause an HTML element injection vulnerability.

Copilot Autofix

AI 9 months ago

To address the issue, the sanitization process should be modified to repeatedly apply the HTML comment removal regular expression until no more matches are found. This ensures that any reintroduced patterns are also sanitized. This change avoids the risk of leaving residual unsafe patterns in the content.

The implementation will involve a do...while loop to repeatedly apply the replace method until the content stops changing. This approach ensures all instances of HTML comments are removed, even if they are indirectly reintroduced during the sanitization process.
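
The repeat-until-stable idea described above can be sketched as a standalone helper. This is an illustrative version, not the autofix patch itself: `stripHtmlComments` is a hypothetical name, and the loop is applied to the whole string rather than inside a `replace` callback.

```typescript
function stripHtmlComments(input: string): string {
  const comment = /<!--[\s\S]*?-->/g;
  let previous: string;
  let current = input;
  do {
    previous = current;
    // One pass can splice the remaining text into a new `<!-- ... -->`,
    // so keep going until a pass changes nothing.
    current = current.replace(comment, '');
  } while (current !== previous);
  return current;
}
```

For example, `<!<!-- a -->-- b -->text` becomes `<!-- b -->text` after the first pass and `text` after the second. An unterminated `<!--` with no closing `-->` is still left in place by this helper, so output destined for HTML would likely need escaping as well.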

Suggested changeset 1
src/plugins/llm-txt-dev-server.ts

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/plugins/llm-txt-dev-server.ts b/src/plugins/llm-txt-dev-server.ts
--- a/src/plugins/llm-txt-dev-server.ts
+++ b/src/plugins/llm-txt-dev-server.ts
@@ -48,7 +48,15 @@
     // Remove code block language specifiers
     .replace(/```(\w+)/g, '```')
     // Remove HTML comments
-    .replace(/<!--[\s\S]*?-->/g, '')
+    // Remove HTML comments (repeatedly to handle nested or reintroduced patterns)
+    .replace(/<!--[\s\S]*?-->/g, (match, offset, string) => {
+      let result;
+      do {
+        result = string;
+        string = string.replace(/<!--[\s\S]*?-->/g, '');
+      } while (result !== string);
+      return string;
+    })
     // Remove excessive whitespace
     .replace(/\n\s*\n\s*\n/g, '\n\n')
     // Remove leading/trailing whitespace
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated

jazzberry-ai (bot) commented Jul 10, 2025

Bug Report

| Name | Severity | Example test case | Description |
| --- | --- | --- | --- |
| Missing file in dory.json is silently ignored | Medium | Add a non-existent file path to the pages array in dory.json. Run the generate-llm-txt.js script. | The generateLlmContent function in src/plugins/llm-txt-dev-server.ts and src/plugins/llm-txt-generator.ts does not handle the case where a file specified in dory.json is not found in the mdxFilesMap. This can happen due to typos or incorrect paths in dory.json. The function should either throw an error or log a warning message to indicate that a file is missing. |
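
A warning path for the case reported above might look like the sketch below. The shapes are assumptions, since the PR page does not show them: `DoryConfig` models only a flat `pages` array, `mdxFilesMap` is taken to map page paths to MDX source, and `collectPages` is a hypothetical helper rather than the real `generateLlmContent`.

```typescript
// Assumed shape of dory.json: only the `pages` array is modelled here.
interface DoryConfig {
  pages: string[];
}

function collectPages(
  config: DoryConfig,
  mdxFilesMap: Map<string, string>,
  warn: (message: string) => void = console.warn,
): { content: string; missing: string[] } {
  const sections: string[] = [];
  const missing: string[] = [];
  for (const page of config.pages) {
    const source = mdxFilesMap.get(page);
    if (source === undefined) {
      // Report the bad entry instead of skipping it silently.
      missing.push(page);
      warn(`llm.txt: page "${page}" is listed in dory.json but was not found`);
      continue;
    }
    sections.push(source.trim());
  }
  return { content: sections.join('\n\n---\n\n'), missing };
}
```

Returning `missing` alongside the content lets the caller decide whether a build should fail on a non-empty list or only log the warnings.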

Comments? Email us.

@hkdeman hkdeman merged commit 51c520a into main Jul 10, 2025
3 checks passed