Add /llm.txt endpoint for LLM consumption of MDX documentation #29
This commit introduces a new /llm.txt endpoint that aggregates all MDX file content into a single text file optimized for LLM consumption. The implementation includes:

- Build-time generation via Vite plugin for production builds
- Development server middleware for real-time content serving
- Standalone script for manual generation
- Frontmatter parsing and content cleaning
- Support for both development and production environments

The endpoint processes all MDX files in the docs directory, extracting titles and descriptions from frontmatter, cleaning JSX components, and formatting the content for optimal LLM readability.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: hkdeman <hkdeman@users.noreply.github.com>
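As a rough illustration of the frontmatter parsing and aggregation described above, here is a minimal sketch. The `DocPage` shape and the `parseFrontmatter` and `buildLlmTxt` names are assumptions made for this example, not the PR's actual code, and the parser only handles flat `key: value` pairs rather than full YAML.

```typescript
// Hypothetical shape for one documentation page; not taken from the PR.
interface DocPage {
  path: string;
  raw: string;
}

// Split a leading `---` frontmatter block from the body and read simple
// `key: value` pairs. Real frontmatter is YAML; this sketch handles only
// flat string values.
function parseFrontmatter(raw: string): { data: Record<string, string>; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(raw);
  if (!match) return { data: {}, body: raw };
  const data: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    // Strip one pair of surrounding quotes, if present.
    const value = line.slice(idx + 1).trim().replace(/^["']|["']$/g, "");
    if (key) data[key] = value;
  }
  return { data, body: raw.slice(match[0].length) };
}

// Join all pages into one plain-text document, one titled section per page.
// Pages without a title fall back to their path.
function buildLlmTxt(pages: DocPage[]): string {
  return pages
    .map(({ path, raw }) => {
      const { data, body } = parseFrontmatter(raw);
      const title = data.title || path;
      const description = data.description ? `${data.description}\n\n` : "";
      return `# ${title}\n\n${description}${body.trim()}`;
    })
    .join("\n\n---\n\n");
}
```

The JSX and comment cleaning would run on each page body before joining; it is left out here to keep the sketch focused on frontmatter handling.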
    let cleaned = content
      // Remove JSX opening/closing tags but keep content
      .replace(/<(\w+)([^>]*?)>/g, '')
      .replace(/<\/\w+>/g, '')
      // Remove code block language specifiers
      .replace(/```(\w+)/g, '```')
      // Remove HTML comments
      .replace(/<!--[\s\S]*?-->/g, '')
Check failure
Code scanning / CodeQL
Incomplete multi-character sanitization High
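To see why a single pass of the comment-stripping pattern is flagged, consider the small sketch below. The `stripCommentsOnce` helper and the crafted input are made up for illustration; only the regular expression comes from the snippet above.

```typescript
// The same comment-stripping pattern flagged by the alert.
const COMMENT = /<!--[\s\S]*?-->/g;

function stripCommentsOnce(input: string): string {
  return input.replace(COMMENT, "");
}

// Removing the inner comment splices "<!" and "-- hidden -->" together,
// so a new comment marker appears in the output of a single pass.
const crafted = "<!<!-- x -->-- hidden -->";
const afterOnePass = stripCommentsOnce(crafted);

console.log(afterOnePass.includes("<!--")); // true
```

Because the replacement removes characters from the middle of the string, the text on either side of a match can join into the very pattern that was being removed.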
Copilot Autofix
AI 9 months ago
To address the issue, the sanitization process should be modified to repeatedly apply the HTML comment removal regular expression until no more matches are found. This ensures that any reintroduced patterns are also sanitized. This change avoids the risk of leaving residual unsafe patterns in the content.
The implementation will involve a do...while loop to repeatedly apply the replace method until the content stops changing. This approach ensures all instances of HTML comments are removed, even if they are indirectly reintroduced during the sanitization process.
    @@ -48,7 +48,15 @@
         // Remove code block language specifiers
         .replace(/```(\w+)/g, '```')
         // Remove HTML comments
    -    .replace(/<!--[\s\S]*?-->/g, '')
    +    // Remove HTML comments (repeatedly to handle nested or reintroduced patterns)
    +    .replace(/<!--[\s\S]*?-->/g, (match, offset, string) => {
    +      let result;
    +      do {
    +        result = string;
    +        string = string.replace(/<!--[\s\S]*?-->/g, '');
    +      } while (result !== string);
    +      return string;
    +    })
         // Remove excessive whitespace
         .replace(/\n\s*\n\s*\n/g, '\n\n')
         // Remove leading/trailing whitespace
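One way to express the repeat-until-stable idea is to wrap the whole `replace` call in the loop instead of running the loop inside a replacer callback, since a callback's return value substitutes only the matched span. The sketch below assumes a standalone helper named `stripCommentsUntilStable`, which is not part of the PR.

```typescript
// Apply the comment-stripping pattern repeatedly until the string stops
// changing, so comments re-formed by an earlier removal are also removed.
function stripCommentsUntilStable(input: string): string {
  let previous: string;
  let current = input;
  do {
    previous = current;
    current = current.replace(/<!--[\s\S]*?-->/g, "");
  } while (current !== previous);
  return current;
}

// The spliced comment from a crafted input is gone after the second pass.
console.log(stripCommentsUntilStable("a<!<!-- x -->-- hidden -->b")); // "ab"
```

The loop terminates because every pass that changes the string also shortens it. An unterminated `<!--` with no closing `-->` is left in place, so this sketch addresses re-formed comments only.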
test-llm-generation.js
Outdated
    let cleaned = content
      // Remove JSX opening/closing tags but keep content
      .replace(/<(\w+)([^>]*?)>/g, '')
      .replace(/<\/\w+>/g, '')
      // Remove code block language specifiers
      .replace(/```(\w+)/g, '```')
      // Remove HTML comments
      .replace(/<!--[\s\S]*?-->/g, '')
Check failure
Code scanning / CodeQL
Incomplete multi-character sanitization High test
This PR adds a /llm.txt endpoint that aggregates all MDX file content for LLM consumption, as requested in #%28.
Changes
Technical Details
- src/plugins/llm-txt-generator.ts - Build-time generation plugin
- src/plugins/llm-txt-dev-server.ts - Development server plugin
- vite.config.ts to include both plugins

Fixes #%28
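For orientation, a development server plugin of this kind might look roughly like the sketch below. It relies on Vite's `configureServer` hook and `server.middlewares.use`, but declares small local stand-in types so the example is self-contained. The `llmTxtDevServer` factory, its `generate` parameter, and the plugin name are assumptions, not the contents of the PR's files.

```typescript
// Minimal local stand-ins for the request/response objects so the sketch is
// self-contained; in a real project these types come from Vite and Node.
interface Req { url?: string }
interface Res {
  setHeader(name: string, value: string): void;
  end(body: string): void;
}
type Next = () => void;
type Middleware = (req: Req, res: Res, next: Next) => void;
interface DevServerLike {
  middlewares: { use(fn: Middleware): void };
}

// Hypothetical plugin factory; `generate` is whatever builds the text.
function llmTxtDevServer(generate: () => string) {
  return {
    name: "llm-txt-dev-server",
    configureServer(server: DevServerLike) {
      server.middlewares.use((req, res, next) => {
        // Pass every other request through untouched.
        if (req.url !== "/llm.txt") return next();
        res.setHeader("Content-Type", "text/plain; charset=utf-8");
        // Regenerate on each request so edits show up without a restart.
        res.end(generate());
      });
    },
  };
}
```

In `vite.config.ts`, a plugin like this would be added to the `plugins` array alongside the build-time generator.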
Generated with Claude Code