This feature allows converting any page on the Sentry documentation site to a plain markdown format by simply appending `llms.txt` to the end of any URL. This is designed to make the documentation more accessible to Large Language Models (LLMs) and other automated tools that work better with plain text markdown content.
+
This feature allows converting any page on the Sentry documentation site to a plain markdown format by simply appending `llms.txt` to the end of any URL. The feature extracts the actual page content from the source MDX files and converts it to clean markdown, making the documentation more accessible to Large Language Models (LLMs) and other automated tools.
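
As a rough illustration of the URL convention described above, the helper below builds the `llms.txt` address for a given page. The name `toLlmsTxtUrl` is hypothetical and not part of this change. It assumes the suffix is added as a final path segment, and it ignores query strings and fragments.

```typescript
// Hypothetical helper, for illustration only; not part of the middleware change.
function toLlmsTxtUrl(pageUrl: string): string {
  // Add a trailing slash if the page URL lacks one, then append the suffix.
  const withSlash = pageUrl.slice(-1) === "/" ? pageUrl : `${pageUrl}/`;
  return `${withSlash}llms.txt`;
}
```

Under these assumptions, `https://docs.sentry.io/product/performance/` maps to `https://docs.sentry.io/product/performance/llms.txt`.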
## How It Works
-
The feature is implemented using Next.js middleware that intercepts requests ending with `llms.txt` and converts the corresponding page content to markdown format.
+
The feature is implemented using Next.js middleware that intercepts requests ending with `llms.txt` and rewrites them to an API route that extracts and converts the actual page content to markdown format.

### Implementation Details

1. **Middleware Interception**: The middleware in `src/middleware.ts` detects URLs ending with `llms.txt`
-
2. **Path Processing**: The middleware strips the `llms.txt` suffix to get the original page path
-
3. **Content Generation**: A comprehensive markdown representation is generated based on the page type and content
-
4. **Response**: The markdown content is returned as plain text with appropriate headers
+
2. **Request Rewriting**: The middleware rewrites the request to `/api/llms-txt` with the original path as a parameter
+
3. **Content Extraction**: The API route extracts the actual MDX content from source files
+
4. **Markdown Conversion**: JSX components and imports are stripped to create clean markdown
+
5. **Response**: The full page content is returned as plain text with appropriate headers

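
The detection and rewriting steps above can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the code in `src/middleware.ts`: the function name `llmsTxtRewriteTarget` and the query parameter name `path` are hypothetical, since the description only says the original path is passed as a parameter. In an actual Next.js middleware, the returned target would typically be handed to `NextResponse.rewrite`.

```typescript
// Illustrative sketch; names and the "path" parameter are assumptions.
const LLMS_SUFFIX = "llms.txt";
const API_ROUTE = "/api/llms-txt";

// Returns the internal rewrite target for an llms.txt request,
// or null when the pathname does not end with the suffix.
function llmsTxtRewriteTarget(pathname: string): string | null {
  if (pathname.slice(-LLMS_SUFFIX.length) !== LLMS_SUFFIX) {
    return null;
  }
  // Strip the suffix, then any leading or trailing slashes,
  // to recover the original page path.
  const pagePath = pathname
    .slice(0, -LLMS_SUFFIX.length)
    .replace(/^\/+|\/+$/g, "");
  return `${API_ROUTE}?path=${encodeURIComponent(pagePath)}`;
}
```

Keeping this logic free of framework types makes it easy to unit test separately from the middleware itself.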
### File Changes
-
- `src/middleware.ts`: Added `handleLlmsTxt` function and URL detection logic
+
- `src/middleware.ts`: Added `handleLlmsTxt` function with URL detection and rewriting logic
+
- `app/api/llms-txt/route.ts`: New API route that handles content extraction and conversion

## Usage Examples
@@ -37,47 +39,63 @@ The feature is implemented using Next.js middleware that intercepts requests end
- Original URL: `https://docs.sentry.io/product/performance/`
5. **Recursive Processing**: Full page content extraction and processing
+
1. **Enhanced JSX Cleanup**: More sophisticated removal of React components
+
2. **Code Block Preservation**: Better handling of code examples
+
3. **Link Resolution**: Convert relative links to absolute URLs
+
4. **Image Handling**: Process and reference images appropriately
+
5. **Table of Contents**: Generate TOC from headings
+
6. **Metadata Extraction**: Include more frontmatter data in output

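
Of the enhancements listed above, link resolution is simple enough to sketch. The example below is an assumption about how it might work, not planned code: the name `absolutizeRootRelativeLinks` is hypothetical, it handles only root-relative links (those starting with a single `/`), it leaves links that carry a title untouched, and it would also rewrite links that appear inside code blocks.

```typescript
// Illustrative sketch; not part of the change in this diff.
function absolutizeRootRelativeLinks(markdown: string, origin: string): string {
  // Drop trailing slashes from the origin so the joined URL has a single slash.
  const base = origin.replace(/\/+$/, "");
  // Match "](/path)" but not protocol-relative "](//host/path)".
  return markdown.replace(
    /\]\((\/(?!\/)[^)\s]*)\)/g,
    (_match: string, path: string) => `](${base}${path})`
  );
}
```

Links that are already absolute do not match the pattern and pass through unchanged.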
## Maintenance
-
- The feature is self-contained in the middleware
-
- Content templates can be updated in the `handleLlmsTxt` function
+
- The feature is self-contained with clear separation of concerns
+
- Content extraction logic can be enhanced in the API route
+
- Cleanup patterns can be updated in the `cleanupMarkdown()` function
- Performance can be monitored through response times and caching metrics
-
- Error handling is built-in with fallback responses
+
- Error handling provides clear debugging information

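
As a rough picture of the kind of cleanup patterns mentioned above, the sketch below strips single-line top-level `import`/`export` statements and capitalized JSX tags while leaving fenced code blocks untouched. It is not the `cleanupMarkdown()` function from the API route, whose body does not appear in this diff. The names `stripMdxSyntax` and `cleanupMarkdownSketch` and every regular expression here are assumptions, and the patterns are deliberately naive.

```typescript
// Illustrative sketch only; the real cleanupMarkdown() is not shown in this diff.

// Remove MDX-only syntax from a chunk of text that contains no fenced code.
function stripMdxSyntax(text: string): string {
  return text
    // Drop single-line top-level ESM import/export statements.
    .replace(/^(import|export)\s.*$/gm, "")
    // Drop self-closing components such as <PlatformContent ... />.
    .replace(/<[A-Z][A-Za-z0-9.]*(\s[^<>]*)?\/>/g, "")
    // Drop opening and closing component tags, keeping the text between them.
    .replace(/<\/?[A-Z][A-Za-z0-9.]*(\s[^<>]*)?>/g, "")
    // Collapse the runs of blank lines left behind.
    .replace(/\n{3,}/g, "\n\n");
}

function cleanupMarkdownSketch(mdx: string): string {
  // Split on fenced code blocks (\x60 is a backtick). Because the pattern is
  // wrapped in a capture group, the fenced blocks land at the odd indices.
  const parts = mdx.split(/(\x60{3}[\s\S]*?\x60{3})/);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : stripMdxSyntax(part)))
    .join("")
    .trim();
}
```

Processing fenced blocks separately matters here because documentation code samples often begin with `import` lines that must survive the cleanup.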
---
-
**Note**: This is a simplified implementation that provides structured markdown summaries. For complete content access, users should visit the original documentation pages.
+
**Note**: This feature extracts the actual page content from source MDX files and converts it to clean markdown format, making it ideal for LLM consumption and automated processing.