feat(ai): Add .md extension to provide pages in markdown for LLMs #13994
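In practice this means a Markdown rendition of a docs page can be requested by appending the `.md` extension to its URL. A minimal consumption sketch (the exact URL path below is illustrative, not taken from this PR):

```javascript
// Hypothetical usage sketch: fetch the Markdown rendition of a docs page by
// requesting it with a .md extension. The URL path here is illustrative only.
const response = await fetch('https://docs.sentry.io/platforms/javascript/index.md');
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}
const markdown = await response.text();
console.log(markdown); // plain Markdown, convenient to feed to an LLM
```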
Merged
Changes shown are from 36 of the 41 commits.

Commits:
- f3d38be Add llms.txt middleware for generating markdown documentation summaries (cursoragent)
- cf617cc Implement LLMs.txt feature with API route for markdown content extrac… (cursoragent)
- 81d3771 Refactor llms.txt feature with dynamic path routing and improved cont… (cursoragent)
- 0b58824 Enhance LLMs.txt feature with advanced JSX processing and content ext… (cursoragent)
- 68d66bb Add platform-specific code snippets to LLMs.txt feature (cursoragent)
- 9c1c749 Running autofix (codyde)
- 755dc72 Bump to next 15.2.3 (codyde)
- 3a38ff3 Correcting linting issues (codyde)
- b279e2f Removing log statement from middleware (codyde)
- 233937f Correcting linting errors (codyde)
- 27df3f8 Refactor match handling in resolvePlatformIncludes function (codyde)
- 3181a0d [getsentry/action-github-commit] Auto commit (getsantry[bot])
- 77445d3 Correcing param ordering based on app router (codyde)
- 0d7d5c4 Update parameter naming in GET function for consistency (codyde)
- 8a5474e Revert "Correcting linting issues" (codyde)
- 2661a20 Moving LLM generation functionality out of middleware and into nextjs… (codyde)
- 4eb6d3f Implement Markdown Export Feature: Add API route for .md exports and … (codyde)
- 75e1d84 Enhance Markdown Export Feature: Implement static file generation at … (codyde)
- e0aa2b0 let's go (BYK)
- fadcf3d esm stuff (BYK)
- af75f4c moar esm stuff (BYK)
- f7c8e6e all back to cjs (BYK)
- bf42d67 revert stuff (BYK)
- ce6ae41 revert tsconfig too (BYK)
- 97533a9 hack tsc (BYK)
- 5a4f3e5 cleaner md (BYK)
- 305267c remove debug thing (BYK)
- f832981 fix root detection (BYK)
- a822450 even more clean up (BYK)
- d564bfd Merge branch 'master' into cursor/convert-page-to-markdown-format-39de (BYK)
- 693c4f4 remove LLM instructions file (BYK)
- 4536db8 parallelize (BYK)
- 88a59b5 fix typo (BYK)
- 435606b bump min workers (BYK)
- ad400e2 add source for vercel build cpus (BYK)
- 33121cd back to 2 max workers (BYK)
- 21830c9 Update middleware.ts (codyde)
- 31c3ab9 add Markdown links (BYK)
- 19ee67f nofollow on md (BYK)
- 6d687ef revert useless NODE_ENV check (BYK)
- 29eda98 revert middleware.ts changes (BYK)
New file added in this PR (153 lines):

```javascript
#!/usr/bin/env node

import {fileURLToPath} from 'url';

import {selectAll} from 'hast-util-select';
import {existsSync} from 'node:fs';
import {mkdir, opendir, readFile, rm, writeFile} from 'node:fs/promises';
import {cpus} from 'node:os';
import * as path from 'node:path';
import {isMainThread, parentPort, Worker, workerData} from 'node:worker_threads';
import rehypeParse from 'rehype-parse';
import rehypeRemark from 'rehype-remark';
import remarkGfm from 'remark-gfm';
import remarkStringify from 'remark-stringify';
import {unified} from 'unified';
import {remove} from 'unist-util-remove';

async function createWork() {
  let root = process.cwd();
  while (!existsSync(path.join(root, 'package.json'))) {
    const parent = path.dirname(root);
    if (parent === root) {
      throw new Error('Could not find package.json in parent directories');
    }
    root = parent;
  }
  const INPUT_DIR = path.join(root, '.next', 'server', 'app');
  const OUTPUT_DIR = path.join(root, 'public', 'md-exports');

  console.log(`🚀 Starting markdown generation from: ${INPUT_DIR}`);
  console.log(`📁 Output directory: ${OUTPUT_DIR}`);

  // Clear output directory
  await rm(OUTPUT_DIR, {recursive: true, force: true});
  await mkdir(OUTPUT_DIR, {recursive: true});

  // On a 16-core machine, 8 workers were optimal (and slightly faster than 16)
  // Putting 4 as the minimum as Vercel has 4 cores per builder and it may help
  // us cut down some of the time there.
  // Source: https://vercel.com/docs/limits#build-container-resources
  const numWorkers = Math.max(Math.floor(cpus().length / 2), 2);
  const workerTasks = new Array(numWorkers).fill(null).map(() => []);

  console.log(`🔎 Discovering files to convert...`);

  let numFiles = 0;
  let workerIdx = 0;
  // Need a high buffer size here otherwise Node skips some subdirectories!
  // See https://github.com/nodejs/node/issues/48820
  const dir = await opendir(INPUT_DIR, {recursive: true, bufferSize: 1024});
  for await (const dirent of dir) {
    if (dirent.name.endsWith('.html') && dirent.isFile()) {
      const sourcePath = path.join(dirent.parentPath || dirent.path, dirent.name);
      const targetDir = path.join(
        OUTPUT_DIR,
        path.relative(INPUT_DIR, dirent.parentPath || dirent.path)
      );
      await mkdir(targetDir, {recursive: true});
      const targetPath = path.join(targetDir, dirent.name.slice(0, -5) + '.md');
      workerTasks[workerIdx].push({sourcePath, targetPath});
      workerIdx = (workerIdx + 1) % numWorkers;
      numFiles++;
    }
  }

  console.log(`📄 Converting ${numFiles} files with ${numWorkers} workers...`);

  const selfPath = fileURLToPath(import.meta.url);
  const workerPromises = new Array(numWorkers - 1).fill(null).map((_, idx) => {
    return new Promise((resolve, reject) => {
      const worker = new Worker(selfPath, {workerData: workerTasks[idx]});
      let hasErrors = false;
      worker.on('message', data => {
        if (data.failedTasks.length === 0) {
          console.log(`✅ Worker[${idx}]: ${data.success} files successfully.`);
        } else {
          hasErrors = true;
          console.error(`❌ Worker[${idx}]: ${data.failedTasks.length} files failed:`);
          console.error(data.failedTasks);
        }
      });
      worker.on('error', reject);
      worker.on('exit', code => {
        if (code !== 0) {
          reject(new Error(`Worker[${idx}] stopped with exit code ${code}`));
        } else {
          hasErrors ? reject(new Error(`Worker[${idx}] had some errors.`)) : resolve();
        }
      });
    });
  });
  // The main thread can also process tasks -- That's 65% more bullet per bullet! -Cave Johnson
  workerPromises.push(processTaskList(workerTasks[workerTasks.length - 1]));

  await Promise.all(workerPromises);

  console.log(`📄 Generated ${numFiles} markdown files from HTML.`);
  console.log('✅ Markdown export generation complete!');
}

async function genMDFromHTML(source, target) {
  const text = await readFile(source, {encoding: 'utf8'});
  await writeFile(
    target,
    String(
      await unified()
        .use(rehypeParse)
        // Need the `main div > hgroup` selector for the headers
        .use(() => tree => selectAll('main div > hgroup, div#main', tree))
        // If we don't do this wrapping, rehypeRemark just returns an empty string -- yeah WTF?
        .use(() => tree => ({
          type: 'element',
          tagName: 'div',
          properties: {},
          children: tree,
        }))
        .use(rehypeRemark, {
          document: false,
          handlers: {
            // Remove buttons as they usually get confusing in markdown, especially since we use them as tab headers
            button() {},
          },
        })
        // We end up with empty inline code blocks, probably from some tab logic in the HTML, remove them
        .use(() => tree => remove(tree, {type: 'inlineCode', value: ''}))
        .use(remarkGfm)
        .use(remarkStringify)
        .process(text)
    )
  );
}

async function processTaskList(tasks) {
  const failedTasks = [];
  for (const {sourcePath, targetPath} of tasks) {
    try {
      await genMDFromHTML(sourcePath, targetPath);
    } catch (error) {
      failedTasks.push({sourcePath, targetPath, error});
    }
  }
  return {success: tasks.length - failedTasks.length, failedTasks};
}

async function doWork(tasks) {
  parentPort.postMessage(await processTaskList(tasks));
}

if (isMainThread) {
  await createWork();
} else {
  await doWork(workerData);
}
```
Review comment: m: Is this one intended? This condition will always fail, even when NODE_ENV is `production`.

Reply: Intended, as otherwise `yarn build` fails locally, asking for these DSNs. That said, I think the negation at the front is incorrect. I wonder how this worked locally.

Reply: Okay, reverted this as `next build` sets `NODE_ENV` to `production` by default if it was not set 😮 Added a `yarn start:dev` command instead to do what I needed.
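For context, the behavior referenced here is generic Next.js behavior: `next build` runs with `NODE_ENV` set to `production` even when the variable was never set explicitly, so a guard keyed off a non-production value can never fire during a production build. A purely hypothetical illustration of that kind of guard (not the actual condition from this PR):

```javascript
// Hypothetical illustration only; not the code under review in this PR.
// During `next build`, process.env.NODE_ENV is 'production' even if it was not
// set explicitly, so this branch never runs as part of a production build.
if (process.env.NODE_ENV !== 'production') {
  console.warn('Development build: skipping checks for required env vars (e.g. DSNs).');
}
```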