@evantahler
Contributor

@evantahler evantahler commented Nov 4, 2025

Rather than waiting until the end of the week, we can run LLMs.txt generation on every PR, looking only at what has changed. Changes will be auto-committed back to your PR.


Note

Runs LLMs.txt generation on pull requests with auto-commit and updates the generator to perform incremental, git-aware regeneration with embedded metadata; refreshes llms.txt.

  • CI/CD (GitHub Actions)
    • Trigger workflow on pull_request (opened/synchronize/reopened).
    • Checkout PR head ref with custom token.
    • Detect file changes; on PR runs, auto-commit public/llms.txt back to the PR branch if it changed.
    • For manual/scheduled runs, create a PR and enable automerge.
  • Generator Script (scripts/generate-llmstxt.ts)
    • Add incremental regeneration using previous llms.txt metadata and git diff to only summarize changed/new pages; remove deleted pages implicitly.
    • Embed metadata comment (git-sha, generation-date) in output; preserve if no changes.
    • Reuse existing summaries; batch summarize changed pages; refactor into helper functions.
    • Minor: add constants/regexes, structured logging, and sectioned output generation.
  • Artifact (public/llms.txt)
    • Regenerated with new metadata header and refreshed link summaries/sections.
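The metadata round trip described in the note could look roughly like the sketch below. It is illustrative only: the HTML-comment format, the regex, and the names `Metadata`, `formatMetadata`, `parseMetadata`, and `nextMetadata` are assumptions, not the actual contents of scripts/generate-llmstxt.ts. It shows how a git-sha and generation-date can be embedded in the output, read back on the next run, and preserved when nothing changed.

```typescript
// Sketch only: the comment format and every name here are assumptions.
type Metadata = { gitSha: string; generationDate: string };

// Hypothetical header shape: <!-- git-sha: <sha> generation-date: <iso-date> -->
const METADATA_REGEX =
  /<!--\s*git-sha:\s*([0-9a-f]{7,40})\s+generation-date:\s*(\S+)\s*-->/;

// Render the metadata as a single comment line for the top of llms.txt.
function formatMetadata(meta: Metadata): string {
  return `<!-- git-sha: ${meta.gitSha} generation-date: ${meta.generationDate} -->`;
}

// Read metadata back from a previously generated llms.txt, or null if absent.
function parseMetadata(llmsTxt: string): Metadata | null {
  const match = METADATA_REGEX.exec(llmsTxt);
  const gitSha = match?.[1];
  const generationDate = match?.[2];
  if (!gitSha || !generationDate) return null;
  return { gitSha, generationDate };
}

// Keep the previous header when no pages changed; otherwise stamp a new one.
function nextMetadata(
  previous: Metadata | null,
  changedPageCount: number,
  headSha: string,
  now: Date
): Metadata {
  if (previous && changedPageCount === 0) return previous;
  return { gitSha: headSha, generationDate: now.toISOString() };
}
```

With a parsed `gitSha` in hand, the generator can ask git which files changed since that commit and summarize only those pages; when `parseMetadata` returns null, a full regeneration is the safe fallback.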

Written by Cursor Bugbot for commit 04fbfae. This will update automatically on new commits.

@vercel

vercel bot commented Nov 4, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Dec 19, 2025 4:55pm |

@evantahler evantahler force-pushed the llms-txt-since-last-time branch from 7b5af51 to c960b1b November 4, 2025 21:50
@evantahler evantahler marked this pull request as ready for review November 4, 2025 21:55
@evantahler
Contributor Author

@sdserranog and @torresmateo what do you think about this pattern?

@torresmateo
Collaborator

I like this pattern! I like opened and reopened for sure, but I'm not so sure about synchronize, as it may become too verbose when there's a lot of back and forth in a PR. But I'm willing to try!

@evantahler
Contributor Author

> I like this pattern! I like opened and reopened for sure, but I'm not so sure about synchronize, as it may become too verbose when there's a lot of back and forth in a PR. But I'm willing to try!

Yeah, but what I was worried about was the page content changing, and the summary being inaccurate.

Collaborator

@torresmateo torresmateo left a comment

:shipit:

)
);
return new Set();
}
Bug: Error handler claims to process all files but doesn't

When getChangedFilesSince() fails (e.g., the previous SHA was rebased away), it logs "processing all files" but returns an empty Set. This causes changedFiles.has(page.path) in determinePagesToSummarize() to return false for every page. Combined with the condition if (isChanged || !existingSummary), pages with existing summaries are kept unchanged rather than re-processed. The actual behavior is the opposite of what the log message states: modified files silently retain stale summaries instead of being re-summarized.
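One possible way to make the behavior match the log message is to distinguish "nothing changed" (an empty Set) from "the diff could not be computed" (a separate sentinel). The sketch below is an assumption about how that might look, not the fix that was merged: the signatures, the `GitRunner` injection point, and the `Page` and `Summary` shapes are hypothetical, and only the function and variable names quoted in the comment above come from the PR.

```typescript
// Sketch only: signatures and types are assumptions for illustration.
type GitRunner = (args: string[]) => string;
type Page = { path: string; url: string };
type Summary = { title: string; description: string };

// Returns the changed paths, or null when the diff cannot be computed
// (for example, when the previous SHA no longer exists after a rebase).
function getChangedFilesSince(sha: string, runGit: GitRunner): Set<string> | null {
  try {
    const output = runGit(["diff", "--name-only", `${sha}..HEAD`]);
    const paths = output
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
    return new Set(paths);
  } catch {
    // Caller treats null as "process all files".
    return null;
  }
}

function determinePagesToSummarize(
  pages: Page[],
  changedFiles: Set<string> | null,
  existingSummaries: Map<string, Summary>
): { pagesToSummarize: Page[]; pagesToKeep: Page[] } {
  const pagesToSummarize: Page[] = [];
  const pagesToKeep: Page[] = [];
  for (const page of pages) {
    // A null diff marks every page as changed, so stale summaries are refreshed.
    const isChanged = changedFiles === null || changedFiles.has(page.path);
    const existingSummary = existingSummaries.get(page.url);
    if (isChanged || !existingSummary) {
      pagesToSummarize.push(page);
    } else {
      pagesToKeep.push(page);
    }
  }
  return { pagesToSummarize, pagesToKeep };
}
```

In a real script, `runGit` could wrap `execFileSync("git", args, { encoding: "utf8" })` from `node:child_process`; injecting it keeps the failure path easy to exercise without a repository.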


title: existingSummary.title,
description: existingSummary.description,
});
}
Bug: Duplicate entries when source files share same URL

The incremental update logic iterates over discovered pages by file path but looks up existing summaries by URL. When two source files map to the same URL (e.g., how-arcade-helps.mdx and how-arcade-helps/page.mdx), both pages pass through the loop independently. If one file is changed and another isn't, one ends up in pagesToSummarize and the other in pagesToKeep, resulting in duplicate entries in the final output. The generated llms.txt shows this happening with the "How Arcade helps with Agent Authorization" page appearing twice.
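A hedged sketch of one way to avoid the duplicates: make the keep-or-summarize decision once per URL rather than once per source file. The `planByUrl` name, the `Page` and `Summary` shapes, and the example URL are hypothetical; the idea is only that a URL is re-summarized if any of its source files changed, and otherwise kept exactly once.

```typescript
// Sketch only: names and types are assumptions for illustration.
type Page = { path: string; url: string };
type Summary = { title: string; description: string };

function planByUrl(
  pages: Page[],
  changedFiles: Set<string>,
  existingSummaries: Map<string, Summary>
): { urlsToSummarize: string[]; urlsToKeep: string[] } {
  // For each URL, record whether any source file that maps to it changed.
  const changedByUrl = new Map<string, boolean>();
  for (const page of pages) {
    const soFar = changedByUrl.get(page.url) ?? false;
    changedByUrl.set(page.url, soFar || changedFiles.has(page.path));
  }

  // Each URL lands in exactly one bucket, so the output has no duplicates.
  const urlsToSummarize: string[] = [];
  const urlsToKeep: string[] = [];
  changedByUrl.forEach((isChanged, url) => {
    if (isChanged || !existingSummaries.has(url)) {
      urlsToSummarize.push(url);
    } else {
      urlsToKeep.push(url);
    }
  });
  return { urlsToSummarize, urlsToKeep };
}
```

This leaves open which source file should be summarized when several map to the same URL; that choice depends on how the site resolves such collisions and is not settled here.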


@torresmateo torresmateo merged commit a0e4bf3 into main Dec 19, 2025
6 checks passed
3 participants