
Filter directory pages from llms.txt #29200

Open
mvvmm wants to merge 1 commit into production from remove-directory-pages-from-llms-txt

@mvvmm (Contributor) commented Mar 20, 2026

Summary

Per-product llms.txt files previously included section index pages whose only content is a <DirectoryListing /> component. These pages exist purely as sidebar navigation containers. They have no standalone value for an LLM, since the llms.txt file itself already serves as a directory, and they were failing the content-start-position audit check because 100% of their rendered content is navigation links.

This adds an isDirectoryOnlyPage() helper that strips imports, component tags, and JSX comments from a page's raw MDX body, then returns true if the page contains a <DirectoryListing /> with ≤ 250 characters of surrounding prose. Any page that matches is excluded by the per-product llms.txt filter in getStaticPaths.
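A minimal TypeScript sketch of the check described above, assuming a regex-based approach; the PR's actual implementation may differ. Only the name isDirectoryOnlyPage and the 250-character threshold come from the description. The constant name MAX_PROSE_CHARS, the exact regular expressions, and the choice to collapse whitespace before counting are illustrative assumptions.

```typescript
// Hypothetical constant: mirrors the 250-character threshold from the PR description.
const MAX_PROSE_CHARS = 250;

/**
 * Sketch of a directory-only check for a raw MDX body (frontmatter excluded).
 * Returns true when the page renders <DirectoryListing /> and little else.
 */
export function isDirectoryOnlyPage(body: string): boolean {
  const withoutNoise = body
    // Remove ESM import lines, e.g. `import { DirectoryListing } from "~/components";`.
    .replace(/^\s*import\s.*$/gm, "")
    // Remove JSX comments such as {/* ... */}.
    .replace(/\{\/\*[\s\S]*?\*\/\}/g, "");

  // Only pages that still reference <DirectoryListing> after cleanup are candidates.
  if (!/<DirectoryListing\b/.test(withoutNoise)) return false;

  const prose = withoutNoise
    // Remove component tags (capitalized names), whether opening, closing, or self-closing.
    .replace(/<\/?[A-Z][\w.]*\b[^>]*>/g, "")
    // Collapse whitespace so blank lines and indentation are not counted as prose.
    .replace(/\s+/g, " ")
    .trim();

  return prose.length <= MAX_PROSE_CHARS;
}
```

For example, a body consisting only of an import line and <DirectoryListing /> yields true, while the same listing followed by a 300-character paragraph yields false. Stripping comments before looking for the component means a commented-out listing does not mark a page as directory-only. Regexes are only a heuristic here: a tag attribute containing a literal `>` would not be stripped cleanly, so an MDX parser would be the more robust choice.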

  • ~415 pure directory index pages are removed from per-product llms.txt outputs
  • Pages with real prose alongside <DirectoryListing /> (>250 chars) are preserved
  • Pages that don't use <DirectoryListing /> at all are unaffected
  • No build-time impact: the check runs on e.body, which getCollection has already loaded into memory
  • The global /llms.txt (product directory) is unchanged
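The getStaticPaths change can be pictured as one extra condition in an in-memory filter. This sketch is hypothetical: the DocEntry shape, the `id` prefix convention used to select a product, and the entriesForLlmsTxt name are assumptions, and the directory-only predicate is passed in rather than imported so the example stands alone. Because it only reads `body` from entries that are already loaded, the extra check performs no additional file reads.

```typescript
// Hypothetical minimal stand-in for a content-collection entry; real entries carry more fields.
interface DocEntry {
  id: string;
  body: string;
}

// Hypothetical helper: select the entries that belong in one product's llms.txt.
export function entriesForLlmsTxt(
  entries: DocEntry[],
  productPrefix: string,
  isDirectoryOnly: (body: string) => boolean,
): DocEntry[] {
  return entries.filter(
    // Keep pages under the product's path that are not navigation-only index pages.
    (e) => e.id.startsWith(`${productPrefix}/`) && !isDirectoryOnly(e.body),
  );
}
```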

@github-actions (Contributor)

This pull request requires reviews from CODEOWNERS as it changes files that match the following patterns:

Pattern    Owners
*.ts       @cloudflare/content-engineering, @kodster28

@mvvmm (Contributor, Author) commented Mar 20, 2026

/bonk please review

@ask-bonk bot (Contributor) left a comment

LGTM. The implementation is clean and well-documented. The 250-character threshold is a reasonable heuristic for distinguishing navigation-only pages from content pages. All checks pass.

@ask-bonk bot (Contributor) commented Mar 20, 2026

Approved PR #29200. The changes are clean, well-documented, and pass all validation checks.


@mvvmm marked this pull request as ready for review March 20, 2026 21:55
@mvvmm requested review from a team and kodster28 as code owners March 20, 2026 21:55