test(docs): add comprehensive tests and monitoring for llms.txt endpoints #4676
…ints
- Add unit tests for middleware rewrites (llms.txt, llms-full.txt, .md)
- Add unit tests for markdown route slug handling
- Create synthetic monitoring script with incident.io integration
- Add GitHub Actions cron workflow for 5-minute health checks
- Configure biome.json to allow console statements in monitoring scripts

The monitoring script checks all configured sites and:
- Sends Slack alerts on failures
- Creates incidents in incident.io when endpoints are down
- Auto-resolves incidents when endpoints recover

Co-Authored-By: [email protected] <[email protected]>
The middleware tests were failing because Next.js middleware uses server-only modules that cannot be imported in test environments. Since the middleware routing logic was already fixed and verified in PR #4675, we don't need complex unit tests for it. Keeping the markdown route tests and monitoring script which provide value without fighting Next.js server constraints. Co-Authored-By: [email protected] <[email protected]>
Co-authored-by: vercel[bot] <35613825+vercel[bot]@users.noreply.github.com>
- Change INCIDENT_IO_API_KEY to INCIDENT_API_KEY to match site-check.yml convention
- Remove redundant 'Report status' step that doesn't work across GitHub Actions steps
- Add pull_request trigger for PR branch to enable CI testing before merge
- Update error messages in monitoring script to reference correct env var name

Co-Authored-By: [email protected] <[email protected]>
Addresses Vercel AI comment - the workflow was setting SLACK_WEBHOOK_URL but the monitoring script expects SLACK_WEBHOOK_URL_DOCS_INCIDENTS Co-Authored-By: David Konigsberg <[email protected]>
Fixed in commit 0bda778. Changed the environment variable name in the workflow to match the one the monitoring script expects.
Per David's request, removing the push trigger on the PR branch now that testing is complete. The workflow will now only run on schedule (every 5 minutes) and manual dispatch. Co-Authored-By: David Konigsberg <[email protected]>
Addresses Vercel AI comments 21 and 22. Added AbortController with 10-second timeout to:
- llms.txt fetch in checkSite (line 189)
- Slack webhook fetch in sendSlackAlert
- All incident.io API fetch calls (findOpenIncident, updateIncident, createIncident, resolveIncident)

This ensures the monitoring script never hangs indefinitely if external services don't respond.

Co-Authored-By: David Konigsberg <[email protected]>
Fixed in commit 4466868. Added AbortController with 10-second timeout to all fetch calls in the monitoring script. This ensures the monitoring script never hangs indefinitely if external services don't respond.
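The timeout pattern described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper name `fetchWithTimeout`, its parameters, and the injectable `fetchImpl` argument are assumptions made for the example. Only the use of AbortController and the 10-second limit come from the PR.

```typescript
// Hypothetical helper for illustration; not taken from the monitoring script.
const FETCH_TIMEOUT_MS = 10_000; // the 10-second limit mentioned in the PR

async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs: number = FETCH_TIMEOUT_MS,
  fetchImpl: typeof fetch = fetch, // injectable so the helper can be tested offline
): Promise<Response> {
  const controller = new AbortController();
  // Abort the request if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Note: this replaces any signal the caller passed in init.
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    // Always clear the timer so a fast response does not leave it pending.
    clearTimeout(timer);
  }
}
```

Wrapping every outbound call (endpoint checks, the Slack webhook, the incident.io API) in one helper like this keeps the timeout policy in a single place, which is one plausible way to apply it to all fetch calls.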
Add production monitoring for llms.txt endpoints with incident.io integration
This PR adds synthetic monitoring for llms.txt, llms-full.txt, and .md/.mdx endpoints used by LLM tools for documentation discovery, plus fixes a critical bug in the markdown route that was preventing proper slug parameter handling from middleware.
What was the motivation & context behind this PR?
After fixing the routing bug in PR #4675, we needed production monitoring to detect when these critical endpoints go down. During development, the monitoring script was failing with 404 errors on .mdx endpoints that worked fine in browsers, which led to discovering both a URL construction bug in the monitoring script and a critical bug in the markdown route itself.
Additionally, David raised concerns about duplicate Slack notifications (incident.io already posts to Slack) and Vercel AI identified a missing response validation in the monitoring script.
Changes Made
1. Markdown Route Bug Fix (CRITICAL)
File: packages/fern-docs/bundle/src/app/[host]/[domain]/api/fern-docs/markdown/route.ts
The route now checks for the slug search parameter (passed by middleware) before falling back to pathname extraction. This is backward-compatible but critical for middleware integration with llms.txt endpoints.
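The lookup order described above might look something like the sketch below. The function name `resolveSlug`, the slash trimming, and the exact shape of the pathname fallback are assumptions for illustration; the PR only states that the slug search parameter is checked first and that pathname extraction is the fallback.

```typescript
// Hypothetical helper; the real route's fallback logic is not shown in the PR.
function trimSlashes(value: string): string {
  return value.replace(/^\/+|\/+$/g, "");
}

function resolveSlug(requestUrl: string): string {
  const url = new URL(requestUrl);
  // Prefer the slug passed by middleware as a search parameter.
  const fromParam = url.searchParams.get("slug");
  if (fromParam != null && fromParam !== "") {
    return trimSlashes(fromParam);
  }
  // Fall back to the request pathname when no slug parameter is present.
  return trimSlashes(url.pathname);
}

// With a slug parameter (the middleware case):
//   resolveSlug("https://docs.example.com/api/fern-docs/markdown?slug=guides/intro")
//   returns "guides/intro"
// Without one (the pre-existing behavior):
//   resolveSlug("https://docs.example.com/guides/intro")
//   returns "guides/intro"
```

Because the parameter is only consulted when present, requests that never set it resolve exactly as before, which matches the backward-compatibility claim.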
2. Synthetic Monitoring Script
File: scripts/monitor/check-llms-md-endpoints.ts (504 lines)
Comprehensive monitoring script with:
- … ENABLE_SLACK_ALERTS=1 to avoid duplicate notifications with incident.io
- … SLACK_WEBHOOK_URL_DOCS_INCIDENTS env var
- … [TEST] prefix and mode: "test" field, with separate idempotency keys
- … new URL(pathOrUrl, base) to prevent path duplication bug

3. GitHub Actions Workflow
File: .github/workflows/monitor-llms-md-endpoints.yml
- … SLACK_WEBHOOK_URL_DOCS_INCIDENTS and INCIDENT_API_KEY

4. Unit Tests
File: packages/fern-docs/bundle/src/app/[host]/[domain]/api/fern-docs/markdown/route.test.ts
5 test cases covering slug parameter handling in the markdown route.
How has this PR been tested?
- … [TEST] prefix and mode: "test" field

Human Review Checklist
Critical items requiring verification:
- INCIDENT_STATUS_MONITORING = "01HR85VFNXWH1H6976YCEJ5XJB"
- INCIDENT_STATUS_CANCELED = "01HR85VFNXMV8SBQ3FRPMDBCST" (used for resolution instead of CLOSED)
- INCIDENT_STATUS_CLOSED = "01HR85VFNXJPF6TXWYTXA6NBS2" (defined but not used)
- INCIDENT_SEVERITY_MINOR = "01HR85VFNX9NYZG6B5Z40K8Y9V"
- … ENABLE_SLACK_ALERTS=1) to avoid duplicate notifications with incident.io. This means:
  - … ENABLE_SLACK_ALERTS=1 is set
- Monitored sites list (lines 41-46): Currently monitors 5 sites. Please confirm this list is complete.
- URL construction logic (lines 88-89): Uses new URL(pathOrUrl, base) to handle both absolute and relative URLs. Verify this correctly handles all edge cases.
- … push.branches for testing. This MUST be removed before merging (as noted in PR comment #11).
- Secret configuration: Verify SLACK_WEBHOOK_URL_DOCS_INCIDENTS and INCIDENT_API_KEY are configured in GitHub repo settings.
- TEST_MODE vs DRY_RUN: Two different modes with different purposes:
  - DRY_RUN=1: Skips all mutations (no incidents created/updated/resolved)
  - TEST_MODE=1: Creates real incidents but labels them as tests with [TEST] prefix and mode: "test"

Link to Devin run: https://app.devin.ai/sessions/3d070b89262340449cde1aa04e516963
Requested by: David Konigsberg ([email protected]) (@davidkonigsberg)
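For the URL construction item in the checklist above, the sketch below shows how new URL(pathOrUrl, base) behaves for a few inputs and how plain string concatenation can duplicate a path segment. The function names, the example.com host, and the assumption that the original bug came from concatenation are all illustrative; the PR does not show the previous code.

```typescript
// Hypothetical helpers for illustration; only new URL(pathOrUrl, base) is from the PR.
function resolveEndpoint(pathOrUrl: string, base: string): string {
  // An absolute URL ignores the base; a path starting with "/" replaces the base's path.
  return new URL(pathOrUrl, base).toString();
}

// Assumed shape of a naive approach that can duplicate path segments.
function naiveJoin(pathOrUrl: string, base: string): string {
  return `${base}${pathOrUrl}`;
}

const base = "https://docs.example.com/learn";

// Root-relative path that already contains the base path:
//   naiveJoin("/learn/page.mdx", base)       -> "https://docs.example.com/learn/learn/page.mdx"
//   resolveEndpoint("/learn/page.mdx", base) -> "https://docs.example.com/learn/page.mdx"
// Absolute URL:
//   resolveEndpoint("https://docs.example.com/llms.txt", base) -> "https://docs.example.com/llms.txt"
// Edge case worth checking: a relative path without a leading slash replaces the
// last segment of a base that has no trailing slash:
//   resolveEndpoint("page.mdx", base)       -> "https://docs.example.com/page.mdx"
//   resolveEndpoint("page.mdx", base + "/") -> "https://docs.example.com/learn/page.mdx"
```

The last pair is the kind of edge case the checklist asks reviewers to verify: whether the configured base URLs carry a trailing slash changes how slash-less relative paths resolve.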