
Document errored pages and Content Signals on /crawl endpoint #29095

Merged

kathayl merged 4 commits into production from br-crawl-error-docs on Mar 18, 2026


kathayl (Contributor) commented Mar 18, 2026

Two goals:

  1. Errored and blocked pages: document how HTTP errors from blocked or errored pages surface in crawl results via metadata.status and metadata.html.
  2. Content Signals support: add a new crawlPurposes parameter to the optional parameters table, a dedicated "Content Signals" subsection under Crawler behavior, an updated all-optional-parameters example, and a troubleshooting entry for the 400 rejection.
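The PR text does not show a crawl result payload. As a minimal sketch, assuming hypothetically that each result record carries a `url` plus a `metadata` object with the `metadata.status` and `metadata.html` fields the PR mentions, a client might separate successful pages from errored or blocked ones (402, 403, and so on) by status code. `CrawlRecord`, `partitionByStatus`, and the sample URLs are illustrative names and data, not part of the documented API.

```typescript
// Hypothetical shape of one crawl result record. Only metadata.status and
// metadata.html come from the PR description; url is an assumption.
interface CrawlRecord {
  url: string;
  metadata: {
    status: number;
    html?: string;
  };
}

// Split records into successful pages and errored/blocked pages.
// This sketch treats any 2xx status as success and everything else as an error.
function partitionByStatus(records: CrawlRecord[]): {
  ok: CrawlRecord[];
  failed: CrawlRecord[];
} {
  const ok: CrawlRecord[] = [];
  const failed: CrawlRecord[] = [];
  for (const record of records) {
    const { status } = record.metadata;
    if (status >= 200 && status < 300) {
      ok.push(record);
    } else {
      failed.push(record);
    }
  }
  return { ok, failed };
}

// Made-up sample data: one normal page, one 402, one 403 without a body.
const sample: CrawlRecord[] = [
  { url: "https://example.com/", metadata: { status: 200, html: "<html>...</html>" } },
  { url: "https://example.com/paywalled", metadata: { status: 402, html: "<html>Payment required</html>" } },
  { url: "https://example.com/private", metadata: { status: 403 } },
];

const { ok, failed } = partitionByStatus(sample);
for (const record of failed) {
  // Report each errored or blocked page with its HTTP status.
  console.log(`${record.url} returned ${record.metadata.status}`);
}
```

Keeping errored pages in the result set, rather than dropping them, lets the caller decide whether a 402 or 403 is worth retrying or should simply be logged.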
  • Add 'Errored and blocked pages' subsection explaining how HTTP errors (402, 403, etc.) are surfaced in crawl results via metadata.status and metadata.html
  • Add crawlPurposes parameter to the optional parameters table
  • Add 'Content Signals' subsection under Crawler behavior explaining the three signal categories (search, ai-input, ai-train), enforcement behavior, and how to narrow declared purposes
  • Add crawlPurposes to the all-optional-parameters example
  • Add 'Crawl rejected by Content Signals' troubleshooting entry for the 400 Bad Request error
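The PR only names the Content Signals check and its 400 rejection. The following is a minimal sketch of the kind of comparison involved, under loudly stated assumptions: that a site publishes its signals in the Content Signals Policy form `search=yes, ai-input=yes, ai-train=no`, that the crawl's declared `crawlPurposes` are compared against those signals, and that a category the site does not mention counts as allowed. `parseContentSignal` and `conflictingPurposes` are hypothetical helpers, not part of the Browser Rendering API or its actual enforcement logic.

```typescript
// The three signal categories named in the PR.
type Purpose = "search" | "ai-input" | "ai-train";

// Parse a Content-Signal value such as "search=yes, ai-train=no"
// (assumed format). Categories the site does not mention are left out.
function parseContentSignal(value: string): Map<string, boolean> {
  const signals = new Map<string, boolean>();
  for (const part of value.split(",")) {
    const [key, raw] = part.split("=").map((s) => s.trim().toLowerCase());
    if (!key || raw === undefined) continue;
    if (raw === "yes") signals.set(key, true);
    else if (raw === "no") signals.set(key, false);
  }
  return signals;
}

// Return the declared purposes the site has explicitly opted out of.
// Assumption: an unmentioned category is treated as allowed.
function conflictingPurposes(declared: Purpose[], signalValue: string): Purpose[] {
  const signals = parseContentSignal(signalValue);
  return declared.filter((purpose) => signals.get(purpose) === false);
}

// Declaring all three purposes against a site that disallows ai-train
// conflicts; narrowing the declaration to search and ai-input does not.
const policy = "search=yes, ai-input=yes, ai-train=no";
const broad = conflictingPurposes(["search", "ai-input", "ai-train"], policy);
const narrow = conflictingPurposes(["search", "ai-input"], policy);
console.log(JSON.stringify(broad)); // ["ai-train"]
console.log(JSON.stringify(narrow)); // []
```

In this sketch, narrowing the declared purposes is what clears the conflict, which mirrors the "how to narrow declared purposes" guidance the PR adds.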


Documentation checklist

  • Is there a changelog entry (guidelines)? If you don't add one for something awesome and new (however small), how will our customers find out? Changelogs are automatically posted to RSS feeds, the Discord, and X.
  • The change adheres to the documentation style guide.
  • If this is a larger change (such as adding a new page), an issue has been opened for any incorrect or out-of-date information that this PR fixes.
  • Files which have changed name or location have been allocated redirects.

github-actions (Contributor)

This pull request requires reviews from CODEOWNERS as it changes files that match the following patterns:

Pattern: /src/content/docs/browser-rendering/
Owners: @mchenco, @cloudflare/pcx-technical-writing, @celso, @kathayl, @ToriLindsay

kathayl and others added 2 commits March 18, 2026 11:45
Co-authored-by: Cameron Whiteside <35665916+CameronWhiteside@users.noreply.github.com>

kathayl merged commit 0d72d53 into production on Mar 18, 2026
9 checks passed
kathayl deleted the br-crawl-error-docs branch on March 18, 2026 at 21:03


7 participants