Support * and $ wildcards in robots.txt#656

Merged
ato merged 1 commit into master from robotstxt-wildcard on Jun 9, 2025
ato (Collaborator) commented Jun 7, 2025

This adds support for robots.txt rules containing the * and $ wildcards from RFC 9309.

Classic prefix rules without any wildcard continue to use a NavigableSet for fast lookups, but for now we just loop over the wildcard rules one by one.
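To illustrate the split described above, here is a minimal sketch, not the code from this PR. The class name `WildcardRules` and its methods are hypothetical, and it only answers "does any rule match this path"; the Allow/Disallow precedence from RFC 9309 and percent-encoding are left out. It assumes `*` matches any sequence of characters and a trailing `$` anchors the pattern to the end of the path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration: prefix rules in a sorted set, wildcard rules in a list.
final class WildcardRules {
    private final NavigableSet<String> prefixRules = new TreeSet<>();
    private final List<String> wildcardRules = new ArrayList<>();

    void add(String pattern) {
        if (pattern.isEmpty()) return; // an empty pattern matches nothing
        if (pattern.indexOf('*') >= 0 || pattern.endsWith("$")) {
            wildcardRules.add(pattern);
        } else {
            prefixRules.add(pattern);
        }
    }

    boolean matches(String path) {
        // Every rule that is a prefix of the path sorts at or before the path,
        // so start at floor(path) and shrink the key to the shared prefix on a miss.
        String key = path;
        for (String c = prefixRules.floor(key); c != null; c = prefixRules.floor(key)) {
            if (key.startsWith(c)) return true;
            key = key.substring(0, commonPrefixLength(c, key));
        }
        // Wildcard rules: plain linear scan.
        for (String pattern : wildcardRules) {
            if (wildcardMatches(pattern, path)) return true;
        }
        return false;
    }

    private static int commonPrefixLength(String a, String b) {
        int n = 0;
        while (n < a.length() && n < b.length() && a.charAt(n) == b.charAt(n)) n++;
        return n;
    }

    // Glob-style matching where '*' is the only wildcard. A trailing '$' anchors
    // the match at the end of the path; otherwise the pattern acts as a prefix.
    static boolean wildcardMatches(String pattern, String path) {
        String body = pattern.endsWith("$")
                ? pattern.substring(0, pattern.length() - 1)
                : pattern + "*";
        int p = 0, s = 0, star = -1, mark = 0;
        while (s < path.length()) {
            if (p < body.length() && body.charAt(p) == '*') {
                star = p++;          // remember the star and try matching zero chars
                mark = s;
            } else if (p < body.length() && body.charAt(p) == path.charAt(s)) {
                p++;
                s++;
            } else if (star >= 0) {
                p = star + 1;        // backtrack: let the last star absorb one more char
                s = ++mark;
            } else {
                return false;
            }
        }
        while (p < body.length() && body.charAt(p) == '*') p++;
        return p == body.length();
    }
}
```

With the rules `/a`, `/ab/c`, and `/*.gif$` added, `matches("/ab/d")` is found through the prefix set (via `/a`), while `matches("/img/a.gif")` falls through to the wildcard loop. The linear scan keeps the wildcard path simple at the cost of checking every wildcard rule for each lookup.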

We're still not fully compliant with RFC 9309, as we don't follow its percent-encoding rules. I'm currently unsure what the correct behavior is for those: I find their wording confusing, and the other implementations I looked at don't seem to align with the examples in the RFC.

Fixes #250

ato force-pushed the robotstxt-wildcard branch from 1644d05 to c7b7ee1 on June 7, 2025 01:37
ato merged commit 7e15d2f into master on Jun 9, 2025
7 checks passed
ato deleted the robotstxt-wildcard branch on June 9, 2025 00:34
Development

Successfully merging this pull request may close these issues.

Support full wildcard syntax in robots.txt directives
