Releases: tomverran/robots
Fix order of implode arguments
Fix notice being emitted with malformed files
Also drops support for very old PHP versions
Skip records with an empty user agent
With thanks to David Goodwin
Fix case sensitivity
Merge pull request #15 from nickmoline/master: URL paths in robots.txt are case-sensitive
1.13
1.12
1.11
Match user agents according to Google's spec
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt#order-of-precedence-for-user-agents
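As a rough illustration of that precedence rule, here is a sketch in Python (function and variable names are hypothetical, not the library's API): the record whose User-agent token is the longest match for the crawler's name is chosen, with `*` as the fallback when nothing more specific matches.

```python
def select_record(agent, records):
    """Pick the rules from the record whose user-agent token is the most
    specific (longest) match for the crawler name; fall back to '*'."""
    agent = agent.lower()
    best_rules, best_len = None, -1
    for ua, rules in records:
        ua = ua.lower()
        if ua != "*" and agent.startswith(ua) and len(ua) > best_len:
            best_rules, best_len = rules, len(ua)
    if best_rules is None:
        # No specific record matched: use the wildcard record, if any
        for ua, rules in records:
            if ua == "*":
                return rules
    return best_rules

records = [
    ("*", ["rule-any"]),
    ("Googlebot", ["rule-googlebot"]),
    ("Googlebot-News", ["rule-news"]),
]
select_record("Googlebot-News/2.1", records)  # -> ["rule-news"]
select_record("Googlebot/2.1", records)       # -> ["rule-googlebot"]
select_record("Bingbot/1.0", records)         # -> ["rule-any"]
```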
1.10
1.10-beta
Merged from PR #9
This update brings the library into line with the following two specs for robots.txt files:
- http://www.robotstxt.org/norobots-rfc.txt
- https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt?hl=en
A summary of the changes:
- The most specific matching User-agent is found in the file to determine the rules to apply
- I took "most specific" to mean "longest match", which is also how allow/disallow rules are ordered.
- Wildcards expand across directory boundaries and can be anchored to the end of the string with $
- Percent-encoded characters are supported, with the exception of encoded slashes (%2F), which are left as-is
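The last two points can be sketched briefly in Python (hypothetical helper names, not the library's API): a rule pattern becomes a prefix match where `*` expands across `/` boundaries and a trailing `$` anchors the match to the end of the path, and percent-decoding protects `%2F` so path segmentation is preserved.

```python
import re
from urllib.parse import unquote

def pattern_matches(pattern: str, path: str) -> bool:
    """Check a path against a robots.txt rule pattern: '*' matches any
    run of characters (crossing '/' boundaries), a trailing '$' anchors
    the pattern to the end of the path; otherwise it is a prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

def decode_path(path: str) -> str:
    """Percent-decode a path, except for encoded slashes (%2F), which
    stay as-is since decoding them would change the path's segments."""
    parts = re.split("(%2[Ff])", path)
    return "".join(p if p.lower() == "%2f" else unquote(p) for p in parts)

pattern_matches("/private*.html$", "/private/old/page.html")  # True
pattern_matches("/fish*", "/fishheads/yummy.html")            # True
pattern_matches("/*.php$", "/index.php?q=1")                  # False
decode_path("/a%3Cd%2Fb.html")                                # "/a<d%2Fb.html"
```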
The file is no longer modelled as a tree but as a series of Records, with User-Agent and AccessRules objects. The actual file parser uses arrays as an intermediate representation to cut down on the number of objects created.
I will mark this release as production ready after a couple of days of testing in the wild.
Version 1.02
Relative to 1.02-beta
- Adds isDisallowed (thanks @waknauss-kingdom)
Relative to 1.01
- Fixes handling of multiple user agents
- Fixes non-lowercase user agents