Description
I could be mistaken, but I haven't been able to find any documentation on how to verify the IA crawler. The closest thing I've found is checking the crawler's User-Agent string, but that's easily faked. I want to invite the IA crawler to crawl my content, but I also want to detect things like spammers posing as it and block them.
Google and Bing both handle this with a reverse DNS lookup on the crawler's IP address, followed by a forward DNS lookup on the hostname that the reverse lookup returned.
Put another way:
host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.
host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
So, since the second host command returns the same IP that we started with, and since the domain ends with googlebot.com, we're in business.
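For anyone who wants to script this, here is a rough Python sketch of the same two-step check. The function name `verify_crawler` and its `reverse_lookup` / `forward_lookup` parameters are made up for illustration; the only real calls are `socket.gethostbyaddr` and `socket.getaddrinfo` from the standard library. It compares addresses as plain strings, so treat it as an IPv4-oriented sketch rather than a finished implementation.

```python
import socket


def _reverse(ip):
    # Reverse (PTR) lookup: returns the primary hostname for the IP.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # Forward lookup: returns the set of addresses the hostname resolves to.
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_crawler(ip, allowed_suffixes, reverse_lookup=_reverse, forward_lookup=_forward):
    """Return True only if ip reverse-resolves to a host under one of
    allowed_suffixes AND that host resolves back to the same ip.

    The lookup functions are parameters (a hypothetical design choice)
    so the logic can be exercised without touching real DNS.
    """
    try:
        hostname = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False

    hostname = hostname.rstrip(".").lower()

    # Require a real domain boundary so "fakegooglebot.com" does not pass.
    if not any(hostname == s or hostname.endswith("." + s) for s in allowed_suffixes):
        return False

    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False

    return ip in addresses
```

Usage would look something like `verify_crawler("66.249.66.1", ["googlebot.com"])`. For IA, the suffix list would have to come from whatever domain they publish, which is exactly what's missing today.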
Here are Google's docs: https://support.google.com/webmasters/answer/80553?hl=en
And Bing's: https://www.bing.com/webmaster/help/how-to-verify-bingbot-3905dc26
Could IA add this feature too? I think it would only require that you do some work with your DNS whenever you have a new IP address.