Open
Labels
bug: Something isn't working
Description
I updated the robots.txt in #334. Unfortunately, we still see a sizable number of crawlers getting stuck because of two issues (see also #335). One issue is that URLs may contain arbitrary prefixes in their path: for example, http://openml.org/not-really-something-we-want/d/151 will gladly redirect to the dataset page instead of returning a 404 page. As I understand it, this means that crawlers will happily crawl these pages (in any case, crawlers do visit pages with prefixes that don't do anything). I am hoping/assuming that disallowing these arbitrary prefixes will significantly reduce traffic, as there will be fewer URLs to explore.
I am also not sure why crawlers try to crawl these pages in the first place; that's probably a separate issue to figure out.
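To make the intended behavior for the prefix problem concrete, here is a minimal sketch in Python. It is not how the OpenML frontend is implemented; `CANONICAL_PATHS` and `status_for` are hypothetical names, and the only route assumed is the `/d/<id>` dataset path from the example above. The idea is simply that a path must match a known route in full, so any extra prefix results in a 404 rather than a redirect.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allow-list of canonical routes. Only the /d/<id> dataset
# route is taken from the example in this issue; a real list would need
# every route the site actually serves.
CANONICAL_PATHS = [re.compile(r"/d/\d+/?")]


def status_for(url: str) -> int:
    """Return 200 if the URL path is exactly a canonical route, else 404."""
    path = urlsplit(url).path
    # fullmatch requires the whole path to match, so a path with an
    # arbitrary prefix such as /anything/d/151 is rejected.
    if any(pattern.fullmatch(path) for pattern in CANONICAL_PATHS):
        return 200
    return 404
```

With this check, `status_for("http://openml.org/d/151")` yields 200, while the prefixed URL from the example yields 404, which should leave crawlers with fewer distinct URLs to follow.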