Open
Labels: bug (Something isn't working)
Description
I updated robots.txt in #334. Unfortunately, we still see a sizable number of crawlers getting stuck, due to two issues (see also #336). One issue is that most pages allow filters and sorting, which means there is a near-limitless number of URLs to crawl. We should disallow these in our robots.txt. We probably should not do this right away, though, because the entity pages (e.g., dataset pages such as https://www.openml.org/search?type=data&sort=runs&id=151&status=active) currently also contain filters/sorts, and we do want crawlers to visit the dataset pages. So we must first create entity pages with URLs that do not contain query strings; then we can disallow crawling of the remaining pages that do support queries (see the sketch below).
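For reference, once query-free entity URLs exist, the robots.txt rules could look roughly like the sketch below. The `/d/` entity path is purely hypothetical (whatever path scheme we pick for query-free dataset pages would go there), and the wildcard and Allow handling relies on the extensions supported by Google and Bing rather than the original robots.txt standard:

```
User-agent: *
# Hypothetical query-free entity pages (e.g. /d/151 for a dataset) stay crawlable
Allow: /d/
# Block the filter/sort pages, whose query strings generate near-limitless URLs
Disallow: /search?
Disallow: /*?*sort=
```

Until the query-free entity pages exist, adding the Disallow lines alone would also hide the dataset pages from crawlers, which is why the order of steps matters.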