Scripts to download and convert different datasets to Vowpal Wabbit format.
Usage: bash ./<repo>/get_<dataset-name>.sh
The repository contains scripts for the following datasets:
xml_repo (multilabel):
- Amazon-3M (amazon-3M)
- Amazon-670K (amazon)
- AmazonCat-13K (amazonCat)
- AmazonCat-14K (amazonCat-14K)
- Bibtex (bibtex)
- Delicious (delicious)
- Delicious-200K (deliciousLarge)
- EURLex-4K (eurlex)
- Mediamill (mediamill)
- RCV1-2K (rcv1x)
- Wiki10-31K (wiki10)
- WikiLSHTC-325K (wikiLSHTC)
- Wikipedia-500K (WikipediaLarge-500K)
- aloi.bin
- Dmoz
- imageNet
- LSHTC1
- sector
- Eur-Lex
- rcv1_regions
- bibtex
- LSHTCwiki
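
As an illustration of the usage pattern, here is a minimal sketch for fetching the Bibtex dataset from xml_repo. It assumes that the name in parentheses after each xml_repo dataset is the `<dataset-name>` part of the script file name, so the resulting path `./xml_repo/get_bibtex.sh` is a guess based on that pattern rather than a confirmed file name. The variables `repo`, `dataset`, and `script` are introduced only for this example.

```shell
# Assumption: the name in parentheses (e.g. "bibtex") is the <dataset-name>
# used in the script file name get_<dataset-name>.sh.
repo="xml_repo"
dataset="bibtex"
script="./${repo}/get_${dataset}.sh"

# Run the script only if it exists, so a wrong guess fails with a clear message.
if [ -f "$script" ]; then
    bash "$script"
else
    echo "script not found: $script" >&2
fi
```

Changing `repo` and `dataset` to another pair from the lists above should follow the same pattern, provided the corresponding script exists.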
Install GNU sed with Homebrew:
brew install gnu-sed