Content Engineering Tutorial :: Data

The base data for this project is available from the NIPS Dataset Repository on Kaggle, contributed by Ben Hamner. It contains the titles, authors, abstracts, and extracted text for all NIPS papers from 1987 to 2017. You will need the following files:

  • authors.csv
  • paper_authors.csv
  • papers.csv
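Before running anything else, it can help to confirm that the three files are in place. The following is a minimal sketch, not part of the tutorial's own code: the `data` directory name and the `missing_files` helper are assumptions made for illustration.

```python
from pathlib import Path

# The three base files named in the list above.
REQUIRED_FILES = ("authors.csv", "paper_authors.csv", "papers.csv")


def missing_files(data_dir):
    """Return the required file names that are absent from data_dir, in listed order."""
    base = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # "data" is an assumed download location; adjust to wherever the CSVs were saved.
    missing = missing_files("data")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All three base files found.")
```

Run it from the project root after downloading; an empty result means the base data is complete.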

We also need the following dictionary files for extracting organization names (ORGs) from our text.

All other files can be derived (sometimes at significant expense of processing time) from the three CSV files listed above.

In addition, since the aim of this project is to create data that makes the search experience better, you also need a search engine. This tutorial requires a Solr server to be installed and available.
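To check that Solr is reachable before starting, a small probe like the one below may be useful. It is a sketch under assumptions: it presumes Solr is listening on `localhost` at its default port 8983 and queries the standard system info endpoint. The helper names `solr_info_url` and `solr_is_up` are hypothetical and not part of the tutorial.

```python
import urllib.request


def solr_info_url(host="localhost", port=8983):
    """Build the URL of Solr's system info endpoint (host and port are assumptions)."""
    return f"http://{host}:{port}/solr/admin/info/system?wt=json"


def solr_is_up(host="localhost", port=8983, timeout=3.0):
    """Return True if the Solr system info endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(solr_info_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print("Solr reachable:", solr_is_up())
```

If this prints `False`, start the Solr server or adjust the host and port to match your installation.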