A web application that facilitates searching through the COVID-19 Kaggle Corpus. The primary purpose of this project is to give non-data-scientists a way to understand the documents, although it may also help encourage relevant research.
- On-demand document summarization
- Keyword and keyphrase generation
- Keyword-based document search
- node --version = v12.16.1
- npm --version = 6.14.4
- Install all the dependencies listed in package.json by running 'npm install' in the project root (a single package can be added with 'npm install <DEPENDENCY>')
- All the data from the Kaggle corpus (.json files) should be placed in the public/data/ directory
- The entire corpus must be indexed to support searching. The index files are not included in this repository because of their size; once generated, place them in the respective corpus folders (for example, public/data/biorxiv_medrxiv/)
- Run the application using 'node app.js'. If it fails because the JavaScript heap runs out of memory, raise the heap limit (the value is in megabytes): 'node --max-old-space-size=1024 app.js'
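The data and index steps above assume one folder per corpus subset under public/data/. As a minimal sketch using only Node's built-in fs and path modules, the helpers below list the corpus files in each subset folder and parse one of them. listCorpusFiles, loadPaper, and the isPaper parameter are hypothetical, not part of this project. They assume papers sit directly inside each subset folder; since any .json file counts as a paper by default, pass a stricter isPaper predicate if the generated index files in the same folders also end in .json.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: for every subset folder directly under dataDir
// (e.g. public/data/biorxiv_medrxiv/), collect the sorted paths of the files
// that `isPaper` accepts. Files at the top level of dataDir are ignored.
function listCorpusFiles(dataDir, isPaper = (name) => name.endsWith('.json')) {
  const subsets = {};
  for (const entry of fs.readdirSync(dataDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const subsetDir = path.join(dataDir, entry.name);
    subsets[entry.name] = fs
      .readdirSync(subsetDir)
      .filter(isPaper)
      .sort()
      .map((name) => path.join(subsetDir, name));
  }
  return subsets;
}

// Hypothetical helper: read and parse one corpus file.
function loadPaper(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}
```

For example, listCorpusFiles('public/data') would return an object keyed by subset folder name, such as biorxiv_medrxiv, whose values are arrays of file paths.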
- The document summaries are generated using the textrank library
- The textrank library is an implementation of this paper
- The keywords and keyphrases are generated using the retext library
- The inverted indexes for the documents are generated using the lunr library
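An inverted index, the structure lunr builds, maps each term to the documents that contain it, so a keyword search only has to look up the query terms. The dependency-free sketch below shows that core idea; it is not lunr's API, and tokenize, buildInvertedIndex, and search are hypothetical names. It simplifies deliberately: the lookup requires every query term to be present, whereas lunr by default ranks documents matching any of the terms and also applies stemming, stop-word filtering, and BM25 relevance scoring. A real lunr index can be serialized with JSON.stringify(idx) and reloaded with lunr.Index.load(...), so it does not have to be rebuilt on every start.

```javascript
// Lower-cased alphanumeric tokens; no stemming or stop-word removal.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build a Map from each term to the Set of document ids containing it.
// `docs` is an array of { id, text } objects (a hypothetical shape).
function buildInvertedIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const term of new Set(tokenize(doc.text))) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(doc.id);
    }
  }
  return index;
}

// Return the sorted ids of documents that contain every query term.
function search(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  let matches = null;
  for (const term of terms) {
    const postings = index.get(term) || new Set();
    matches = matches === null
      ? new Set(postings)
      : new Set([...matches].filter((id) => postings.has(id)));
  }
  return [...matches].sort();
}
```

Intersecting posting sets keeps the sketch short; ranking by how many terms match (closer to lunr's default behaviour) would instead count hits per document.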
- The body_text and abstract content of the selected paper are passed to the textrank algorithm, which then generates the summary.
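To illustrate the summarization step, here is a minimal, dependency-free sketch of TextRank-style extractive summarization. It is not the textrank library's API; paperText, textRankSummary, and the helper names are hypothetical. It assumes the Kaggle JSON layout in which abstract and body_text are arrays of paragraph objects with a text field. Sentences become graph vertices, each edge is weighted by the number of shared words divided by log|Si| + log|Sj|, and a weighted PageRank with damping factor 0.85 scores every sentence. The sentence splitter is deliberately naive.

```javascript
// Join the abstract and body_text paragraphs of one paper.
// Assumes each is an array of objects with a `text` string.
function paperText(paper) {
  const paragraphs = [...(paper.abstract || []), ...(paper.body_text || [])];
  return paragraphs.map((p) => p.text).join(' ');
}

// Naive splitter: break after '.', '!' or '?' followed by whitespace.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Lower-cased alphanumeric tokens of a sentence.
function words(sentence) {
  return sentence.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// TextRank similarity: distinct shared words / (log|Si| + log|Sj|).
// Returns 0 when the denominator is not positive (e.g. two one-word sentences).
function similarity(a, b) {
  const denom = Math.log(a.length) + Math.log(b.length);
  if (denom <= 0) return 0;
  const inB = new Set(b);
  const shared = new Set(a.filter((w) => inB.has(w))).size;
  return shared / denom;
}

// Score sentences with weighted PageRank and return the top `maxSentences`
// in their original order.
function textRankSummary(text, maxSentences = 3, damping = 0.85, iterations = 50) {
  const sentences = splitSentences(text);
  const tokens = sentences.map(words);
  const n = sentences.length;

  // Symmetric edge weights between every pair of sentences.
  const weights = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      const w = similarity(tokens[i], tokens[j]);
      weights[i][j] = w;
      weights[j][i] = w;
    }
  }
  const outSum = weights.map((row) => row.reduce((s, w) => s + w, 0));

  // WS(Vi) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(Vj)
  let scores = new Array(n).fill(1);
  for (let it = 0; it < iterations; it++) {
    const next = new Array(n).fill(1 - damping);
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) {
        if (weights[j][i] > 0 && outSum[j] > 0) {
          next[i] += damping * (weights[j][i] / outSum[j]) * scores[j];
        }
      }
    }
    scores = next;
  }

  return scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, maxSentences)
    .sort((a, b) => a.index - b.index)
    .map(({ index }) => sentences[index]);
}
```

A call such as textRankSummary(paperText(paper), 3) yields a three-sentence summary. Returning the selected sentences in document order rather than score order usually makes the summary read more naturally.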