A Python-based tool that automatically searches for research papers related to a given query using Semantic Scholar, then generates a concise literature review by summarizing each paper's abstract using state-of-the-art NLP models (Pegasus or Gemini).
- 🔍 Search relevant papers using Semantic Scholar's API.
- 🧾 Automatically summarize each abstract using Pegasus (local, transformer-based) or Google's Gemini API (if enabled).
- 📚 Generate a general literature summary with references to each paper.
- 🧪 Work offline in debug mode using pre-downloaded mock papers.
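As a rough illustration of the search step, the sketch below builds a request against the public Semantic Scholar Graph API paper-search endpoint and keeps only papers that have an abstract. The function names `build_search_url`, `extract_papers`, and `fetch_papers` are hypothetical and are not taken from `semantic_scholar.py`; the project's own `search_papers()` may work differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Semantic Scholar Graph API paper-search endpoint.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query, limit=10):
    """Build the search URL, requesting only the fields this sketch uses."""
    params = {"query": query, "limit": limit, "fields": "title,abstract"}
    return f"{SEARCH_URL}?{urlencode(params)}"


def extract_papers(payload):
    """Keep only papers that have a non-empty abstract to summarize."""
    papers = []
    for item in payload.get("data", []):
        abstract = item.get("abstract")
        if abstract:
            papers.append({"title": item.get("title", ""), "abstract": abstract})
    return papers


def fetch_papers(query, limit=10):
    """Hypothetical stand-in for search_papers(); requires network access."""
    with urlopen(build_search_url(query, limit), timeout=30) as response:
        return extract_papers(json.load(response))
```

Keeping URL construction and response parsing separate from the network call makes both easy to exercise against saved responses, which fits the offline debug mode.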
.
├── main.py # Entry point of the app
├── semantic_scholar.py # Contains `search_papers()` logic
├── literature_summary.py # Summarization utilities
├── json_raw_data/
│ └── human genome sequencing variation.json.json # Mock paper data
# Clone the repo
git clone https://github.com/SujalRajpt/ResearchPaperCrawler.git
cd ResearchPaperCrawler
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Some key packages include:
- transformers
- torch
- requests
- tqdm (optional, for progress)
- semantic-scholar (or your wrapper for querying papers)
You can generate a requirements.txt using:
pip freeze > requirements.txt
Run the tool:
python main.py
Edit main.py:
DEBUG_MODE = False
Then provide a search query inside main.py, or modify it to accept CLI input.
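One possible way to accept CLI input instead of editing `main.py` is sketched below with `argparse`. The `--debug` flag, the `parse_args` and `load_mock_papers` helpers, and the `<query>.json` file naming are all assumptions made for illustration; the existing code uses a `DEBUG_MODE` constant, and the mock file in `json_raw_data/` may follow a different naming scheme.

```python
import argparse
import json
from pathlib import Path

# Directory of mock papers, as shown in the project tree.
MOCK_DIR = Path("json_raw_data")


def parse_args(argv=None):
    """Parse a positional query and an optional --debug switch."""
    parser = argparse.ArgumentParser(description="Generate a literature review.")
    parser.add_argument("query", help="search query for Semantic Scholar")
    parser.add_argument(
        "--debug",
        action="store_true",
        help="use pre-downloaded mock papers instead of calling the API",
    )
    return parser.parse_args(argv)


def load_mock_papers(query, mock_dir=MOCK_DIR):
    """Load mock data for a query; the '<query>.json' naming is an assumption."""
    path = Path(mock_dir) / f"{query}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

With something like this in place, a run could look like `python main.py "human genome sequencing variation" --debug`, with the flag taking over the role of `DEBUG_MODE`.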
=== Literature Review Summary ===
Paper A shows how genome sequencing improves diagnosis [1].
Another study highlights challenges in capturing variation [2].
...
References:
[1] Title of Paper A
[2] Title of Paper B
...
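The output layout above can be reproduced with a small formatter. This is a minimal sketch, assuming each paper is available as a `(title, summary)` pair; `format_review` is a hypothetical helper, not a function from `literature_summary.py`.

```python
def format_review(entries):
    """Render (title, summary) pairs as a review with numbered references."""
    lines = ["=== Literature Review Summary ==="]
    # Each summary sentence is followed by its citation marker.
    for index, (_, summary) in enumerate(entries, start=1):
        lines.append(f"{summary.rstrip('. ')} [{index}].")
    lines.append("References:")
    # Reference numbers follow the same order as the summaries.
    for index, (title, _) in enumerate(entries, start=1):
        lines.append(f"[{index}] {title}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_review([
        ("Title of Paper A", "Paper A shows how genome sequencing improves diagnosis."),
        ("Title of Paper B", "Another study highlights challenges in capturing variation."),
    ]))
```

Numbering both loops from the same list keeps each citation marker aligned with its reference entry.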
To use Gemini for summarization, update:
USE_GEMINI_API = True
Then insert your API key into summarize_abstract_gemini():
api_key = "YOUR_API_KEY"
- Pegasus may truncate long abstracts, so keep summaries concise.
- Gemini is useful for faster or higher-quality summaries, but requires a valid API key and internet access.
- Abstracts with job-related content (e.g., "apply", "email") are filtered out to avoid noise.
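The job-content filter mentioned in the notes could look roughly like the sketch below. The keyword list contains only the two examples given above, and whole-word, case-insensitive matching is an assumption; the project's actual keywords and matching rules may differ. `looks_like_job_posting` and `filter_abstracts` are hypothetical names.

```python
import re

# Only the two example keywords from the notes; the real list may be longer.
JOB_KEYWORDS = ("apply", "email")

# Whole-word, case-insensitive match (an assumption, not confirmed by the source).
_JOB_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, JOB_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def looks_like_job_posting(abstract):
    """Return True if the abstract contains any job-related keyword."""
    return bool(_JOB_PATTERN.search(abstract))


def filter_abstracts(papers):
    """Drop papers with a missing abstract or job-related wording."""
    return [
        paper
        for paper in papers
        if paper.get("abstract") and not looks_like_job_posting(paper["abstract"])
    ]
```

A plain keyword match like this is cheap but blunt: a genuine abstract that says "we apply a new method" would also be dropped, so the keyword list is worth tuning against real search results.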
MIT License. See LICENSE for details.