An AI system that reads a research paper and writes a structured peer review — then grades its own work and gets better over time.
You give it a PDF → It writes a review → It checks the review → It learns from mistakes
Step by step:
- Reads the paper — extracts title, abstract, methods, experiments, claims, and contributions
- Finds related work — searches Semantic Scholar for similar papers
- Checks novelty — compares the paper's claims against prior work
- Writes a review — generates a structured NeurIPS/ICML-style review with scores
- Grades itself — a second AI pass checks the review for hallucinations, missing content, and unsupported claims
- Improves — saves quality scores and automatically adjusts settings on the next run
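The steps above can be sketched as one pipeline function. This is a minimal, API-free sketch of the control flow only; the function names and the `ReviewRun` container are illustrative stand-ins for the real modules (`retrieval.py`, `reviewer.py`, `evaluator.py`), not the project's actual interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class ReviewRun:
    """Hypothetical container for one end-to-end review pass."""
    paper_text: str
    related: list = field(default_factory=list)
    review: str = ""
    quality: float = 0.0

def run_pipeline(pdf_text: str, search, review_fn, grade_fn) -> ReviewRun:
    """Sketch of the read -> retrieve -> review -> grade loop.

    `search`, `review_fn`, and `grade_fn` stand in for the real
    retrieval, reviewer, and evaluator modules."""
    run = ReviewRun(paper_text=pdf_text)
    run.related = search(pdf_text)                  # find prior work
    run.review = review_fn(pdf_text, run.related)   # write the review
    run.quality = grade_fn(run.review, pdf_text)    # grade the review
    return run

# Toy stand-ins that show the control flow without any API calls
run = run_pipeline(
    "A paper about X.",
    search=lambda text: ["prior work A"],
    review_fn=lambda text, rel: f"Review citing {rel[0]}",
    grade_fn=lambda review, text: 0.9,
)
```

The quality score from the final step is what gets saved to `data/metrics/` and fed back into the next run.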
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Add your API key
cp .env.example .env
# Edit .env and set: ANTHROPIC_API_KEY=sk-ant-...

# 3. Run the web app
streamlit run app.py
```

Open http://localhost:8501 in your browser.
Upload a PDF → choose a venue → click ▶ Run Review
| Tab | What you see |
|---|---|
| 📄 Review | The full peer review with score, strengths, weaknesses, and questions |
| 🧑‍⚖️ Evaluation | Quality scores — was the review grounded? Did it hallucinate? |
| 📚 Related Papers | Papers found on Semantic Scholar + novelty assessment |
| 📊 Metrics Dashboard | History of all past runs and performance trends |
```bash
# Basic review
python main.py paper.pdf

# Specify venue
python main.py paper.pdf --venue ICML

# Add custom focus
python main.py paper.pdf --criteria "Focus on fairness and reproducibility"

# Save full output as JSON
python main.py paper.pdf --output review.json

# View performance history
python main.py --stats
```

After each review, quality scores are saved locally. Before the next run, the system reads those scores and adjusts automatically:
| Problem detected | What changes |
|---|---|
| Review makes things up | Switches to a stricter prompt |
| Review misses paper content | Fetches more related papers, uses a more thorough prompt |
| Review quality is low | Runs a second self-critique pass |
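The rules in the table above amount to a small mapping from past metrics to run settings. This is a hypothetical sketch only: the metric names, thresholds, and setting keys are illustrative, not the real `adaptive.py` configuration.

```python
def adjust_settings(metrics: dict) -> dict:
    """Map quality metrics from past runs to settings for the next run.

    All keys and thresholds here are illustrative assumptions. Rules are
    applied in order, so a later rule can override an earlier one.
    """
    settings = {"prompt": "default", "num_related": 5, "self_critique": False}
    if metrics.get("hallucination_rate", 0.0) > 0.1:
        settings["prompt"] = "strict"        # review makes things up
    if metrics.get("coverage", 1.0) < 0.7:
        settings["num_related"] += 5         # review misses paper content
        settings["prompt"] = "thorough"
    if metrics.get("overall_quality", 1.0) < 0.6:
        settings["self_critique"] = True     # low quality: second pass
    return settings

# A run that hallucinated gets the stricter prompt next time
next_settings = adjust_settings({"hallucination_rate": 0.2})
```

The design choice worth noting is that adaptation is rule-based and local: no extra API calls are needed to decide the next run's settings, only the saved metrics.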
Supported venues: NeurIPS · ICML · ICLR · ACL · CVPR · Generic
```
├── app.py               ← Streamlit web interface
├── main.py              ← CLI interface
├── src/
│   ├── extractor.py     ← Reads PDF, extracts paper structure
│   ├── retrieval.py     ← Searches Semantic Scholar, checks novelty
│   ├── reviewer.py      ← Generates the review
│   ├── evaluator.py     ← Grades the review quality
│   ├── monitor.py       ← Saves and displays metrics history
│   ├── adaptive.py      ← Adjusts settings based on past performance
│   └── schemas.py       ← Data models
├── data/metrics/        ← Review history (auto-created)
└── requirements.txt
```
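For a sense of what the structured review looks like, here is a minimal sketch of the data model, using a stdlib dataclass. The field names mirror the review tab described above (score, strengths, weaknesses, questions) but are assumptions, not the actual `schemas.py` definitions.

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    """Hypothetical shape of a structured review; fields are illustrative."""
    summary: str
    strengths: list[str] = field(default_factory=list)
    weaknesses: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)
    score: int = 5       # e.g. 1-10 overall rating
    confidence: int = 3  # e.g. 1-5 reviewer confidence

# A schema like this is what `--output review.json` would serialize
r = Review(summary="Solid empirical study.", score=7)
```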
- Python 3.10+
- Anthropic API key (available from the Anthropic Console at console.anthropic.com)
- Optional: Semantic Scholar API key for higher rate limits