Automatically generate scientific papers from unstructured research notes.
A reproducible research pipeline that converts deep research notes into structured scientific papers with proper formatting, citations, and analysis.
Generate a complete scientific paper with a single command:
# Install dependencies
pip install -r requirements.txt
# Run the paper generation pipeline
python scripts/catecholamine_cli.py full --simple
That's it! Your paper will be at paper/_output/generated_paper.html
python demo.py
The pipeline transforms unstructured research notes (like deepresearch.md) into:
- ✅ Structured scientific paper with proper sections
- ✅ Abstract synthesizing key findings
- ✅ Methods section with quality criteria
- ✅ Results organized by cognitive domain
- ✅ Discussion with clinical implications
- ✅ Effect size extraction from research notes
- ✅ HTML/PDF output ready for publication
Example: A 50+ page deepresearch.md becomes a publication-ready scientific review in minutes.
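One way to picture the automatic structuring step is a minimal sketch that splits Markdown notes on their headings and routes each chunk to a paper section by keyword. The keyword table, the route_sections name, and the rule that unmatched headings (such as a cognitive-domain heading) fall into Results are assumptions for illustration; scripts/generate_paper.py may work quite differently.

```python
import re
from typing import Dict, List

# Hypothetical keyword routing table; not taken from generate_paper.py.
SECTION_KEYWORDS = {
    "Introduction": ("background", "introduction", "overview"),
    "Methods": ("method", "search strategy", "inclusion", "quality"),
    "Results": ("result", "finding", "effect"),
    "Discussion": ("discussion", "implication", "limitation"),
    "Conclusions": ("conclusion", "summary"),
}
# A Markdown ATX heading: 1-6 '#' characters, spaces or tabs, then the title.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.*)$", re.MULTILINE)


def route_sections(notes: str) -> Dict[str, List[str]]:
    """Split notes on headings and file each chunk under a paper section."""
    paper: Dict[str, List[str]] = {name: [] for name in SECTION_KEYWORDS}
    matches = list(HEADING_RE.finditer(notes))
    for i, match in enumerate(matches):
        title = match.group(1).strip()
        # The body runs until the next heading or the end of the notes.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(notes)
        body = notes[match.end():end].strip()
        lowered = title.lower()
        target = next(
            (sec for sec, kws in SECTION_KEYWORDS.items() if any(k in lowered for k in kws)),
            "Results",  # assumption: unmatched headings are treated as result domains
        )
        paper[target].append(f"{title}\n{body}".strip())
    return paper
```

Keyword routing keeps the sketch transparent; a real pipeline would likely need richer rules or manual overrides for headings that match several sections.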
- Automatic Structure: Converts freeform notes into Introduction, Methods, Results, Discussion, Conclusions
- Smart Parsing: Extracts effect sizes, citations, and quantitative findings automatically
- Multiple Output Formats: HTML (no Quarto required), PDF via Quarto
- Citation Management: Automatic BibTeX generation from references
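As an illustration of the effect-size parsing, here is a minimal regex-based sketch. It assumes effect sizes are written in forms like "d = 0.45", "g = -0.30", "r = .21" or "OR = 1.8"; the pattern, the EffectSize type and extract_effect_sizes are hypothetical and not the project's actual parser. Real notes will contain formats this pattern misses (confidence intervals, test statistics), so treat it as a starting point.

```python
import re
from typing import List, NamedTuple


class EffectSize(NamedTuple):
    measure: str  # "d", "g", "r" or "OR" (hypothetical set of measures)
    value: float


# Measure symbol, optional spaces, "=", then a signed decimal such as 0.45, .21 or -0.30.
EFFECT_RE = re.compile(r"\b(d|g|r|OR)\s*=\s*(-?\d*\.?\d+)")


def extract_effect_sizes(text: str) -> List[EffectSize]:
    """Return every effect size found in the text, in order of appearance."""
    # Papers often use the Unicode minus sign (U+2212); normalize it to "-" first.
    text = text.replace("\u2212", "-")
    return [EffectSize(m.group(1), float(m.group(2))) for m in EFFECT_RE.finditer(text)]
```

The leading word boundary keeps letters inside ordinary words (the "d" in "and", the "r" in "error") from being read as measure symbols.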
- Validate and normalize raw research data
- Build unified datasets from multiple sources
- Generate analysis reports and visualizations
- Integration with Quarto for reproducible manuscripts
- One command to rule them all: catecholamine_cli.py
- Modular steps: run the full pipeline or individual stages
- Progress tracking and error handling
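The single-entry-point design can be sketched with argparse subcommands. This is a hypothetical outline, not the actual catecholamine_cli.py: the stage functions are placeholders that only return their names, the render command and its --source option are left out, and the order used for full (validate, build, reports, generate) is an assumption.

```python
import argparse
from typing import Callable, Dict, List, Optional


# Placeholder stages: each returns its own name so the dispatch order is visible.
def validate() -> str:
    return "validate"


def build() -> str:
    return "build"


def reports() -> str:
    return "reports"


def generate() -> str:
    return "generate"


# Dict insertion order doubles as the order used by "full" (an assumption).
STAGES: Dict[str, Callable[[], str]] = {
    "validate": validate,
    "build": build,
    "reports": reports,
    "generate": generate,
}


def main(argv: Optional[List[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser(prog="catecholamine_cli.py")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("full", help="run every stage in order")
    for name in STAGES:
        sub.add_parser(name, help=f"run only the {name} stage")
    args = parser.parse_args(argv)

    names = list(STAGES) if args.command == "full" else [args.command]
    completed: List[str] = []
    for step, name in enumerate(names, start=1):
        print(f"[{step}/{len(names)}] {name}")  # simple progress tracking
        try:
            completed.append(STAGES[name]())
        except Exception as exc:
            # Stop at the first failing stage and report which one broke.
            raise SystemExit(f"stage '{name}' failed: {exc}") from exc
    return completed
```

In a real script, main() would be called from an `if __name__ == "__main__":` guard so that argparse reads the command line; passing argv explicitly, as here, makes the dispatcher easy to test.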
- Place raw extracted data in data/raw/ (CSV/TSV format). Never edit raw data files in place.
- For each raw data file, create a sibling metadata file: <file>.csv.meta.yaml
- Run python scripts/validate_raw.py to validate data and metadata
- Run python scripts/build_dataset.py to produce data/derived/master_dataset.parquet
- Run python scripts/build_reports.py to generate reports/*.csv files used by the paper
- Render the paper with Quarto:
quarto render paper/paper.qmd
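The rule that every raw data file needs a sibling metadata file is easy to check mechanically. The sketch below relies only on the conventions stated in the steps above (CSV/TSV files in data/raw/, each paired with <file>.meta.yaml); missing_metadata and check_raw_dir are hypothetical helpers, not the contents of scripts/validate_raw.py.

```python
from pathlib import Path
from typing import Iterable, List

RAW_SUFFIXES = (".csv", ".tsv")
META_SUFFIX = ".meta.yaml"


def missing_metadata(filenames: Iterable[str]) -> List[str]:
    """Return raw data files that have no <file>.meta.yaml sibling."""
    names = set(filenames)
    return sorted(
        name
        for name in names
        # Only CSV/TSV files count as raw data; the .meta.yaml files themselves are skipped.
        if name.lower().endswith(RAW_SUFFIXES) and name + META_SUFFIX not in names
    )


def check_raw_dir(raw_dir: Path = Path("data/raw")) -> List[str]:
    """Apply missing_metadata to the files currently in the raw data directory."""
    return missing_metadata(p.name for p in raw_dir.iterdir() if p.is_file())
```

Keeping the check on plain filenames separates the pairing rule from filesystem access, so it can be tested without creating any files.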
If you have comprehensive research notes in deepresearch.md, you can automatically generate a structured scientific paper:
# Generate the paper structure from deepresearch.md
python scripts/generate_paper.py
# Render to HTML (works without Quarto)
python scripts/render_paper_simple.py
# Or render with Quarto (if installed)
cd paper
quarto render generated_paper.qmd
The catecholamine_cli.py script provides a unified interface for all operations:
# Run the complete pipeline
python scripts/catecholamine_cli.py full
# Generate paper from deepresearch.md only
python scripts/catecholamine_cli.py generate
# Render the generated paper
python scripts/catecholamine_cli.py render --source generated
# Validate data
python scripts/catecholamine_cli.py validate
# Build dataset
python scripts/catecholamine_cli.py build
# Build reports
python scripts/catecholamine_cli.py reports
# See all options
python scripts/catecholamine_cli.py --help
While Jupyter notebooks are excellent for exploratory analysis, Quarto documents provide additional benefits for manuscript preparation:
- Clean rendering to HTML/PDF formats
- Code and narrative integrated in a single document
- Stable citations and figure numbering
- Git-friendly diff and merge operations
Recommendation: Use Quarto (.qmd) with a Python kernel for manuscript preparation, while continuing to use Jupyter notebooks for exploratory data analysis.
You can use any Python environment manager (uv, Poetry, conda). For a minimal pip installation:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
Install Quarto (system-level installation), then:
quarto render paper/paper.qmd
Output files will be generated in paper/_output/.
Important: Every raw data file must have an accompanying metadata file:
- Data file: data/raw/weber2022_table3.csv
- Metadata file: data/raw/weber2022_table3.csv.meta.yaml
The metadata file should include:
- Paper citation key
- Population/species information
- Task description
- Units of measurement
- Extraction method (table, digitized, supplementary material)
- Any relevant caveats or notes
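A check along these lines could confirm that a metadata file carries the fields listed above. The key names (citation_key, population, task, units, extraction_method) and the allowed extraction methods are assumptions that mirror that list; the repository's own schemas are authoritative. The function takes an already-parsed dict, for example the result of yaml.safe_load from PyYAML.

```python
from typing import Any, Dict, List

# Hypothetical key names mirroring the metadata list; a free-text "notes" key is optional.
REQUIRED_KEYS = ("citation_key", "population", "task", "units", "extraction_method")
ALLOWED_METHODS = {"table", "digitized", "supplementary"}


def metadata_problems(meta: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the metadata looks complete."""
    # Treat absent keys and empty values the same way.
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if not meta.get(key)]
    method = meta.get("extraction_method")
    if method and method not in ALLOWED_METHODS:
        problems.append(f"unknown extraction_method: {method}")
    return problems
```

Returning a list of messages rather than raising on the first error lets a validation run report every incomplete metadata file in one pass.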
- Maintain BibTeX entries in refs/references.bib
- Cite in Quarto documents using: @citation_key
Recommended: Use Zotero with Better BibTeX to export citations automatically. Manual maintenance is also supported.
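To catch citation keys that are used in a Quarto document but missing from refs/references.bib, a small standard-library check is enough. This is a hypothetical helper, not part of the repository; its regular expressions are simplified versions of Pandoc's citation syntax, and it skips Quarto cross-reference prefixes such as @fig- and @tbl-, which share the same "@" syntax.

```python
import re
from pathlib import Path
from typing import Set

# "@key" not preceded by a word character or dot, so e-mail addresses are ignored;
# trailing punctuation (e.g. a sentence-ending period) is not part of the key.
CITE_RE = re.compile(r"(?<![\w.])@([A-Za-z0-9_](?:[\w:.\-]*\w)?)")
# BibTeX entry header such as "@article{weber2022," (captures the key).
BIB_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
# Quarto cross-reference prefixes that are not bibliography citations.
CROSSREF_PREFIXES = ("fig-", "tbl-", "sec-", "eq-", "lst-")


def cited_keys(qmd_text: str) -> Set[str]:
    return {k for k in CITE_RE.findall(qmd_text) if not k.startswith(CROSSREF_PREFIXES)}


def bib_keys(bib_text: str) -> Set[str]:
    return set(BIB_KEY_RE.findall(bib_text))


def missing_citations(qmd_path: Path, bib_path: Path) -> Set[str]:
    """Keys cited in the .qmd file that have no entry in the .bib file."""
    cited = cited_keys(qmd_path.read_text(encoding="utf-8"))
    return cited - bib_keys(bib_path.read_text(encoding="utf-8"))
```

Calling missing_citations(Path("paper/paper.qmd"), Path("refs/references.bib")) before rendering lists keys that would otherwise surface only as unresolved-citation warnings during the render.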
This repository includes:
- Starter dataset with example data
- Processing scripts for data validation and normalization
- Paper skeleton with basic structure
- Documentation and schemas
Replace or extend these with your own research data and analysis.