A tool that automatically analyzes academic papers, extracting and explaining figures and their connections to the research. This tool uses LLMs to provide detailed analysis of each figure and its relationship to the paper's content.
- Extracts paper metadata (title, authors, abstract)
- Identifies and counts total figures in the paper
- Provides detailed analysis of each figure
- Generates comprehensive explanations of how figures relate to the research
- Parallel processing for efficient analysis of multiple figures
- Structured output with separate files for metadata, background, and figure analysis
- Python 3.10+
- OpenAI API key
- Clone the repository:
git clone [your-repo-url]
cd [repo-name]
- Run the setup script:
python setup_project.py
- Create a .env file in the root directory and add your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
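For illustration, here is a minimal sketch of how a script could pick up that key at startup using only the standard library. The helper names `load_env_file` and `get_api_key` are hypothetical, and the project itself may load the file differently (for example through a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical helper: copy KEY=VALUE pairs from a .env file into os.environ."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key.strip(), value.strip().strip("\"'"))


def get_api_key():
    """Return the OpenAI API key, or raise a clear error if it is missing."""
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return api_key
```

Failing early with a clear message is usually friendlier than letting the first API request fail with an authentication error.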
.
├── papers/ # Place your PDF papers here
├── output/ # Analysis results will be saved here
├── paper_analyzer.py # Main analysis script
├── utils.py # Utility functions
├── config.py # Configuration settings
├── templates.py # Prompt templates
├── setup.py # Package setup configuration
└── setup_project.py # Project setup script
- Place your academic paper (PDF format) in the papers/ directory.
- Run the analyzer:
python paper_analyzer.py papers/your_paper.pdf [--output-dir custom/output/path]
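The command above takes one positional PDF path and an optional `--output-dir` flag. A parser with that shape could be sketched with `argparse` as follows. This is an assumption about how such a command line can be built, not the actual contents of paper_analyzer.py, and the name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the documented command line."""
    parser = argparse.ArgumentParser(
        description="Analyze the figures in an academic paper."
    )
    parser.add_argument(
        "pdf_path",
        help="Path to the PDF paper, e.g. papers/your_paper.pdf",
    )
    parser.add_argument(
        "--output-dir",
        default="output",  # assumed to mirror the OUTPUT_DIR default in config.py
        help="Directory where the analysis files are written",
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["papers/your_paper.pdf", "--output-dir", "custom/output/path"]
)
print(args.pdf_path, args.output_dir)
```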
The script will:
- Extract basic paper details (title, authors, abstract)
- Count the total number of figures
- Analyze each figure in detail
- Generate connections between figures and research content
- Analyze background information
- Save the analysis in separate files under the output directory
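The per-figure step is the part that benefits from parallel processing, since each figure can be analyzed independently. A minimal sketch using a thread pool is shown below. `analyze_figure` is a stand-in for the real LLM call, and both function names and the `max_workers` default are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_figure(figure_number):
    """Placeholder for the per-figure LLM call; returns a stub result here."""
    return {
        "figure": figure_number,
        "analysis": f"Analysis of Figure {figure_number}",
    }


def analyze_all_figures(figure_count, max_workers=4):
    """Analyze figures 1..figure_count concurrently, preserving figure order."""
    figure_numbers = range(1, figure_count + 1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even if calls finish out of order.
        return list(pool.map(analyze_figure, figure_numbers))


results = analyze_all_figures(3)
print([r["figure"] for r in results])
```

Threads are a reasonable fit here because the work is dominated by waiting on network requests rather than by CPU time.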
The analysis will be saved in the output directory with the following files:
- metadata.txt: Paper details (title, authors, abstract, figure count)
- background.txt: Detailed background analysis and prerequisites
- figures_analysis.txt: For each figure:
  - Initial analysis (Information and Connection)
  - Expanded analysis with additional context
  - Detailed relationships to research content
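As a rough sketch of how the three files could be produced, the function below writes plain-text results into an output directory. The name `save_analysis`, its parameters, and the per-figure layout inside figures_analysis.txt are hypothetical; only the file names come from the list above.

```python
from pathlib import Path


def save_analysis(output_dir, metadata, background, figure_analyses):
    """Hypothetical writer producing the three documented output files."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    (out / "metadata.txt").write_text(metadata, encoding="utf-8")
    (out / "background.txt").write_text(background, encoding="utf-8")
    # One section per figure, numbered from 1 and separated by blank lines.
    sections = [
        f"Figure {i}\n{text}" for i, text in enumerate(figure_analyses, start=1)
    ]
    (out / "figures_analysis.txt").write_text("\n\n".join(sections), encoding="utf-8")
    return out
```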
You can modify the following settings in config.py:
- PAPER_DIR: Directory for input PDF papers (default: "papers")
- OUTPUT_DIR: Directory for saving analysis results (default: "output")
- DEFAULT_MODEL: GPT model to use for analysis (default: "gpt-4o-mini")
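With the documented defaults, the relevant part of config.py would look roughly like this. The layout is illustrative, and the real file may define additional settings.

```python
# config.py (illustrative layout; values are the documented defaults)
PAPER_DIR = "papers"           # directory for input PDF papers
OUTPUT_DIR = "output"          # directory for saving analysis results
DEFAULT_MODEL = "gpt-4o-mini"  # GPT model used for analysis
```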
Contributions are welcome! Please feel free to submit a Pull Request.
- Enhance paper content extraction:
  - Deep dive analysis of Results and Conclusion sections
  - Extract and analyze Future Work sections
  - Comprehensive analysis of Background/Introduction
  - Generate research context summaries
- Improve figure analysis:
  - Add support for tables and charts
  - Generate figure relationships map
  - Extract figure captions and references
- Advanced features:
  - Citation network analysis
  - Research methodology extraction
  - Key findings summarization
- Built with LangChain
- Powered by OpenAI's GPT models