An intelligent workflow system for processing CIF (Crystallographic Information File) files of Metal-Organic Frameworks (MOFs), featuring automated structure analysis, solvent identification, and literature retrieval.
This project provides an automated pipeline for processing MOF crystal structures, including:
- CIF File Processing: Convert structures to the P1 space group and reduce them to primitive cells
- CSD Database Integration: Retrieve structure information from the Cambridge Structural Database (CSD)
- Cluster Analysis: Identify and analyze atomic clusters in crystal structures
- Solvent Identification & Removal: Automatically detect and remove solvent molecules
- Literature Retrieval: Download and parse related research papers
- Batch Processing: Parallel processing of multiple CIF files

Key features:
- Multi-Agent Architecture: Specialized agents for different tasks (CSD info, cluster analysis, solvent identification, etc.)
- LLM-Powered Analysis: Uses DeepSeek LLM for intelligent structure analysis and decision-making
- Automated Workflow: End-to-end processing from raw CIF files to cleaned structures
- Parallel Processing: Efficient batch processing with multiprocessing support
- PDF Integration: Automatic paper download and content extraction
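
To make the cluster analysis and solvent identification items more concrete, here is a minimal sketch of one common graph-based heuristic: split the structure into bonded fragments and flag fragments that contain no metal atom. This is not the project's method (the pipeline relies on LLM agents for these decisions). The function names, the `METALS` set, and the toy input are all assumptions made for illustration, and the bond list is assumed to come from an earlier step.

```python
from collections import defaultdict

# Illustrative subset only; a real pipeline would need a complete metal list.
METALS = {"Zn", "Cu", "Zr", "Fe", "Co", "Ni", "Mg", "Al", "Cr", "Mn"}


def connected_components(n_atoms, bonds):
    """Group atom indices into bonded fragments using depth-first search."""
    neighbours = defaultdict(set)
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    seen = set()
    components = []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, fragment = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            fragment.append(atom)
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(fragment))
    return components


def solvent_candidates(elements, bonds):
    """Return fragments without any metal atom (hypothetical heuristic)."""
    return [
        frag
        for frag in connected_components(len(elements), bonds)
        if not any(elements[i] in METALS for i in frag)
    ]


# Toy input: a Zn-O-C framework fragment plus one free water molecule.
elements = ["Zn", "O", "C", "O", "H", "H"]
bonds = [(0, 1), (1, 2), (3, 4), (3, 5)]
print(solvent_candidates(elements, bonds))  # [[3, 4, 5]]
```

A rule like this also flags free counter-ions and other metal-free guests, which is one reason a simple heuristic usually needs a second decision step before anything is removed.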

Requirements:
- Python 3.8+
- DeepSeek API key (configured in config.py)
- CSD Python API (optional, for CSD database access)
- Required Python packages (see environment.yml)
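
The requirements above can be checked before a run with a short script. The sketch below is not part of the project and its helper names are hypothetical. It assumes the CSD Python API is importable under its usual package name `ccdc`, and it only tests whether that package can be found, without importing it or validating a licence.

```python
import importlib.util
import sys


def python_is_supported(minimum=(3, 8)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def csd_api_available():
    """True if the optional CSD Python API package (`ccdc`) can be found."""
    return importlib.util.find_spec("ccdc") is not None


if not python_is_supported():
    print("Python 3.8 or newer is required.")
if not csd_api_available():
    print("CSD Python API not found; CSD lookups will be unavailable.")
```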

Usage:
- Configure API keys in src/config.py:

      DEEPSEEK_API_KEY = "your_api_key"
      UNPAYWALL_EMAIL = "your_email"

- Set input/output paths in config.py:

      INPUT_FOLDER = "path/to/cif/files"
      P1_OUTPUT_FOLDER = "path/to/p1/output"
      FINAL_OUTPUT_FOLDER = "path/to/final/output"

- Run single file processing:

      python src/run_workflow.py

- Run batch processing:

      python src/batch_process_parallel.py
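
For readers curious how a parallel batch step over a folder of CIF files can be structured, the following is a minimal sketch using the standard library's `multiprocessing.Pool`. It does not reproduce `batch_process_parallel.py`. The function names are hypothetical, and the worker is a placeholder that only reads the name of the first `data_` block in each file instead of running the real workflow.

```python
from multiprocessing import Pool
from pathlib import Path


def find_cif_files(input_folder):
    """Return the .cif files directly inside input_folder, sorted by name."""
    return sorted(Path(input_folder).glob("*.cif"))


def read_data_block_name(cif_path):
    """Placeholder worker: return (file name, first data_ block name or None)."""
    with open(cif_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("data_"):
                return cif_path.name, line.strip()[len("data_"):]
    return cif_path.name, None


def run_batch(input_folder, workers=4):
    """Apply the worker to every CIF file, in parallel when workers > 1."""
    paths = find_cif_files(input_folder)
    if workers <= 1:
        # Serial fallback, handy for debugging a single failing file.
        return [read_data_block_name(p) for p in paths]
    with Pool(processes=workers) as pool:
        return pool.map(read_data_block_name, paths)
```

When calling `run_batch` with more than one worker from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that do not fork.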

Contact: sopranos@sjtu.edu.cn