Name	Name	Last commit message	Last commit date
parent directory ..
data	data
result	result
src	src
.gitignore	.gitignore
README.md	README.md
requirements.txt	requirements.txt

Name

Last commit message

Last commit date

data

Extraction with LLMs

The code for chemical information extraction from PDF and images of PDF pages using GPT-4o as a baseline model.

🔧 Installation

pip install -r requirements.txt

Poppler has to be installed and added to PATH, follow the instructions here.

🚀 Usage

Put open access article PDFs into data/pdfs/pdf_<dataset> folders.
Merge article and supporting infromation files. Dataset keys: oxazolidinone, benzimidazole, cocrystals, complexes, nanozymes, magnetic, cytotoxicity, seltox, synergy.

python src/merge_suppl.py --dataset <dataset>

Convert PDF into JPEG images

python src/pdf_to_images.py --dataset <dataset> --poppler_path <poppler_path>

Extraction from PDF

python src/pdf_extraction.py --dataset <dataset> --openai_api_key <YOUR_API_KEY>

Results will appear in the result/from_pdf folder.

Extraction from images

python src/images_extraction.py --dataset <dataset> --openai_api_key <YOUR_API_KEY>

Results will appear in the result/from_image folder.

Calculate metrics

python src/metric_calc.py --dataset <dataset> --source <pdf_or_image>

Metrics will appear in the result/metrics folder.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Extraction with LLMs

🔧 Installation

🚀 Usage

FilesExpand file tree

LLM

Directory actions

More options

Directory actions

More options

Latest commit

History

LLM

Folders and files

parent directory

README.md

Extraction with LLMs

🔧 Installation

🚀 Usage