# Paper Reproduction Code

> "If I have seen further it is by standing on the shoulders of Giants."
> — Isaac Newton, 1675

Raw results and plotting scripts for the paper under review. This repository contains the code for reproducing all experiments and figures described in our paper. By running these scripts, you can replicate the datasets, evaluation procedures, and results used in the publication.

Repository: https://github.com/systemoutprintlnhelloworld/CCBench

Clone the repository together with its submodules:

```bash
git clone [email protected]:systemoutprintlnhelloworld/CCBench.git --recursive
```

## Repository structure
```
(workspaces)
├─ 1.Dataset construct
│  ├─ extract image
│  │  ├─ download_image-v2.py
│  │  ├─ download_image-v3.py
│  │  └─ download&compare_img_in_excel.py
│  ├─ pipeline mainthread
│  └─ postprogress
├─ 2.Evaluation pipeline
│  ├─ QA
│  │  ├─ formatted
│  │  └─ pipeline
│  └─ VQA
├─ 3.Answer assessment
│  ├─ Close-ended
│  └─ Open-ended
├─ 4.Result statistics
│  ├─ Fig. 1
│  ├─ Fig. 4
│  ├─ Fig. 5
│  └─ Fig. 6
├─ LICENSE
└─ README.md
```

## Installation

Install the required Python packages:

```bash
pip install \
    pandas \
    openpyxl \
    tqdm \
    matplotlib \
    openai
```

## Dataset construction

Before running the evaluation pipeline, you need to construct the dataset first. Please run the following scripts:
```bash
python "1.Dataset construct/extract image/download_image-v2.py"
python "1.Dataset construct/extract image/download_image-v3.py"
python "1.Dataset construct/extract image/download&compare_img_in_excel.py"
```

(The directory names contain spaces, so the paths must be quoted.)

## Evaluation pipeline

In this section, each script corresponds to a specific figure or table in the paper. Running these scripts generates the outputs needed to replicate the quantitative and qualitative results. Scripts may include:
- Data parsing or intermediate result generation
- Automatic metric computation
- Plot or table creation
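As an illustration of the metric-computation step, here is a minimal sketch of scoring close-ended (multiple-choice) answers by exact match. The record format and the field names `prediction` and `answer` are assumptions for illustration, not the repository's actual schema:

```python
# Sketch of "automatic metric computation" for close-ended answers.
# Field names ("prediction", "answer") are illustrative assumptions.

def close_ended_accuracy(records):
    """Return the fraction of records whose prediction matches the answer,
    ignoring case and surrounding whitespace."""
    if not records:
        return 0.0
    correct = sum(
        1 for r in records
        if r["prediction"].strip().upper() == r["answer"].strip().upper()
    )
    return correct / len(records)

if __name__ == "__main__":
    demo = [
        {"prediction": "A", "answer": "A"},
        {"prediction": "b", "answer": "B"},
        {"prediction": "C", "answer": "D"},
    ]
    print(close_ended_accuracy(demo))  # 2 of 3 match
```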
Refer to the figure or table identifier in the paper to select the correct script to run, and check whether additional parameters (e.g., `--use_human_abstract`) are required for your use case.
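The QA pipeline presumably queries a model through the `openai` client listed in the dependencies. A minimal sketch of one close-ended query is shown below; the system prompt, model name, and question format are assumptions, not the repository's actual settings:

```python
# Sketch of one QA query via the openai client. The prompt wording and
# default model name are illustrative assumptions.

SYSTEM_PROMPT = "You are a cervical cytology expert. Answer with a single option letter."

def build_messages(question):
    """Assemble the chat messages for one close-ended question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

def ask(question, model="gpt-4o"):
    """Send one question to the API and return the raw text answer."""
    from openai import OpenAI  # imported lazily so the sketch loads without the package
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=build_messages(question),
        temperature=0,
    )
    return resp.choices[0].message.content
```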
Run the evaluation scripts under `2.Evaluation pipeline/` that correspond to the figures and tables you want to reproduce.

## Answer assessment

Run the following scripts for answer assessment:
```bash
python "3.Answer assessment/Close-ended/evaluate_close_ended.py"
python "3.Answer assessment/Open-ended/evaluate_open_ended.py"
```

## Result statistics

Run the result statistics scripts that correspond to the figures and tables you want to reproduce:
```bash
python "4.Result statistics/Fig. 1/plot_fig1.py"
python "4.Result statistics/Fig. 4/plot_fig4.py"
python "4.Result statistics/Fig. 5/plot_fig5.py"
python "4.Result statistics/Fig. 6/plot_fig6.py"
```

## License

This project is licensed under the Apache License 2.0.
## Citation

If you use this repository in your work, please cite our paper:

[1] Hong, Q., Liu, S., Wu, L. et al. Evaluating the performance of large language & visual-language models in cervical cytology screening. npj Precis. Onc. 9, 153 (2025). https://doi.org/10.1038/s41698-025-00916-7
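An equivalent BibTeX entry may be convenient; the field values are taken from the reference above, and the citation key is arbitrary:

```bibtex
@article{hong2025evaluating,
  author  = {Hong, Q. and Liu, S. and Wu, L. and others},
  title   = {Evaluating the performance of large language \& visual-language models in cervical cytology screening},
  journal = {npj Precision Oncology},
  volume  = {9},
  pages   = {153},
  year    = {2025},
  doi     = {10.1038/s41698-025-00916-7}
}
```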