
CCBench Reproducible Results Code (to be updated upon publication)


"If I have seen further it is by standing on the shoulders of Giants."

Isaac Newton, 1675.

Raw results and plotting scripts for the paper under review. CCBench is the first dataset in cervical cytology to include both QA and VQA data.

To work with this repo locally:

git clone git@github.com:systemoutprintlnhelloworld/CCBench.git --recursive


# Paper Reproduction Code

This repository contains the code for reproducing all experiments and figures described in our paper. By running these scripts, you can replicate the datasets, evaluation procedures, and results used in the publication.

Directory Structure

Repository: https://github.com/systemoutprintlnhelloworld/CCBench

(workspaces)
├─ 1.Dataset construct
│   ├─ extract image
│   │   ├─ download_image-v2.py
│   │   ├─ download_image-v3.py
│   │   └─ download&compare_img_in_excel.py
│   ├─ pipeline mainthread
│   └─ postprogress
├─ 2.Evaluation pipeline
│   ├─ QA
│   │   ├─ formatted
│   │   └─ pipeline
│   └─ VQA
├─ 3.Answer assessment
│   ├─ Close-ended
│   └─ Open-ended
├─ 4.Result statistics
│   ├─ Fig. 1
│   ├─ Fig. 4
│   ├─ Fig. 5
│   └─ Fig. 6
├─ LICENSE
└─ README.md

Running Environment

pip install \
  pandas \
  openpyxl \
  tqdm \
  matplotlib \
  openai
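
To confirm the environment is set up correctly before running any scripts, a quick import test such as the one below can be used (this helper is illustrative and not part of the repository):

# Illustrative environment check: confirms the packages listed above import cleanly.
import importlib

for pkg in ["pandas", "openpyxl", "tqdm", "matplotlib", "openai"]:
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: OK")
    except ImportError:
        print(f"{pkg}: missing, install it with pip")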

Dataset Construction

Before running the evaluation pipeline, you first need to construct the dataset. Run the following scripts:

python "1.Dataset construct/extract image/download_image-v2.py"
python "1.Dataset construct/extract image/download_image-v3.py"
python "1.Dataset construct/extract image/download&compare_img_in_excel.py"
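
As a rough illustration of the kind of step these scripts perform (downloading images listed in an Excel sheet), the general pattern looks like the sketch below; the spreadsheet name and column names are hypothetical placeholders, not the repository's real schema:

# Illustrative sketch only; the real logic lives in "1.Dataset construct/extract image/".
import os
import urllib.request
import pandas as pd
from tqdm import tqdm

df = pd.read_excel("cases.xlsx")                  # hypothetical input sheet
os.makedirs("images", exist_ok=True)
for _, row in tqdm(df.iterrows(), total=len(df)):
    out_path = os.path.join("images", f"{row['case_id']}.jpg")  # hypothetical column
    if not os.path.exists(out_path):              # skip images already downloaded
        urllib.request.urlretrieve(row["image_url"], out_path)  # hypothetical column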

Evaluation Process

In this section, each script corresponds to specific figures or tables in the paper, and running them generates the outputs needed to replicate the quantitative and qualitative results. The scripts cover:

  • Data parsing or intermediate result generation
  • Automatic metric computation
  • Plot or table creation

Refer to the figure or table identifier in the paper to select the correct script to run, and check whether additional parameters (e.g., --use_human_abstract) are required for your use case.
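
The QA and VQA pipelines under 2.Evaluation pipeline presumably collect model responses through the openai package listed in the dependencies. A minimal sketch of querying one question is shown below; the model name, prompt, and question are placeholders, the real pipeline scripts may structure requests differently, and OPENAI_API_KEY must be set in your environment:

# Illustrative sketch only; the real pipelines live under "2.Evaluation pipeline/".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
question = "Which Bethesda category best describes this finding?"  # placeholder question
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "Answer with a single option letter."},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)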

Run the evaluation scripts that correspond to the figures and tables you want to reproduce:

Answer Assessment

Run the following scripts for answer assessment:

python "3.Answer assessment/Close-ended/evaluate_close_ended.py"
python "3.Answer assessment/Open-ended/evaluate_open_ended.py"
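
For close-ended (multiple-choice) items, a natural scoring rule is exact match between the predicted option and the gold answer; whether the repository's script does exactly this is an assumption. A minimal sketch under that assumption, with hypothetical file and column names, is:

# Illustrative sketch only; the real logic lives in "3.Answer assessment/".
import pandas as pd

df = pd.read_excel("close_ended_predictions.xlsx")   # hypothetical file
correct = (df["prediction"].str.strip().str.upper()
           == df["answer"].str.strip().str.upper())  # hypothetical columns
print(f"Close-ended accuracy: {correct.mean():.3f}")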

Result Statistics

Run the result statistics script that corresponds to the figure or table you want to reproduce:

python "4.Result statistics/Fig. 1/plot_fig1.py"
python "4.Result statistics/Fig. 4/plot_fig4.py"
python "4.Result statistics/Fig. 5/plot_fig5.py"
python "4.Result statistics/Fig. 6/plot_fig6.py"
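
Each plotting script renders one figure from the collected results with matplotlib. As a generic illustration of the pattern (the model names and scores below are made-up placeholders, not results from the paper):

# Illustrative sketch only; the real plotting code lives in "4.Result statistics/".
import matplotlib.pyplot as plt

models = ["Model A", "Model B", "Model C"]   # placeholder labels
accuracy = [0.62, 0.71, 0.55]                # placeholder values, not paper results
plt.figure(figsize=(5, 3))
plt.bar(models, accuracy)
plt.ylabel("Accuracy")
plt.ylim(0, 1)
plt.tight_layout()
plt.savefig("fig_accuracy.png", dpi=300)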

License

This project is licensed under the Apache License 2.0.

Citation

If you use this repository in your work, please cite our paper:

[1] Hong, Q., Liu, S., Wu, L. et al. Evaluating the performance of large language & visual-language models in cervical cytology screening. npj Precis. Onc. 9, 153 (2025). https://doi.org/10.1038/s41698-025-00916-7
