# Paper Reproduction Code

> "If I have seen further it is by standing on the shoulders of Giants."
> — Isaac Newton, 1675

Raw results and plotting scripts for the paper under review. This repository contains the code for reproducing all experiments and figures described in our paper. By running these scripts, you can replicate the datasets, evaluation procedures, and results used in the publication.

Repository: https://github.com/systemoutprintlnhelloworld/CCBench

Clone the repository together with its submodules:

```bash
git clone [email protected]:systemoutprintlnhelloworld/CCBench.git --recursive
```

## Repository structure
```
(workspaces)
├─ 1.Dataset construct
│  ├─ extract image
│  │  ├─ download_image-v2.py
│  │  ├─ download_image-v3.py
│  │  └─ download&compare_img_in_excel.py
│  ├─ pipeline mainthread
│  └─ postprogress
├─ 2.Evaluation pipeline
│  ├─ QA
│  │  ├─ formatted
│  │  └─ pipeline
│  └─ VQA
├─ 3.Answer assessment
│  ├─ Close-ended
│  └─ Open-ended
├─ 4.Result statistics
│  ├─ Fig. 1
│  ├─ Fig. 4
│  ├─ Fig. 5
│  └─ Fig. 6
├─ LICENSE
└─ README.md
```

## Installation

Install the required Python packages:

```bash
pip install \
    pandas \
    openpyxl \
    tqdm \
    matplotlib \
    openai
```

## Dataset construction

Before running the evaluation pipeline, you need to construct the dataset first. Please run the following scripts:
```bash
python "1.Dataset construct/extract image/download_image-v2.py"
python "1.Dataset construct/extract image/download_image-v3.py"
python "1.Dataset construct/extract image/download&compare_img_in_excel.py"
```

(The directory names contain spaces, so the paths must be quoted.)

## Evaluation pipeline

In this section, each script corresponds to a specific figure or table in the paper. Running these scripts generates the outputs needed to replicate the quantitative and qualitative results. Scripts may include:
- Data parsing or intermediate result generation
- Automatic metric computation
- Plot or table creation
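As an illustration of the metric-computation step, here is a minimal sketch of scoring close-ended (multiple-choice) answers by exact match. The record format and the field names `prediction` and `answer` are assumptions for illustration, not the repository's actual schema:

```python
# Sketch of "automatic metric computation" for close-ended answers.
# Field names ("prediction", "answer") are illustrative assumptions.

def close_ended_accuracy(records):
    """Return the fraction of records whose prediction matches the answer,
    ignoring case and surrounding whitespace."""
    if not records:
        return 0.0
    correct = sum(
        1 for r in records
        if r["prediction"].strip().upper() == r["answer"].strip().upper()
    )
    return correct / len(records)

if __name__ == "__main__":
    demo = [
        {"prediction": "A", "answer": "A"},
        {"prediction": "b", "answer": "B"},
        {"prediction": "C", "answer": "D"},
    ]
    print(close_ended_accuracy(demo))  # 2 of 3 match
```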
Refer to the figure or table identifier in the paper to select the correct script to run, and check whether additional parameters (e.g., `--use_human_abstract`) are required for your use case.
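The QA pipeline presumably queries a model through the `openai` client listed in the dependencies. A minimal sketch of one close-ended query is shown below; the system prompt, model name, and question format are assumptions, not the repository's actual settings:

```python
# Sketch of one QA query via the openai client. The prompt wording and
# default model name are illustrative assumptions.

SYSTEM_PROMPT = "You are a cervical cytology expert. Answer with a single option letter."

def build_messages(question):
    """Assemble the chat messages for one close-ended question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

def ask(question, model="gpt-4o"):
    """Send one question to the API and return the raw text answer."""
    from openai import OpenAI  # imported lazily so the sketch loads without the package
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=build_messages(question),
        temperature=0,
    )
    return resp.choices[0].message.content
```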
Run the evaluation scripts under `2.Evaluation pipeline/` that correspond to the figures and tables you want to reproduce.

## Answer assessment

Run the following scripts for answer assessment:
```bash
python "3.Answer assessment/Close-ended/evaluate_close_ended.py"
python "3.Answer assessment/Open-ended/evaluate_open_ended.py"
```

## Result statistics

Run the result statistics scripts that correspond to the figures and tables you want to reproduce:
```bash
python "4.Result statistics/Fig. 1/plot_fig1.py"
python "4.Result statistics/Fig. 4/plot_fig4.py"
python "4.Result statistics/Fig. 5/plot_fig5.py"
python "4.Result statistics/Fig. 6/plot_fig6.py"
```

## License

This project is licensed under the Apache License 2.0.
## Citation

If you use this repository in your work, please cite our paper:

[1] Hong, Q., Liu, S., Wu, L. et al. Evaluating the performance of large language & visual-language models in cervical cytology screening. npj Precis. Onc. 9, 153 (2025). https://doi.org/10.1038/s41698-025-00916-7
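An equivalent BibTeX entry may be convenient; the field values are taken from the reference above, and the citation key is arbitrary:

```bibtex
@article{hong2025evaluating,
  author  = {Hong, Q. and Liu, S. and Wu, L. and others},
  title   = {Evaluating the performance of large language \& visual-language models in cervical cytology screening},
  journal = {npj Precision Oncology},
  volume  = {9},
  pages   = {153},
  year    = {2025},
  doi     = {10.1038/s41698-025-00916-7}
}
```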