
Questions on AUC Calculation #13

@rtenala

Dear Mirai Team,

First of all, we sincerely appreciate the work your team has done on Mirai. We are very grateful for the opportunity to explore and build upon your model, and we truly value the impact of your research.

We are currently working on evaluating and refining Mirai using our dataset, and we have encountered an aspect of the implementation that we would like to better understand.

Dataset and Case-Control Definition

To provide context, our dataset consists of:

  • Cases: Mammograms taken 2 and 4 years before cancer diagnosis.

  • Controls: Mammograms from patients with follow-ups at 2 and 4 years confirming they remain cancer-free.

Dataset Metadata

Regarding the years_to_last_followup column in the dataset description CSV, we would like to confirm whether setting it to 2 and 4 years for cases (the same values as years_to_cancer) is the correct approach, or whether Mirai expects a different criterion for this column.

To clarify our approach, we have included a sample of our metadata below (for simplicity, we show one row per exam with only the L CC view, instead of all four views). It covers two patients:

  • Patient 5 (Control): A patient with confirmed negative follow-ups at 2 and 4 years.
  • Patient 10822 (Case): A patient diagnosed with cancer, with mammograms taken 2 and 4 years before diagnosis.

  patient_id  exam_id  laterality  view  file_path                      years_to_cancer  years_to_last_followup  split_group
  5           2011     L           CC    00005_20990909_L_CC_2.dcm.png  100              4                       test
  5           2013     L           CC    00005_20990909_L_CC_4.dcm.png  100              2                       test
  10822       2010     L           CC    10822_20990909_L_CC_2.dcm.png  4                4                       test
  10822       2012     L           CC    10822_20990909_L_CC_1.dcm.png  2                2                       test

We would greatly appreciate it if you could confirm whether this approach is correct or suggest any necessary adjustments.
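The convention described above can be written as a minimal sanity check over the sample rows. This is only a sketch of our own convention, which is exactly what we are asking you to confirm: cases have years_to_cancer of 2 or 4 with years_to_last_followup set to the same value, and controls use years_to_cancer = 100 with a 2- or 4-year negative follow-up. The NO_CANCER constant and the check_row helper are our own illustrative names, not part of Mirai.

```python
import csv
import io

# The four sample rows from the table above (L CC view only).
SAMPLE_CSV = """patient_id,exam_id,laterality,view,file_path,years_to_cancer,years_to_last_followup,split_group
5,2011,L,CC,00005_20990909_L_CC_2.dcm.png,100,4,test
5,2013,L,CC,00005_20990909_L_CC_4.dcm.png,100,2,test
10822,2010,L,CC,10822_20990909_L_CC_2.dcm.png,4,4,test
10822,2012,L,CC,10822_20990909_L_CC_1.dcm.png,2,2,test
"""

# Assumption: the value we put in years_to_cancer for controls (no cancer observed).
NO_CANCER = 100


def check_row(row: dict) -> list:
    """Return the violations of our (unconfirmed) convention for one metadata row."""
    problems = []
    ytc = int(row["years_to_cancer"])
    ytf = int(row["years_to_last_followup"])
    if ytc != NO_CANCER:
        # Case: exam taken 2 or 4 years before diagnosis, follow-up mirrors it.
        if ytc not in (2, 4):
            problems.append("case: expected years_to_cancer of 2 or 4")
        if ytf != ytc:
            problems.append("case: years_to_last_followup differs from years_to_cancer")
    elif ytf not in (2, 4):
        # Control: negative follow-up confirmed at 2 or 4 years.
        problems.append("control: expected a 2- or 4-year negative follow-up")
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["patient_id"], row["exam_id"], check_row(row) or "ok")
```

All four sample rows pass this check; whether the convention itself matches what Mirai expects is the open question.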

Validation AUC Calculation

When running validation (validate.sh), we observed that no AUC is reported for years 1 and 2 (both are NA), and AUC values only appear from year 3 onward. The results are as follows:

  test_1year_auc: NA (n=4632, c=0)  
  test_2year_auc: NA (n=4632, c=0)  
  test_3year_auc: 0.7334626432924374 (n=2702, c=386)  
  test_4year_auc: 0.7321885598718534 (n=2702, c=386)  
  test_5year_auc: NA (n=772, c=772)  

This suggests that cases are not classified as positive before year 3. We observe the same pattern during model refinement, which leads us to the following question:

  • Given that our dataset includes exams taken 2 years before diagnosis, could you clarify why these cases do not seem to be counted as positives in the risk estimation before year 3?

We initially expected exams with years_to_cancer = 2 to be counted as positives in the test_2year_auc metric, but we may be misinterpreting how this is handled. Understanding this aspect would be extremely helpful in ensuring we interpret the results correctly.
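To make the question concrete, here is a minimal sketch of the labeling we suspect might be in effect. It is an assumption on our part, not code taken from the Mirai repository: years_to_cancer is read as a 0-based yearly index (so a value of 2 falls in the year-3 bucket), controls are censored at min(years_to_last_followup, 5) - 1, and an exam contributes to the year-k AUC only if it is a case with an event by that index (positive) or is followed at least that far (negative). The helper names and the risk scores are hypothetical.

```python
from typing import Optional, Sequence, Tuple

MAX_FOLLOWUP = 5  # assumption: yearly horizons 1..5, as in the validation output


def event_index(years_to_cancer: int, years_to_last_followup: int) -> Tuple[bool, int]:
    """Assumed labeling: a case's event sits at 0-based year index years_to_cancer;
    a control is censored at min(years_to_last_followup, MAX_FOLLOWUP) - 1."""
    if years_to_cancer < MAX_FOLLOWUP:
        return True, years_to_cancer
    return False, min(years_to_last_followup, MAX_FOLLOWUP) - 1


def horizon_label(is_case: bool, idx: int, followup_idx: int) -> Optional[bool]:
    """Assumed rule for the AUC at year followup_idx + 1.
    Returns True (positive), False (negative) or None (excluded)."""
    if is_case and idx <= followup_idx:
        return True   # event observed within the horizon
    if idx >= followup_idx:
        return False  # event-free through the horizon (or event only later)
    return None       # control censored before this horizon


def auc(scores: Sequence[float], labels: Sequence[bool]) -> Optional[float]:
    """Mann-Whitney AUC; None (reported as NA) when either class is empty."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# (years_to_cancer, years_to_last_followup) for the four sample exams in our table.
ROWS = [(100, 4), (100, 2), (4, 4), (2, 2)]
SCORES = [0.10, 0.20, 0.30, 0.40]  # hypothetical risk scores, for illustration only

events = [event_index(ytc, ytf) for ytc, ytf in ROWS]
for followup_idx in range(MAX_FOLLOWUP):
    labeled = [(s, horizon_label(case, idx, followup_idx))
               for s, (case, idx) in zip(SCORES, events)]
    kept = [(s, y) for s, y in labeled if y is not None]
    value = auc([s for s, _ in kept], [y for _, y in kept])
    n, c = len(kept), sum(y for _, y in kept)
    print(f"year {followup_idx + 1}: n={n}, c={c}, auc={'NA' if value is None else value}")

# year 1: n=4, c=0, auc=NA
# year 2: n=4, c=0, auc=NA
# year 3: n=3, c=1, auc=1.0
# year 4: n=3, c=1, auc=1.0
# year 5: n=2, c=2, auc=NA
```

On the four sample exams, this assumed rule gives the same shape as our validation output: no positives at years 1 and 2, the 2-year case entering as a positive only at year 3, the control with a 2-year follow-up dropping out from year 3 on, and only cases remaining at year 5, where the AUC is undefined. We would appreciate confirmation of whether this reading matches Mirai's actual behavior.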

We appreciate your time and any insights you can provide. We are grateful for your support and look forward to your guidance.
