Dear Mirai Team,
First of all, we sincerely appreciate the work your team has done on Mirai. We are very grateful for the opportunity to explore and build upon your model, and we truly value the impact of your research.
We are currently working on evaluating and refining Mirai using our dataset, and we have encountered an aspect of the implementation that we would like to better understand.
Dataset and Case-Control Definition
To provide context, our dataset consists of:
- Cases: Mammograms taken 2 and 4 years before cancer diagnosis.
- Controls: Mammograms from patients with follow-ups at 2 and 4 years confirming they remain cancer-free.
Dataset Metadata
Regarding the years_to_last_followup column in the dataset description CSV, we would like to confirm whether assigning 2 and 4 years to cases is the correct approach, or if Mirai applies a different criterion for this assignment.
To clarify our approach, we have included a sample of our metadata below (for simplicity, we show one line per patient instead of the four views). It consists of two patients:
- Patient 5 (Control): A patient with confirmed negative follow-ups at 2 and 4 years.
- Patient 10822 (Case): A patient diagnosed with cancer, with mammograms taken at 2 and 4 years before diagnosis.
| patient_id | exam_id | laterality | view | file_path | years_to_cancer | years_to_last_followup | split_group |
|---|---|---|---|---|---|---|---|
| 5 | 2011 | L | CC | 00005_20990909_L_CC_2.dcm.png | 100 | 4 | test |
| 5 | 2013 | L | CC | 00005_20990909_L_CC_4.dcm.png | 100 | 2 | test |
| 10822 | 2010 | L | CC | 10822_20990909_L_CC_2.dcm.png | 4 | 4 | test |
| 10822 | 2012 | L | CC | 10822_20990909_L_CC_1.dcm.png | 2 | 2 | test |
We would greatly appreciate it if you could confirm whether this approach is correct or suggest any necessary adjustments.
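To state our assignment rule precisely, here is a minimal sketch of the convention we applied, checked against the four sample rows above. The helper names (`is_case`, `follows_convention`) are ours and purely illustrative; they are not part of Mirai. Treating `100` as the "no cancer observed" sentinel is our own choice and may not be what Mirai expects.

```python
# Sentinel we write into years_to_cancer for controls (our assumption).
NO_CANCER = 100

# The four sample rows from the table above, reduced to the label columns.
sample = [
    {"patient_id": 5, "exam_id": 2011, "years_to_cancer": 100, "years_to_last_followup": 4},
    {"patient_id": 5, "exam_id": 2013, "years_to_cancer": 100, "years_to_last_followup": 2},
    {"patient_id": 10822, "exam_id": 2010, "years_to_cancer": 4, "years_to_last_followup": 4},
    {"patient_id": 10822, "exam_id": 2012, "years_to_cancer": 2, "years_to_last_followup": 2},
]


def is_case(row):
    """A row is a case when years_to_cancer is not the sentinel."""
    return row["years_to_cancer"] != NO_CANCER


def follows_convention(row):
    """Check one row against the rule we used to fill the CSV."""
    if is_case(row):
        # Cases: follow-up is set equal to the time to diagnosis.
        return row["years_to_last_followup"] == row["years_to_cancer"]
    # Controls: follow-up is the 2- or 4-year negative follow-up.
    return row["years_to_last_followup"] in (2, 4)


for row in sample:
    kind = "case" if is_case(row) else "control"
    print(row["patient_id"], row["exam_id"], kind, follows_convention(row))
```

In particular, we are unsure whether setting `years_to_last_followup` equal to `years_to_cancer` for cases is what Mirai expects.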
Validation AUC Calculation
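When running validation (validate.sh), we observed that the 1-year and 2-year AUCs are reported as NA because no cases are counted (c=0), and AUC values first appear at year 3. The 5-year AUC is also NA, this time because every included exam is a case (n=772, c=772). The results are as follows: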
test_1year_auc: NA (n=4632, c=0)
test_2year_auc: NA (n=4632, c=0)
test_3year_auc: 0.7334626432924374 (n=2702, c=386)
test_4year_auc: 0.7321885598718534 (n=2702, c=386)
test_5year_auc: NA (n=772, c=772)
This suggests that cases are not classified as positive before year 3. We observe the same pattern during model refinement, which leads us to the following question:
- Given that our dataset includes cases with 2-year follow-ups, could you clarify why they do not seem to be classified as positive cases in the risk estimation before year 3?
We initially expected cases with a 2-year follow-up to be classified as positive in the test_2year_auc metric, but we may be misinterpreting how this is handled. Understanding this aspect would be extremely helpful in ensuring we correctly interpret the results.
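To make the question concrete, here is a sketch of one inclusion rule that would produce the same pattern on our four sample rows. It assumes that `years_to_cancer` is read as a 0-indexed year, so that a value of 2 first counts as positive at the 3-year horizon, and that the maximum follow-up is 5 years, matching the five horizons in the output. We have not confirmed that Mirai works this way; the function names and the rule itself are our guess.

```python
MAX_FOLLOWUP = 5  # assumed from the five horizons in the validation output


def censor_index(years_to_cancer, years_to_last_followup):
    """Return (has_cancer, 0-indexed event or censoring time) under our guess."""
    has_cancer = years_to_cancer < MAX_FOLLOWUP
    if has_cancer:
        # Guess: years_to_cancer is used directly as a 0-indexed position.
        return has_cancer, int(years_to_cancer)
    # Guess: controls are censored at their last follow-up, shifted to 0-index.
    return has_cancer, int(min(years_to_last_followup, MAX_FOLLOWUP)) - 1


def include_and_label(horizon_index, has_cancer, censor):
    """Decide whether an exam is counted at a horizon, and with which label."""
    valid_pos = has_cancer and censor <= horizon_index
    valid_neg = censor >= horizon_index
    return (valid_pos or valid_neg), valid_pos


def counts_per_horizon(exams):
    """Map horizon in years (1..MAX_FOLLOWUP) to (n included, c positive)."""
    out = {}
    for k in range(MAX_FOLLOWUP):
        n = c = 0
        for years_to_cancer, years_to_last_followup in exams:
            has_cancer, censor = censor_index(years_to_cancer, years_to_last_followup)
            included, positive = include_and_label(k, has_cancer, censor)
            n += included
            c += positive
        out[k + 1] = (n, c)
    return out


# (years_to_cancer, years_to_last_followup) for the four sample rows.
sample = [(100, 4), (100, 2), (4, 4), (2, 2)]
print(counts_per_horizon(sample))
# {1: (4, 0), 2: (4, 0), 3: (3, 1), 4: (3, 1), 5: (2, 2)}
```

Under this reading, no exam is positive at years 1 and 2, positives first appear at year 3, and only cases remain at year 5, which is the shape of the output above. If this is the intended convention, our `years_to_cancer` values would be off by one relative to what we meant.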
We appreciate your time and any insights you can provide. We are grateful for your support and look forward to your guidance.