Commit 534e797

Merge pull request #65 from DanielaSchacherer/master
Refinements for ISBI Tutorial
2 parents 67bebea + 9ecbe85 commit 534e797

2 files changed: 22 additions and 17 deletions

notebooks/labs/idc_isbi2024.ipynb

Lines changed: 11 additions & 15 deletions
@@ -3,8 +3,8 @@
  {
  "cell_type": "markdown",
  "metadata": {
- "colab_type": "text",
- "id": "view-in-github"
+ "id": "view-in-github",
+ "colab_type": "text"
  },
  "source": [
  "<a href=\"https://colab.research.google.com/github/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/labs/idc_isbi2024.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
@@ -16,7 +16,7 @@
  "id": "ATBH0iwpkHXP"
  },
  "source": [
- "# Experimenting with AI inference on slide microscopy data in IDC \n",
+ "# Experimenting with AI inference on slide microscopy data in IDC\n",
  "\n",
  "This notebook is part of a [tutorial given at ISBI 2024](https://biomedicalimaging.org/2024/tutorials-final/).\n",
  "It demonstrates how the Imaging Data Commons (IDC) can be used to work with whole slide images (WSIs) and provides an example of the application of deep learning (DL) to computational pathology analysis.\n",
@@ -188,9 +188,10 @@
  "source": [
  "For most computational pathology experiments, the first step is to select a cohort of WSIs by filtering for the desired metadata attributes.\n",
  "\n",
- "The IDC uses the DICOM standard for data representation. Here, a WSI corresponds to a series of DICOM image objects, each representing the slide at a different resolution. Each DICOM object is stored as a separate DICOM file. Cohort selection is most easily done by executing SQL-like statements using the Python package idc-index against an index table, which lists all available DICOM files (rows) with the corresponding metadata attributes (columns).\n",
+ "The IDC uses the DICOM standard for data representation. Here, a WSI corresponds to a series of DICOM image objects, each representing the slide at a different resolution. Each DICOM object is stored as a separate DICOM file. Cohort selection is most easily done by executing SQL-like statements using the Python package idc-index against an index table, which lists all available DICOM files (rows) with the corresponding metadata attributes (columns). To get started with idc-index, see this [introductory notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part2_searching_basics.ipynb).\n",
  "\n",
- "In the following we retrieve three slides from the CPTAC-LUAD and CPTAC-LSCC collections."
+ "In the following we retrieve three slides from the CPTAC-LUAD and CPTAC-LSCC collections.\n",
+ "\n"
  ]
  },
  {
196197
{
@@ -220,12 +221,8 @@
  " index.SeriesInstanceUID as digital_slide_id,\n",
  " index.StudyInstanceUID as case_id,\n",
  " (REPLACE (REPLACE(index.collection_id, 'cptac_luad', 'luad'), 'cptac_lscc', 'lscc')) AS cancer_subtype,\n",
- " -- The 'tissue_types' subquery indicates whether a slide contains normal, tumor or other tissue.\n",
- " CASE sm_index.primaryAnatomicStructureModifier_CodeMeaning\n",
- " WHEN 'Normal' THEN 'normal'\n",
- " WHEN 'Neoplasm, Primary' THEN 'tumor'\n",
- " ELSE 'other' -- meaning e.g.: 'Neoplasm, Metastatic'\n",
- " END AS tissue_type,\n",
+ " -- The 'tissue_type' column indicates whether a slide contains normal, tumor or other tissue.\n",
+ " sm_index.primaryAnatomicStructureModifier_CodeMeaning as tissue_type\n",
  "FROM\n",
  " index\n",
  "JOIN\n",
@@ -234,7 +231,6 @@
  " (index.SeriesInstanceUID = '1.3.6.1.4.1.5962.99.1.261553051.626883586.1640939092891.2.0'\n",
  " OR index.SeriesInstanceUID = '1.3.6.1.4.1.5962.99.1.223072275.1494200661.1640900612115.2.0'\n",
  " OR index.SeriesInstanceUID = '1.3.6.1.4.1.5962.99.1.255269468.207807501.1640932809308.2.0')\n",
- " -- OR index.SeriesInstanceUID = '1.3.6.1.4.1.5962.99.1.237882392.736020130.1640915422232.2.0')\n",
  " AND index.Modality = 'SM'\n",
  "'''\n",
  "\n",
@@ -365,7 +361,7 @@
  "id": "msi0SjqdLkNM"
  },
  "source": [
- "**Note: we can of course do much more with wsidicom and openslide. Could be mentioned or shown.**"
+ "Note: we can of course do much more with wsidicom and openslide. Just check out the respective documentation."
  ]
  },
  {
@@ -569,9 +565,9 @@
  ],
  "metadata": {
  "colab": {
- "include_colab_link": true,
  "provenance": [],
- "toc_visible": true
+ "toc_visible": true,
+ "include_colab_link": true
  },
  "environment": {
  "kernel": "python3",
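
The query hunk above removes the `CASE` expression, so the query now returns the raw `primaryAnatomicStructureModifier_CodeMeaning` value as `tissue_type`. The following is a minimal sketch of that difference. It uses Python's built-in `sqlite3` module as a stand-in for the SQL engine behind idc-index (an assumption made only to keep the example self-contained), and the table rows and `series-*` identifiers are hypothetical, not IDC data.

```python
import sqlite3

# Hypothetical stand-in for the sm_index table; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sm_index ("
    "SeriesInstanceUID TEXT, "
    "primaryAnatomicStructureModifier_CodeMeaning TEXT)"
)
conn.executemany(
    "INSERT INTO sm_index VALUES (?, ?)",
    [
        ("series-a", "Normal"),
        ("series-b", "Neoplasm, Primary"),
        ("series-c", "Neoplasm, Metastatic"),
    ],
)

# Column expression as it looked before the commit: labels are mapped in SQL.
old_query = """
SELECT
    SeriesInstanceUID,
    CASE primaryAnatomicStructureModifier_CodeMeaning
        WHEN 'Normal' THEN 'normal'
        WHEN 'Neoplasm, Primary' THEN 'tumor'
        ELSE 'other'
    END AS tissue_type
FROM sm_index
ORDER BY SeriesInstanceUID
"""

# Column expression after the commit: the raw code meaning is returned unchanged.
new_query = """
SELECT
    SeriesInstanceUID,
    primaryAnatomicStructureModifier_CodeMeaning AS tissue_type
FROM sm_index
ORDER BY SeriesInstanceUID
"""

old_rows = conn.execute(old_query).fetchall()
new_rows = conn.execute(new_query).fetchall()

for (uid, mapped), (_, raw) in zip(old_rows, new_rows):
    print(f"{uid}: old query -> {mapped!r}, new query -> {raw!r}")
```

With the simplified query, any code that expects the short labels has to map the raw values itself, which is what the change to `utils.py` in this commit does.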

notebooks/labs/idc_isbi2024_utils/utils.py

Lines changed: 11 additions & 2 deletions
@@ -21,7 +21,7 @@ def _get_reference_class_label(slide_metadata: pd.DataFrame) -> str:
  return tissue_type
  else:
  return slide_metadata['cancer_subtype']
-
+
 
  def create_slides_metadata(bq_results_df: pd.DataFrame, local_slides_dir: str) -> Dict[str, Any]:
  """
@@ -45,10 +45,19 @@ def create_slides_metadata(bq_results_df: pd.DataFrame, local_slides_dir: str) -
 
  if not image_id in slides_metadata:
  slides_metadata[image_id] = slide_metadata
+
+ # rename tissue type
+ if slides_metadata[image_id]['tissue_type'] == 'Normal':
+ slides_metadata[image_id]['tissue_type'] = 'normal'
+ elif slides_metadata[image_id]['tissue_type'] == 'Neoplasm, Primary':
+ slides_metadata[image_id]['tissue_type'] = 'tumor'
+ else:
+ slides_metadata[image_id]['tissue_type'] = 'other'
+
  local_path = os.path.join(local_slides_dir, image_id)
  slides_metadata[image_id]['local_path'] = local_path
  slides_metadata[image_id]['reference_class_label'] = _get_reference_class_label(slide_metadata)
 
 
  return pd.DataFrame.from_records(list(slides_metadata.values()),
- index=list(slides_metadata.keys()))
+ index=list(slides_metadata.keys()))
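
The `if`/`elif`/`else` chain added to `create_slides_metadata` maps the raw DICOM code meaning to one of three short labels. A minimal sketch of the same mapping written as a dictionary lookup is shown here. The names `TISSUE_TYPE_LABELS` and `normalize_tissue_type` are hypothetical and do not appear in the commit. This is one possible alternative, not the author's implementation.

```python
from typing import Dict

# Hypothetical lookup table mirroring the branches added in this commit.
TISSUE_TYPE_LABELS: Dict[str, str] = {
    "Normal": "normal",
    "Neoplasm, Primary": "tumor",
}


def normalize_tissue_type(code_meaning: str) -> str:
    """Return 'normal', 'tumor', or 'other' for a raw tissue type code meaning."""
    # Anything not listed above (e.g. 'Neoplasm, Metastatic') falls back to 'other'.
    return TISSUE_TYPE_LABELS.get(code_meaning, "other")


for raw in ("Normal", "Neoplasm, Primary", "Neoplasm, Metastatic"):
    print(raw, "->", normalize_tissue_type(raw))
```

A lookup table keeps the label mapping in one place, which may be easier to extend if further code meanings need their own label.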
