Commit 78f29b0

Merge pull request #6 from databricks-industry-solutions/fix/linkfeedback
Update README.md - fix relative link to notebook
2 parents 586cb6e + 4ea276d commit 78f29b0

4 files changed: 39 additions and 14 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -25,6 +25,7 @@ Please see our [installation guide](./INSTALL.md)
 
 © 2025 Databricks, Inc. All rights reserved. The source in this project is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third party libraries are subject to the licenses set forth below.
 
-| Package | Purpose | License | Source |
-|---------|---------|---------|--------|
-| pydicom | Python api for DICOM files | MIT | https://github.com/pydicom/pydicom |
+| Datasource | Package | Purpose | License | Source |
+| ---------- | ---------- | --------------------------------- | ----------- | ------------------------------------ |
+| zipdcm | pydicom | Python api for DICOM files | MIT | https://github.com/pydicom/pydicom |
+| zipdcm | pylibjpeg | Decoding / Encoding pixel formats | GPLv3 & MIT | https://github.com/pydicom/pylibjpeg |

zipdcm/README.md

Lines changed: 10 additions & 2 deletions
@@ -8,10 +8,15 @@ from dbx.zip_dcm_ds import ZipDCMDataSource
 spark.dataSource.register(ZipDCMDataSource)
 
 # read DCMs with `numPartitions` parallelism.
-df = spark.read.format("zipdcm").option('numPartitions',4).load("./resources")
+df = (
+spark.read
+.format("zipdcm")
+.option("numPartitions",4)
+.load("./resources")
+)
 df.display()
 ```
-For more, see our [demo]($./demo) notebook.
+For more, see our [demo](./zip-dicom-demo.ipynb) notebook.
 
 ## Install
 
@@ -38,3 +43,6 @@ Run unit tests
 ```bash
 make test
 ```
+
+### Synthetic PHI data source citation
+Rutherford, M. W., Nolan, T., Pei, L., Wagner, U., Pan, Q., Farmer, P., Smith, K., Kopchick, B., Laura Opsahl-Ong, Sutton, G., Clunie, D. A., Farahani, K., & Prior, F. (2025). Data in Support of the MIDI-B Challenge (MIDI-B-Synthetic-Validation, MIDI-B-Curated-Validation, MIDI-B-Synthetic-Test, MIDI-B-Curated-Test) (Version 1) [Dataset]. The Cancer Imaging Archive. https://doi.org/10.7937/CF2P-AW56
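
The README snippet in this diff reads a folder of zip archives with `numPartitions` parallelism. As a rough illustration of what that option could mean, the sketch below lists the `.dcm` members of every archive under a folder and deals them out round-robin into a fixed number of partitions. This is not the `ZipDCMDataSource` implementation: the function names `list_dcm_entries` and `split_round_robin` are hypothetical, only the Python standard library is used, and no DICOM parsing is attempted.

```python
import zipfile
from pathlib import Path


def list_dcm_entries(root):
    """Return sorted (zip_path, member_name) pairs for each .dcm member under root.

    Hypothetical helper for illustration; the real data source may discover
    files differently.
    """
    entries = []
    for zip_path in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(zip_path) as zf:
            for name in sorted(zf.namelist()):
                # Match the extension case-insensitively (.dcm, .DCM).
                if name.lower().endswith(".dcm"):
                    entries.append((str(zip_path), name))
    return entries


def split_round_robin(items, num_partitions):
    """Deal items into num_partitions lists, one item at a time."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return [items[i::num_partitions] for i in range(num_partitions)]


if __name__ == "__main__":
    # "./resources" mirrors the path used in the README snippet.
    partitions = split_round_robin(list_dcm_entries("./resources"), 4)
    for index, part in enumerate(partitions):
        print(index, len(part))
```

Round-robin assignment keeps partition sizes within one item of each other, which is one simple way to balance work when archives hold very different numbers of files.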

zipdcm/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 pydicom==3.0.1
+pylibjpeg[all]>=2.0.1
 pyspark==4.0.0.dev1
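
Because this change adds `pylibjpeg` next to the existing pins, it can help to check which versions are actually installed before running the notebook. The sketch below is an assumption-laden convenience, not part of the repository: `installed_versions` is a hypothetical helper, and it only reports versions through the standard library's `importlib.metadata` rather than validating them against the specifiers in `requirements.txt`.

```python
from importlib import metadata

# Distribution names taken from zipdcm/requirements.txt in this diff.
PINNED = ["pydicom", "pylibjpeg", "pyspark"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PINNED).items():
        print(f"{name}: {version or 'not installed'}")
```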

zipdcm/zip-dicom-demo.ipynb

Lines changed: 24 additions & 9 deletions
@@ -19,11 +19,14 @@
 "# Read Zipped DICOM files saving time and storage\n",
 "WIth the custom \"zipdcm\" Python Data Source, we can read zipped (and non Zipped) up DICOM files directly to extract their metadata.\n",
 "\n",
-"Requirements:\n",
-"- Recommend DBR 17.0 (Apache Spark 4.0) compute\n",
+"### Requirements:\n",
+"- Recommend DBR 17.1 (Apache Spark 4.0) dedicated compute\n",
 "- Shared cluster compute compatible\n",
 "- Working on serverless compute fix.\n",
-"- Requires `pydicom==3.0.1`"
+"- Requires `pydicom==3.0.1 pylibjpeg[all]>=2.0.1`\n",
+"\n",
+"### Synthetic PHI data source citation\n",
+"Rutherford, M. W., Nolan, T., Pei, L., Wagner, U., Pan, Q., Farmer, P., Smith, K., Kopchick, B., Laura Opsahl-Ong, Sutton, G., Clunie, D. A., Farahani, K., & Prior, F. (2025). Data in Support of the MIDI-B Challenge (MIDI-B-Synthetic-Validation, MIDI-B-Curated-Validation, MIDI-B-Synthetic-Test, MIDI-B-Curated-Test) (Version 1) [Dataset]. The Cancer Imaging Archive. https://doi.org/10.7937/CF2P-AW56 "
 ]
 },
 {
@@ -53,7 +56,8 @@
 }
 ],
 "source": [
-"%pip install --quiet pydicom==3.0.1\n",
+"# %pip install --quiet numpy==1.26.4 pydicom==3.0.1 pylibjpeg[all]>=2.0.1\n",
+"%pip install --quiet numpy==2.1.3 pydicom==3.0.1 pylibjpeg[all]>=2.0.1\n",
 "%restart_python"
 ]
 },
@@ -79,7 +83,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"total 57M\n-rwxrwxrwx 1 root root 12K Aug 1 21:09 1.3.199.1.2.3712432.1.402.1107814368275696879.zip\n-rwxrwxrwx 1 root root 24M Aug 1 21:09 3.5.574.1.3.9030958.6.376.1780887819048872979.zip\n-rwxrwxrwx 1 root root 12M Aug 1 21:09 3.5.574.1.3.9030958.6.376.2860280475000825621.zip\ndrwxrwxrwx 2 root root 4.0K Aug 2 17:33 x\n-rwxrwxrwx 1 root root 12M Aug 1 21:09 x.zip\ndrwxrwxrwx 2 root root 4.0K Aug 2 17:33 y\n-rwxrwxrwx 1 root root 12M Aug 1 21:09 y.zip\n"
+"total 57M\n-rwxrwxrwx 1 root root 12K Aug 1 21:09 1.3.199.1.2.3712432.1.402.1107814368275696879.zip\n-rwxrwxrwx 1 root root 24M Aug 1 21:09 3.5.574.1.3.9030958.6.376.1780887819048872979.zip\n-rwxrwxrwx 1 root root 12M Aug 1 21:09 3.5.574.1.3.9030958.6.376.2860280475000825621.zip\ndrwxrwxrwx 2 root root 4.0K Aug 10 04:17 x\n-rwxrwxrwx 1 root root 12M Aug 1 21:09 x.zip\ndrwxrwxrwx 2 root root 4.0K Aug 10 04:17 y\n-rwxrwxrwx 1 root root 12M Aug 1 21:09 y.zip\n"
 ]
 }
 ],
@@ -213,24 +217,35 @@
 "spark.dataSource.register(ZipDCMDataSource)\n",
 "\n",
 "# read DCMs with `numPartitions` parallelism.\n",
-"df = spark.read.format(\"zipdcm\").option('numPartitions',4).load(\"./resources\")\n",
+"df = (\n",
+" spark.read\n",
+" .format(\"zipdcm\")\n",
+" .option(\"numPartitions\",4)\n",
+" .load(\"./resources\")\n",
+")\n",
 "df.display()"
 ]
 }
 ],
 "metadata": {
 "application/vnd.databricks.v1+notebook": {
-"computePreferences": null,
+"computePreferences": {
+"hardware": {
+"accelerator": null,
+"gpuPoolId": null,
+"memory": null
+}
+},
 "dashboards": [],
 "environmentMetadata": {
-"base_environment": "dbe_65bc13ea-276c-4905-a728-9fe2fb1780e2",
+"base_environment": "",
 "environment_version": "2"
 },
 "inputWidgetPreferences": null,
 "language": "python",
 "notebookMetadata": {
 "mostRecentlyExecutedCommandWithImplicitDF": {
-"commandId": 7424973428825328,
+"commandId": 5816783787054213,
 "dataframes": [
 "_sqldf"
 ]
