# Day 13: Medical-Image Segmentation

Medical applications are among the most exciting use cases of image segmentation networks.
In this exercise, you will study the publication
["Towards Patient-Individual PI-RADS v2 Sector Map:
CNN for Automatic Segmentation of Prostatic Zones
from T2-Weighted MRI"](https://www.var.ovgu.de/pub/2019_Meyer_ISBI_Zone_Segmentation.pdf)
by Meyer et al.

To get started, run

```bash
python ./data/download.py
```

in your terminal. The script will download and prepare the medical scans and the domain-expert
annotations for you. Data loading and resampling are already implemented.

1. #### Find the bounding-box ROI as described below by finishing the `compute_roi` function.

Once you have obtained the train and test data, you must create a preprocessing pipeline.
Proceed to `src/util.py` and compute the so-called region of interest (ROI).
Meyer et al. define this region as:
| 26 | +"The images were acquired by two different |
| 27 | +types of Siemens 3T MRI scanners (MAGNETOM Trio and Skyra) |
| 28 | +with a body coil. The ground truth segmentation of the prostate |
| 29 | +zones was created on the axial images with 3D Slicer [19] by a medical |
| 30 | +student and subsequently corrected by an expert urologist. All |
| 31 | +volumes were resampled to a spacing of 0.5 × 0.5 × 3 mm which |
| 32 | +corresponds to the highest in-plane resolution and maintains the rela- |
| 33 | +tion of in-plane to inter-plane resolution of the dataset. A bounding |
| 34 | +box ROI of the prostate was automatically extracted with help of |
| 35 | +sagittal and coronal T2w series: the ROI was defined as the intersecting |
| 36 | +volume of the three MR sequences." |
| 37 | + |

See Wikipedia's [anatomical plane](https://en.wikipedia.org/wiki/Anatomical_plane) article for a description of the terminology.
The plots below depict the situation for the 0004 scans:

After computing the intersection of all tensors, we can consider, for example, slice 12 of the
transverse scan:

Your implementation needs to translate array indices from the local into the global coordinate system and back.
In other words, we require a rotation and a translation, or more formally

$$ \mathbf{R}\mathbf{x} + \mathbf{o} = \mathbf{g} ,$$

with a rotation matrix $\mathbf{R} \in \mathbb{R}^{3 \times 3}$, the local coordinate vector $\mathbf{x} \in \mathbb{R}^{3}$, the offset $\mathbf{o} \in \mathbb{R}^{3}$, and the global coordinate vector $\mathbf{g} \in \mathbb{R}^{3}$.
Evaluate this transform for every box line: use the `box_lines` function from the
`util.py` module to generate a bounding box at the origin, and transform all points on every line using the relationship above.

The region of interest is the overlap of all boxes in the global coordinate system. Use [np.amin](https://numpy.org/doc/stable/reference/generated/numpy.amin.html) and [np.amax](https://numpy.org/doc/stable/reference/generated/numpy.amax.html) to find the ROI-box points $\mathbf{r} \in \mathbb{R}^{3}$.
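
A minimal sketch of these two steps, assuming hypothetical names (`to_global`, `global_boxes`) and dummy identity rotations in place of the real scan metadata:

```python
import numpy as np

def to_global(points: np.ndarray, rot: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Map local (n, 3) points into global coordinates via R x + o."""
    return points @ rot.T + offset

# dummy stand-ins for three scans: identity rotations and shifted origins
rotations = [np.eye(3)] * 3
offsets = [np.zeros(3), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
corners = [np.array([[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]])] * 3  # local box extremes

global_boxes = [to_global(c, r, o) for c, r, o in zip(corners, rotations, offsets)]

# the ROI is the overlap of all boxes: the largest per-axis minimum
# and the smallest per-axis maximum over the transformed boxes
roi_min = np.amax([np.amin(b, axis=0) for b in global_boxes], axis=0)  # [1. 1. 0.]
roi_max = np.amin([np.amax(b, axis=0) for b in global_boxes], axis=0)  # [4. 4. 4.]
```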

To obtain array indices, transform all box points back into the local system. Or, more formally:

$$ \mathbf{x}_{\text{roi}} = \mathbf{R}^{-1} (\mathbf{r} - \mathbf{o}) $$

with the inverse of the rotation matrix $\mathbf{R}^{-1}$; use [np.linalg.inv](https://numpy.org/doc/stable/reference/generated/numpy.linalg.inv.html) to compute it. $\mathbf{x}_{\text{roi}} \in \mathbb{R}^{3}$ is a point on the boundary of the local ROI box we seek.
Transform all boundary points.
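
A minimal sketch of the inverse transform; the names and values are again illustrative stand-ins rather than the actual `compute_roi` internals:

```python
import numpy as np

rot = np.eye(3)                     # stand-in rotation from the scan metadata
offset = np.array([1.0, 1.0, 0.0])  # stand-in origin offset
rot_inv = np.linalg.inv(rot)        # R^{-1}

# global ROI extremes, e.g. the roi_min/roi_max found above
roi_corners = np.array([[1.0, 1.0, 0.0], [4.0, 4.0, 4.0]])

# x_roi = R^{-1} (r - o) for every boundary point
local = (roi_corners - offset) @ rot_inv.T

# round to integer voxel indices before slicing a volume
low = np.floor(np.amin(local, axis=0)).astype(int)
high = np.ceil(np.amax(local, axis=0)).astype(int)
# e.g.: axial_t2w_roi = axial_t2w[low[0]:high[0], low[1]:high[1], low[2]:high[2]]
```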

Using the smallest and largest coordinate values of the ROI box in
local coordinates then allows array indexing. Following Meyer et al., we discard all but the axial `t2w` scans.

Test your implementation by setting the if-condition wrapping the plotting utility in `compute_roi` to `True` and running the `test_roi` case via pytest (for example, from VS Code). Remember to set it back to `False` afterwards.

2. #### Implement the UNet.

Navigate to the `train.py` module in the `src` folder.
Finish the `UNet3D` class, as discussed in the lecture.
Use [flax.linen.Conv](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/_autosummary/flax.linen.Conv.html), [flax.linen.relu](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/_autosummary/flax.linen.activation.relu.html), and [flax.linen.ConvTranspose](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/_autosummary/flax.linen.ConvTranspose.html) to build your model.
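
The sketch below shows one possible shape for such a model using only the building blocks linked above; the depth, feature counts, and number of output classes are illustrative assumptions, not the reference solution:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class UNet3D(nn.Module):
    classes: int = 4    # assumed: background plus the prostate zones
    features: int = 16  # assumed base feature count

    @nn.compact
    def __call__(self, x):  # x: (batch, depth, height, width, channels)
        # encoder: convolve, then halve the resolution with a strided convolution
        c1 = nn.relu(nn.Conv(self.features, (3, 3, 3))(x))
        d1 = nn.relu(nn.Conv(self.features * 2, (3, 3, 3), strides=(2, 2, 2))(c1))
        c2 = nn.relu(nn.Conv(self.features * 2, (3, 3, 3))(d1))
        # decoder: a transposed convolution doubles the spatial resolution again
        u1 = nn.relu(nn.ConvTranspose(self.features, (3, 3, 3), strides=(2, 2, 2))(c2))
        # skip connection: concatenate encoder features along the channel axis
        u1 = jnp.concatenate([u1, c1], axis=-1)
        c3 = nn.relu(nn.Conv(self.features, (3, 3, 3))(u1))
        return nn.Conv(self.classes, (1, 1, 1))(c3)  # per-voxel class logits

# quick shape check
model = UNet3D()
dummy = jnp.ones((1, 16, 64, 64, 1))
params = model.init(jax.random.PRNGKey(0), dummy)
print(model.apply(params, dummy).shape)  # (1, 16, 64, 64, 4)
```

A real UNet typically stacks more resolution levels, but the skip connection across each level is the essential ingredient.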

3. #### Implement the focal loss.

Open the `util.py` module in `src` and implement the `softmax_focal_loss` function as discussed in the lecture:

$$\mathcal{L}(\mathbf{o},\mathbf{I})=-\mathbf{I}\cdot(1-\sigma_s(\mathbf{o}))^\gamma\cdot\alpha\cdot\ln(\sigma_s(\mathbf{o})) $$

with output logits $\mathbf{o}$, the corresponding labels $\mathbf{I}$, the softmax function $\sigma_s$, the focusing parameter $\gamma$, and the weighting factor $\alpha$.
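
A minimal JAX sketch of the formula above; the argument names and the default values for $\alpha$ and $\gamma$ are assumptions, not the course defaults:

```python
import jax
import jax.numpy as jnp

def softmax_focal_loss(logits, labels, alpha=1.0, gamma=2.0):
    """Focal loss; `labels` is expected to be one-hot, same shape as `logits`."""
    log_p = jax.nn.log_softmax(logits, axis=-1)  # ln(softmax(o)), numerically stable
    p = jnp.exp(log_p)                           # softmax(o)
    loss = -labels * (1.0 - p) ** gamma * alpha * log_p
    return jnp.mean(jnp.sum(loss, axis=-1))      # sum over classes, mean elsewhere
```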

4. #### Run and test the training script.

Execute the training script by running `scripts/train.slurm` (locally or using `sbatch`).

After training, you can test your model by changing the `checkpoint_name` variable in `src/sample.py` to the desired model checkpoint and running `scripts/test.slurm`.

#### Solution:

5. #### (Optional) Implement mean Intersection-over-Union (mIoU).

Open `meanIoU.py` in `src` and implement the `compute_iou` function as discussed below.
mIoU is the most common metric for evaluating semantic segmentation tasks. It can be computed from the values of a confusion matrix as given below:

$$\text{mIoU} = \frac{1}{k} \sum_{c=1}^{k}\frac{TP_c}{TP_c+FP_c+FN_c}$$

where $k$ is the number of classes and $TP_c$, $FP_c$, and $FN_c$ are the true positives, false positives, and false negatives of class $c$, respectively.
The mIoU value ranges from 0 to 1, where 0 means no overlap between the predicted segmentation map and the ground-truth map and 1 means a perfect match between the two.
As a rule of thumb, any $\text{mIoU}>0.5$ is considered a good result.
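
A minimal numpy sketch of such a `compute_iou`; the signature and the confusion-matrix construction via `np.bincount` are one possible design, not necessarily the intended one:

```python
import numpy as np

def compute_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """mIoU for integer class maps `pred` and `target` of identical shape."""
    # confusion matrix: cm[t, p] counts voxels with label t predicted as p
    idx = num_classes * target.ravel() + pred.ravel()
    cm = np.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class c but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled class c but predicted otherwise
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)  # skip absent classes
    return float(np.nanmean(iou))
```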

Run the script with

```bash
python -m src.meanIoU
```

### Acknowledgments:
We thank our course alumna Barbara Wichtmann for bringing this problem to our attention.
Without her feedback, this code would not exist.