
Commit 5561a49

Merge pull request #136 from SFI-Visual-Intelligence/christian/documentation
Update christian documentation
2 parents f5f9576 + 5a62ede commit 5561a49


8 files changed (+299, -269 lines changed)


README.md

Lines changed: 12 additions & 1 deletion
@@ -129,6 +129,16 @@ ChristianModel was trained on the USPS_0-6 dataset. The model was trained for a
| Validation | 0.071 | 0.074 | 0.973 | 0.975 | 0.140 | 0.974 |
| Test | 0.247 | 0.096 | 0.931 | 0.934 | 0.134 | 0.932 |

### SolveigModel & USPS_7-9

SolveigModel was trained on the `USPS_7-9` dataset for 40 epochs and evaluated on all metrics with macro-averaging enabled, as shown in the table below.

| Dataset Split | Loss  | Entropy | Accuracy | Precision | Recall | F1    |
|---------------|-------|---------|----------|-----------|--------|-------|
| Train         | 0.013 | 0.017   | 1.000    | 1.000     | 0.333  | 1.000 |
| Validation    | 0.004 | 0.010   | 0.996    | 0.996     | 0.332  | 0.996 |
| Test          | 0.222 | 0.023   | 0.962    | 0.964     | 0.324  | 0.963 |
### JohanModel & MNIST_4-9
This section reports the results from using the model "JohanModel" and the dataset MNIST_4-9, which contains MNIST digits from 4 to 9 (six classes in total).
All five available metrics were calculated for this experiment; the model was trained for 12 epochs with a learning rate of 0.001 and a batch size of 64.
@@ -142,6 +152,7 @@ The performance of the model is somewhat limited, at least compared with the res
| Test | 0.679 | 0.618 | 0.810 | 0.870 | 0.810 | 0.755 |

## Citing
Please consider citing this repository if you end up using it for your work.
Several citation methods can be found under the "About" section.
@@ -160,4 +171,4 @@ year = {2025}
For APA please use
```
Thrun, S., Salomonsen, C., Størdal, M., Zavadil, J., & Mylius-Kroken, J. (2025). Collaborative Coding Exam (Version 1.1.0) [Computer software]. https://github.com/SFI-Visual-Intelligence/Collaborative-Coding-Exam
```

doc/_static/custom.css

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
```css
/* Float the figure container to the right */
.float-right {
  float: right;
  position: relative;
  margin-left: 20px;
  /* Ensure the container wraps its content */
  display: inline-block;
}

/* Position the figcaption at the bottom of the container */
.float-right figcaption {
  position: absolute;
  bottom: 0;
  left: 0;
  right: 0;
  /* Optional: add a background for readability */
  background: rgba(255, 255, 255, 0.8);
  text-align: center;
  padding: 0.5em;
}
```

doc/christian.md

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
Christian's task
================

```{note}
This page describes the part of the implementation that was assigned to Christian. While the code implementation itself was simple, Christian contributed to many of the repository's design and structure choices. Note, for instance, the advanced use of GitHub Actions for formatting, testing, building and pushing Docker images, and creating releases upon tags.
```

## Overview
---

The task given was to implement a dataset that handles downloading, loading, and preprocessing of digits 0 to 6 from the [USPS](https://paperswithcode.com/dataset/usps) dataset. The data is then processed by a predictive framework implementing a convolutional neural network (CNN) consisting of two convolutional layers using $3\times3$ filters (50 in the first layer), each followed by $2\times2$ max pooling and a [rectified linear unit (ReLU)](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) activation function. The prediction head uses a fully connected network, or multilayer perceptron (MLP), to map from the flattened feature maps to a fixed-size output. For evaluation, the [Recall](https://en.wikipedia.org/wiki/Precision_and_recall) metric was implemented.

## Convolutional neural network
---

```{figure} figures/christian-model-overview.png
---
name: Model overview
---
Figure 1. ChristianModel in context: the blue volumes denote image and channel shapes, whereas the red volumes denote convolutional block filters. Each convolutional block is followed by a 2D max-pooling kernel with stride 2 and a rectified linear unit (ReLU) activation function. After the second convolutional block, the data is flattened and sent through a fully connected layer which maps the flattened vector to the 7 (or `num_classes`) outputs.
```

A standard CNN, aptly named [ChristianModel](https://sfi-visual-intelligence.github.io/Collaborative-Coding-Exam/autoapi/CollaborativeCoding/models/christian_model/index.html#CollaborativeCoding.models.christian_model.ChristianModel), was implemented to process 2D image data for handwritten digit classification. Since the CNN uses two convolutional layers _under the hood_, it was beneficial to implement a [convolutional block (CNNBlock)](https://sfi-visual-intelligence.github.io/Collaborative-Coding-Exam/autoapi/CollaborativeCoding/models/christian_model/index.html#CollaborativeCoding.models.christian_model.CNNBlock), which made the network implementation simpler. At the intersection between the convolutional and the fully connected networks (or feature extractor and predictive network), a function [`find_fc_input_shape`](https://sfi-visual-intelligence.github.io/Collaborative-Coding-Exam/autoapi/CollaborativeCoding/models/christian_model/index.html#CollaborativeCoding.models.christian_model.find_fc_input_shape) computes the input size to the MLP using a simple trick: a dummy image of the same size as the input is sent through the feature extractor, and the resulting feature maps are flattened to determine the input size the predictive network needs. This means the CNN, before initialization, is agnostic to the input size and can in principle be trained on, or used to evaluate, 2D images of any size, given that the initialized model has been trained on the same image shape.
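
A minimal sketch of that trick (the function signature here is illustrative; see the API reference linked above for the actual one):

```python
import torch

def find_fc_input_shape(image_shape, *blocks):
    """Return the flattened feature size after the convolutional blocks (sketch)."""
    with torch.no_grad():
        dummy = torch.zeros(1, *image_shape)  # dummy batch of one image, e.g. (1, 1, 16, 16)
        for block in blocks:  # run the dummy image through the feature extractor
            dummy = block(dummy)
    return dummy.flatten(start_dim=1).shape[1]  # width the fully connected layer must accept
```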

### Structure

```python
ChristianModel(
  (cnn1): CNNBlock(
    (conv): Conv2d(1, 50, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu): ReLU()
  )
  (cnn2): CNNBlock(
    (conv): Conv2d(50, 100, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu): ReLU()
  )
  (fc1): Linear(in_features=4900, out_features=7, bias=True)
)
```

> _Torch summary of the network when initialized for a $28\times28$ image with 7 output classes. Notice how the two `CNNBlock`s differ only in their channel mappings, which simplifies the implementation through abstraction. This shows the same information as [Figure 1](convolutional-neural-network)._

As per the model description, a CNN consisting of two convolutional blocks, each including 2D max pooling and a ReLU activation function, was implemented. The first convolutional block learns a mapping from a 1-channel greyscale image to 50-channel feature maps using a $3\times3$ convolutional kernel. The kernel uses a padding of 1, preserving the input size along the spatial dimensions (height and width), but the subsequent 2D max-pooling operation with stride 2 halves the spatial size. The second convolutional block learns a similar mapping from 50 to 100 feature maps, further halving the spatial size of the image. The feature maps are then flattened and processed by a fully connected layer mapping to `num_classes`.
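
For a $28\times28$ input, the spatial size thus shrinks as $28 \to 14 \to 7$, so the flattened vector holds $100 \times 7 \times 7 = 4900$ elements, matching `fc1` above. A quick sanity check of the shapes might look as follows; note that the import path and constructor arguments here are assumptions, so consult the API reference for the actual ones:

```python
import torch

from CollaborativeCoding.models import ChristianModel

# Hypothetical arguments: a 1-channel 28x28 input and 7 output classes.
model = ChristianModel(image_shape=(1, 28, 28), num_classes=7)

logits = model(torch.randn(8, 1, 28, 28))  # batch of 8 greyscale images
print(logits.shape)  # expected: torch.Size([8, 7])
```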

## United States Postal Service Dataset
---

```{figure} https://www.researchgate.net/publication/342090211/figure/fig2/AS:901050672349184@1591838629358/Example-images-of-USPS-dataset.ppm
---
name: Dataset samples
---
Figure 2. Excerpt from the USPS dataset.
```

The dataset implements downloading, loading, and preprocessing of images from a subset of the United States Postal Service (USPS) dataset, corresponding to digits 0 to 6. See the [API reference for `USPSDataset0_6`](https://sfi-visual-intelligence.github.io/Collaborative-Coding-Exam/autoapi/CollaborativeCoding/dataloaders/usps_0_6/index.html#CollaborativeCoding.dataloaders.usps_0_6.USPSDataset0_6).

```{note}
While many platforms such as [Kaggle](https://www.kaggle.com/datasets/bistaumanga/usps-dataset) provide versions of the USPS dataset, they generally do not allow API-based downloading, which is required for this project. Thus, we use the official sources for downloading the training and test partitions, which come as binary, bz2-compressed files:
- Train: [https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.bz2](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.bz2)
- Test: [https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.t.bz2](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.t.bz2)
```

The partitions are downloaded from the official sources, processed into usable images and labels, then stored in the map-style, hierarchical [HDF5](https://www.hdfgroup.org/solutions/hdf5/) file format for ease of use. When accessing a sample, the data loader makes sure to load only a single sample at a time into memory to conserve resources, which is stated as a requirement for the assignment.
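
As a rough illustration of what that conversion can look like (the in-file dataset names `data` and `target` are assumptions, and the label handling is simplified; the official files are in libsvm format):

```python
import bz2
import urllib.request

import h5py
import numpy as np
from sklearn.datasets import load_svmlight_file

URL = "https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.bz2"

# Download and decompress the libsvm-formatted training partition.
with open("usps.libsvm", "wb") as f:
    f.write(bz2.decompress(urllib.request.urlopen(URL).read()))

# Parse the 256 features per sample and reshape them into 16x16 images.
X, y = load_svmlight_file("usps.libsvm", n_features=256)
images = X.toarray().reshape(-1, 16, 16).astype(np.float32)

# Store the partition as a group in a hierarchical, map-style HDF5 file.
with h5py.File("usps.h5", "a") as f:
    group = f.create_group("train")
    group.create_dataset("data", data=images)
    group.create_dataset("target", data=y.astype(np.int64))
```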

### Downloading

```{warning}
While the requirements state that the whole dataset should not be loaded into memory at once, for small datasets such as USPS a modern computer would have an easier time loading the entire dataset into memory, given its modest image size and sample count, totaling about 2.91 MB (train + test partitions).
```

Each partition is accessed by opening the `usps.h5` file and reading a sample from either `/train` or `/test` internally in the file. The implemented [`USPSDataset0_6`](https://sfi-visual-intelligence.github.io/Collaborative-Coding-Exam/autoapi/CollaborativeCoding/dataloaders/usps_0_6/index.html#CollaborativeCoding.dataloaders.usps_0_6.USPSDataset0_6) decides which partition to load based on the boolean argument `train`.
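
Conceptually, fetching sample `idx` then looks something like the sketch below, where only the requested sample is read from disk (the in-file dataset names are again assumptions):

```python
import h5py

idx = 0  # index of the sample to fetch
split = "train"  # or "test", chosen via the `train` argument

with h5py.File("data/usps.h5", "r") as f:
    image = f[split]["data"][idx]  # h5py reads only this slice from disk
    label = f[split]["target"][idx]
```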

### Pre-processing

Due to the collaborative nature of this project, datasets need to be capable of loading the same images at different sizes. Thus, although the USPS dataset is constructed with $16\times16$ images in mind, other datasets such as the [MNIST](https://sfi-visual-intelligence.github.io/Collaborative-Coding-Exam/Jan_page.html#mnist-dataset-in-depth) dataset assume $28\times28$ images. Therefore, the dataset accepts a `transform` argument, which preferably applies a sequence of [Torchvision transforms](https://pytorch.org/vision/0.9/transforms.html), for instance:

```python
from torchvision import transforms

from CollaborativeCoding.dataloaders import USPSDataset0_6

# Resize the 16x16 USPS images to 28x28 and convert them to tensors.
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
])

dataset = USPSDataset0_6(
    data_path="data",
    transform=transform,
    download=True,
    train=True,
)
```

## Metric: Recall
---

<figure class="float-right">
  <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/Precisionrecall.svg/350px-Precisionrecall.svg.png" alt="Visual explanation of precision vs. recall">
  <figcaption>Figure 3. Visual explanation of precision vs. recall.</figcaption>
</figure>

Recall, also known as sensitivity, is the fraction of relevant instances that are retrieved, i.e. the true positives, where the predictive network made a correct prediction, divided by the total number of relevant elements. In the case of multi-class prediction, that means the number of predictions the network got right for a class, divided by the number of occurrences of that class. The keen reader will have noticed that there are two possible ways of computing recall in a multi-class setting. First, recall may be computed individually per class and then averaged over all classes, known as _macro-averaging_, which gives equal weight to each class. On the other hand, _micro-averaging_ aggregates the true positives and false negatives across all classes before calculating the metric from the total counts, giving each instance the same weight. In this implementation of the metric, the user can specify which of the two to use via the boolean argument `macro_averaging`.
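
The difference between the two averaging schemes can be made concrete with a small, self-contained sketch (an illustration of the formulas, not the repository's implementation):

```python
import numpy as np

def recall_score(y_true, y_pred, num_classes, macro_averaging=False):
    """Recall = TP / (TP + FN), per class (macro) or over pooled counts (micro)."""
    tp = np.array([np.sum((y_pred == c) & (y_true == c)) for c in range(num_classes)])
    fn = np.array([np.sum((y_pred != c) & (y_true == c)) for c in range(num_classes)])
    if macro_averaging:
        # Average the per-class recalls: every class counts equally.
        return np.mean(tp / (tp + fn))
    # Pool the counts first: every instance counts equally.
    return tp.sum() / (tp.sum() + fn.sum())

y_true = np.array([0, 0, 0, 1, 2])
y_pred = np.array([0, 0, 1, 1, 2])
print(recall_score(y_true, y_pred, num_classes=3, macro_averaging=True))   # (2/3 + 1 + 1) / 3 ≈ 0.889
print(recall_score(y_true, y_pred, num_classes=3))                         # 4 / 5 = 0.8
```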

This project's implementation of metrics is also the first place where PyTorch customs are broken. Our metrics inherit from `torch.nn.Module`, which generally advises users to rely on two interfaces: first, the class is initialized with `metric = Recall(...)`; then, to compute the recall, one would generally expect to run `recall_score = metric(y, logits)`. However, the group decided to store each metric before aggregating and computing the score at the epoch level, for more accurate computation of our metrics. While this might cause confusion for inexperienced users, we restate the age-old saying of [__read the docs__ (!)](https://sfi-visual-intelligence.github.io/Collaborative-Coding-Exam/index.html).
As such, the correct usage is instead:

```python
from CollaborativeCoding.metrics import Recall

metric = Recall(macro_averaging=True, num_classes=7)
...
metric(y_true, logits)

score = metric.__get_metrics__()
```

While the use of a [_dunder method_](https://www.geeksforgeeks.org/dunder-magic-methods-python/) signals to the user that this should be treated as a private class method, we provide a simpler interface through our `MetricWrapper`.

## Challenges
---

This course focuses on, and requires, collaboration between multiple people, where a foundational aspect is the interoperability of our code. This meant that a common baseline, and agreement on the quality and design choices of our implementation, stood at the centre as a glaring challenge. However, through the use of inherently collaborative tools such as [Git](https://git-scm.com/) and [GitHub](https://github.com/), we managed to find a common style:

1. When bugs are noticed, raise an issue.
2. The `main` branch of the GitHub repository is protected; therefore, all changes must:
   1. Start out as a pull request, preferably addressing an issue.
   2. Pass all [GitHub Actions](https://github.com/SFI-Visual-Intelligence/Collaborative-Coding-Exam/actions), which meant:
      - Formatting with [ruff](https://astral.sh/ruff) and [isort](https://pycqa.github.io/isort/).
      - [Tests](https://github.com/SFI-Visual-Intelligence/Collaborative-Coding-Exam/tree/854cda6c4c9dc06067a862a54b992b411246b93c/tests) using [Pytest](https://docs.pytest.org/en/stable/).
      - Building the documentation with [Sphinx](https://www.sphinx-doc.org/en/master/).
      - Building and pushing the [Docker image](https://docs.docker.com/get-started/docker-concepts/the-basics/what-is-an-image/).
   3. Be accepted by at least one other member.
3. Ensure documentation using [Python docstrings](https://peps.python.org/pep-0257/) is up to date, following the [Numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html) style.

This structure ensured a thorough yet simple template for creating one's implementation while adhering to the style.

### Running others' code

Generally, once the aforementioned requirements were set in stone and tests were implemented, other collaborators' code was of such high quality that using it was not a problem. The difficult part here is deciding on the common design choices, which we managed to do early on.

### Having others run my code

As with the above conclusion, having a common ground to work from made the challenge much easier. However, upon deciding on the style, there were a few disagreements about how the code should be written. With majority voting, we were able to settle on solutions that everyone was happy with.

## Tooling

While Git and GitHub were familiar to me from before, GitHub Actions, documentation with Sphinx, GitHub Packages, and the [UV](https://astral.sh/blog/uv) package manager were new to me. GitHub Actions proved to be paramount for automated testing, ensuring quality in the `main` branch of the project, as well as keeping the code readable using formatters. Documentation built with Sphinx proved beneficial when using another person's code without knowing the exact internals of their implementation choices. While most collaborators started the project using [miniconda](https://www.anaconda.com/docs/main), we decided to use UV as our _official_ package manager. While I have good experience with Docker, I had not used the [GitHub Container Registry (ghcr.io)](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry) before, which has the benefit of tying the container image to the repository and organization instead of to a single collaborator.

doc/conf.py

Lines changed: 5 additions & 1 deletion
@@ -1,20 +1,24 @@
```diff
 project = "Collaborative Coding Exam"
 copyright = "2025, SFI Visual Intelligence"
 author = "SFI Visual Intelligence"
-release = "0.0.1"
+release = "1.1.0"

 exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

 extensions = [
     "myst_parser",  # in order to use markdown
     "autoapi.extension",  # in order to generate API documentation
+    "sphinx.ext.mathjax",  # in order to render math equations
 ]

 # search this directory for Python files
 autoapi_dirs = ["../CollaborativeCoding"]

 myst_enable_extensions = [
     "colon_fence",  # ::: can be used instead of ``` for better rendering
+    "dollarmath",  # $...$ can be used for math equations
 ]

 html_theme = "sphinx_rtd_theme"
+
+html_css_files = ["custom.css"]
```
