Commit 2965b72

report: Add demo image
1 parent 5aa63d1

2 files changed (+45, -18 lines)

report/img/demo-tightcrop.jpg (1.87 MB)
report/report.md (45 additions, 18 deletions)
@@ -978,6 +978,15 @@ based on selecting and manually labeling content from the Freesound[@Freesound]
 1302 different recordings were annotated, for a total of 18.5 hours of labeled audio.
 A curated subset with 8732 audio clips of maximum 4 seconds is known as *Urbansound8k*.
 
+\begin{table}[h]
+\centering
+\scalebox{0.8}{
+\input{pyincludes/urbansound8k-classes.tex}
+}
+\caption{Classes found in the Urbansound8k dataset}
+\label{table:urbansound8k-classes}
+\end{table}
+
 YorNoise[@medhat2017masked] is a collection of vehicle noise.
 It has a total of 1527 samples, in two classes: road traffic (cars, trucks, buses) and rail (trains).
 The dataset follows the same design as Urbansound8k,
@@ -1256,18 +1265,15 @@ They claim a 16x improvement in power efficiency over an ARM Cortex M7 chip[@GAP8
 
 ## Dataset
 
-The dataset used for the experiements is Urbansound8K, described in chapter \ref{chapter:datasets}.
-The 10 classes in the dataset are listed in Table \ref{table:urbansound8k-classes},
-and Figure \ref{figure:urbansound8k-examples} shows example audio spectrograms.
-
-\begin{table}
-\centering
-\input{pyincludes/urbansound8k-classes.tex}
-\caption{Classes found in the Urbansound8k dataset}
-\label{table:urbansound8k-classes}
-\end{table}
+The dataset used for the experiments is Urbansound8K, described in Chapter \ref{chapter:datasets}.
+Figure \ref{figure:urbansound8k-examples} shows example audio spectrograms for each of the 10 classes.
 
-![Spectrograms of sound clips from Urbansound8k dataset, selected for each class\label{figure:urbansound8k-examples}](./plots/urbansound8k-examples.png)
+\begin{figure}[h]
+\centering
+\includegraphics{./plots/urbansound8k-examples.png}
+\caption[Spectrograms from Urbansound8k dataset]{Spectrograms of sound clips from the Urbansound8k dataset, selected for each class}
+\label{figure:urbansound8k-examples}
+\end{figure}
 
 The dataset comes prearranged into 10 folds for cross-validation.
 A single fold may contain multiple clips from the same source file,
@@ -1277,7 +1283,6 @@ The target sound is rarely alone in the sound clip, and may be in the background
 partially obscured by sounds outside the available classes.
 This makes Urbansound8k a relatively challenging dataset.
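As a sketch of how the prearranged folds can be respected during evaluation, grouping clips by their fold keeps slices of the same source recording on one side of every train/test split. The `fold` and `slice_file_name` column names below follow the dataset's published metadata CSV, not anything stated in this report:

```python
import csv
from collections import defaultdict

def clips_by_fold(metadata_path):
    """Group clip filenames by their prearranged fold (1-10)."""
    folds = defaultdict(list)
    with open(metadata_path, newline="") as f:
        for row in csv.DictReader(f):
            folds[int(row["fold"])].append(row["slice_file_name"])
    return folds

# Evaluate fold k by training on all other folds and testing on fold k,
# so clips cut from the same source file never leak across the split.
```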
 
-
 ## Hardware platform
 
 The microcontroller chosen for this thesis is the STM32L476[@STM32L476] from STMicroelectronics.
@@ -1503,9 +1508,9 @@ was bottlenecked by the CPU or SSD when preparing the batches.
 
 Once training is completed, the model epoch with best performance on the validation set is selected
 for each of the cross-validation folds.
-The selected models are then evaluated on the test set.
+The selected models are then evaluated on the test set in each fold.
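The per-fold selection step amounts to an argmax over the epoch history; a minimal sketch, with a hypothetical `val_acc` log format rather than the report's actual training code:

```python
def best_epoch(history):
    """Return the epoch entry with the highest validation accuracy."""
    return max(history, key=lambda e: e["val_acc"])

# hypothetical per-epoch log for one cross-validation fold
history = [
    {"epoch": 0, "val_acc": 0.61},
    {"epoch": 1, "val_acc": 0.68},
    {"epoch": 2, "val_acc": 0.65},
]
best = best_epoch(history)  # this checkpoint goes on to test-set evaluation
```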
 
-In addition to the original Urbansound8k test set,
+In addition to the standard cross-validation for Urbansound8k,
 the model performance is evaluated on two simplified variations:
 
 - Only clips where target sound is in the foreground
@@ -1527,9 +1532,14 @@ which would be ignored if only relying on the theoretical MACC number.
 # Results
 
 
+## Model comparisons
+
 ![Test accuracy of the different models](./results/models_accuracy.png){ height=30% }
 
-\begin{table}
+`FIXME: add std-dev to table`
+`FIXME: sort table in same order as figure`
+
+\begin{table}[h]
 \centering
 \input{pyincludes/results.tex}
 \caption{Results for the compared models}
@@ -1540,17 +1550,34 @@ which would be ignored if only relying on the theoretical MACC number.
 
 `FIXME: change confusion matrix color scale to show nuances in 0-20% range`
 
+`TODO: plot MAC versus compute time`
+
+## Error analysis
+
 ![Confusion matrix on Urbansound8k](./results/confusion_test.png){ height=30% }
 
 ![Confusion matrix in reduced groups with only foreground sounds](./results/grouped_confusion_test_foreground.png){ height=30% }
 
 `TODO: add error analysis plots`
 
-`TODO: plot MAC versus compute time`
 
-`TODO: plot training curves over epochs`
+<!-- MAYBE: plot training curves over epochs -->
+
+## On-device testing
+
+\begin{figure}[h]
+\centering
+\includegraphics[width=1.0\textwidth]{./img/demo-tightcrop.jpg}
+\caption[Testing model on device]{Model being tested on device. Sound is played back via headphones and classified on the microcontroller. Predictions are sent to the computer and visualized on screen in real time.}
+\label{figure:demo}
+\end{figure}
 
-`FIXME: add a picture of demo setup`
+The on-device demonstration used the SENSING1 application example as a base,
+with modifications made to send the predictions out over USB.
+This example code only supports mel-spectrogram preprocessing with a 16 kHz sample rate, 30 mel filters,
+and a 1024-sample FFT window with a 512-sample hop, using max-normalization for the analysis windows.
+A Strided-DS-5x5 model was trained on folds 1-8 to match these feature settings.
+The model scored 72% accuracy on the associated validation set, fold 9.
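As a rough numpy-only sketch, the fixed feature settings above (16 kHz, 30 mel filters, 1024-sample FFT, 512-sample hop, max-normalization per analysis window) correspond to something like the following; the triangular filterbank construction is illustrative and need not match the SENSING1 implementation exactly:

```python
import numpy as np

def mel_filterbank(sr=16000, n_fft=1024, n_mels=30):
    """Triangular mel filterbank (illustrative HTK-style mel scale)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for j in range(lo, mid):
            fb[i, j] = (j - lo) / max(mid - lo, 1)
        for j in range(mid, hi):
            fb[i, j] = (hi - j) / max(hi - mid, 1)
    return fb

def melspec(audio, sr=16000, n_fft=1024, hop=512, n_mels=30):
    """Mel-spectrogram with the settings from the example code,
    max-normalized over the analysis window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, fft bins)
    mels = power @ mel_filterbank(sr, n_fft, n_mels).T  # (frames, mel bands)
    return mels / (np.max(mels) + 1e-9)  # max-normalization
```

One second of 16 kHz audio yields 1 + (16000 − 1024)//512 = 30 frames of 30 mel bands each, which is the input patch shape the classifier would consume.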
 
 
 \newpage
