
Commit 5d47978

report: Last images

1 parent b1fcff6 commit 5d47978

File tree: 6 files changed, +13146 −11 lines


TODO.md

Lines changed: 0 additions & 4 deletions
```diff
@@ -3,10 +3,6 @@
 
 ### Final 1
 
-Background
-
-- Make missing images
-
 Checking
 
 - Do a spell checking pass
```

report/Makefile

Lines changed: 1 addition & 0 deletions
```diff
@@ -21,6 +21,7 @@ includes: pyincludes/urbansound8k-classes.tex \
 	pyincludes/experiment-settings.tex \
 	pyincludes/models.tex \
 	pyincludes/results.tex \
+	pyplots/logloss.png \
 	pyplots/filterbanks.png \
 	pyplots/dataaugmentations.png \
 	plots/urbansound8k-examples.png \
```

report/img/spectrograms.svg

Lines changed: 13106 additions & 0 deletions

report/pyplots/logloss.png

16.9 KB

report/pyplots/logloss.py

Lines changed: 30 additions & 0 deletions
```python
import numpy
from matplotlib import pyplot as plt
from sklearn.metrics import log_loss

def plot_logloss(figsize=(6, 3)):
    fig, ax = plt.subplots(1, figsize=figsize)

    yhat = numpy.linspace(0.0, 1.0, 300)
    losses_0 = [log_loss([0], [x], labels=[0, 1]) for x in yhat]
    losses_1 = [log_loss([1], [x], labels=[0, 1]) for x in yhat]

    ax.plot(yhat, losses_0, label='true=0')
    ax.plot(yhat, losses_1, label='true=1')
    ax.legend()

    ax.set_ylim(0, 8)
    ax.set_xlim(0, 1)

    return fig

def main():
    fig = plot_logloss()
    fig.tight_layout()
    out = __file__.replace('.py', '.png')
    fig.savefig(out, bbox_inches='tight')

if __name__ == '__main__':
    main()
```
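As a quick numerical sanity check of what `logloss.py` plots (not part of the commit, a sketch assuming scikit-learn is installed): the log-loss of a single prediction diverges as the predicted probability of the true class approaches zero.

```python
from sklearn.metrics import log_loss

# Single sample with true label 1; p is the predicted probability of class 1.
# The loss is -log(p), which grows without bound as p approaches 0.
for p in [0.9, 0.5, 0.1, 0.01]:
    print(p, round(log_loss([1], [p], labels=[0, 1]), 3))
# 0.9 0.105
# 0.5 0.693
# 0.1 2.303
# 0.01 4.605
```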

report/report.md

Lines changed: 9 additions & 7 deletions
```diff
@@ -408,10 +408,11 @@ Neural Networks are trained through numerical optimization of an objective function
 For supervised learning the standard method is mini-batch Gradient Descent with Backpropagation.
 
 For classification the cross-entropy (log loss) function is often applied.
-As predicted probability of the true class gets close to zero, the (negative) log-loss goes towards infinity.
-Figure \ref{fig:log-loss}
+As the predicted probability of the true class gets close to zero, the log-loss goes towards infinity.
+This penalizes wrong predictions heavily, see Figure \ref{fig:log-loss}.
+
+![Plot of log-loss for binary classification. \label{fig:log-loss}](./pyplots/logloss.png){ width=100% }
 
-`TODO: picture of loss in binary cross entropy`
 Categorical cross-entropy is an extension of binary cross-entropy to multiple classes.
 Other loss functions are Logistic Loss, Mean Squared Error and Mean Absolute Error.
```
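A small illustration (not part of the commit, plain NumPy) of the categorical extension mentioned in the hunk above: with two classes and one-hot labels, categorical cross-entropy reduces to binary cross-entropy on the class-1 probabilities.

```python
import numpy

def categorical_cross_entropy(y_true_onehot, y_pred):
    # Mean over samples of -sum_c y_c * log(p_c)
    return float(-numpy.mean(numpy.sum(y_true_onehot * numpy.log(y_pred), axis=1)))

# Two samples, two classes; only the probability of the true class contributes.
y_true = numpy.array([[0, 1], [1, 0]])
y_pred = numpy.array([[0.2, 0.8], [0.7, 0.3]])
print(categorical_cross_entropy(y_true, y_pred))  # ≈ 0.290
```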

```diff
@@ -428,10 +429,9 @@ This is computed as the partial derivative of the function.
 <!--
 MAYBE: mention momentum
 [@SaddlePointNeuralNetworks]
--->
-
 
 `TODO: image of 1-D loss landscape and Gradient Descent`
+-->
 
 The key to calculating the gradients in a multi-layer neural network
 is *backpropagation*[@BackpropagationNeuralNetworks].
```
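The commented-out TODO mentions a 1-D loss landscape with Gradient Descent; a minimal sketch of that idea (not part of the commit, using a hypothetical quadratic loss):

```python
def gradient_descent(x0, lr=0.1, steps=50):
    # Minimize loss(x) = (x - 3)^2; its derivative is 2 * (x - 3).
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * (x - 3.0)
    return x

print(gradient_descent(0.0))  # converges towards the minimum at x = 3
```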
```diff
@@ -754,7 +754,7 @@ It however adds compression artifacts, and is best avoided for machine learning
 Recordings can have multiple channels of audio but for machine learning on audio
 single-channel data (mono-aural) is still common.
 
-### Spectrograms
+### Spectrogram
 
 Sounds of interest often have characteristic patterns not just in time (temporal signature)
 but also in frequency content (spectral signature).
```
```diff
@@ -780,8 +780,10 @@ For speech a typical choice of window length is 20 ms.
 Similar frame lengths are often adopted for acoustic events.
 The STFT returns complex numbers describing phase and magnitude of each frequency bin.
 A spectrogram squares the absolute magnitude, and discards the phase information.
+This is called a *linear spectrogram* or sometimes just spectrogram.
 The lack of phase information means that the spectrogram is not strictly invertible,
 though estimations exist[@GriffinLimSpectrogramInversion][@MCNNSpectrogramInversion].
+A linear spectrogram can be seen at the top of Figure \ref{fig:spectrograms}.
 
 ### Mel-spectrogram
 
```
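As an aside (not part of the commit), the linear spectrogram described in the added lines can be sketched with SciPy, assuming `scipy` is available:

```python
import numpy
from scipy.signal import stft

sr = 16000
t = numpy.arange(sr) / sr                   # 1 second of audio
y = numpy.sin(2 * numpy.pi * 440.0 * t)     # 440 Hz test tone

# 20 ms windows, as the text suggests for speech
f, times, Z = stft(y, fs=sr, nperseg=int(0.020 * sr))
S = numpy.abs(Z) ** 2                       # magnitude squared, phase discarded
print(S.shape)                              # (frequency bins, time frames)
```

The energy concentrates in the frequency bin nearest 440 Hz, which is what a linear spectrogram of a pure tone should show.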
```diff
@@ -800,7 +802,7 @@ See Figure \ref{figure:filterbanks}.
 The Mel scaled filters are commonly used for audio classification. <!-- TODO: reference -->
 The spectrogram that results from applying a Mel-scale filter-bank is often called a Mel-spectrogram.
 
-`TODO: image of mel-spectrogram`
+![Different spectrograms showing birdsong. Top: Linear spectrogram. Middle: Mel-spectrogram. Bottom: Normalized mel-spectrogram after mean-subtraction and standard scaling. The Mel-spectrograms in this example had the first filter set to 1kHz, eliminating a lot of the low frequency noise seen in the linear spectrogram.](./img/spectrograms.svg){short-caption="Different spectrograms" width=100%}
 
 Mel-Frequency Cepstral Coefficients (MFCC) is a feature representation
 computed by performing a Discrete Cosine Transform (DCT) on a mel-spectrogram.
```
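Not part of the commit: a toy sketch of the MFCC step described above, a DCT over a log mel-spectrogram. The mel-spectrogram here is random stand-in data, just to show the shapes (assumes `scipy`):

```python
import numpy
from scipy.fftpack import dct

rng = numpy.random.default_rng(0)
mel_spec = rng.random((64, 100)) + 1e-6   # stand-in mel-spectrogram (bands, frames)
log_mel = numpy.log(mel_spec)

# MFCC: DCT along the mel-band axis, keeping only the first coefficients
mfcc = dct(log_mel, type=2, axis=0, norm='ortho')[:13]
print(mfcc.shape)  # (13, 100)
```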
