report/report.md (9 additions, 7 deletions)
@@ -408,10 +408,11 @@ Neural Networks are trained through numerical optimization of an objective funct
 For supervised learning the standard method is mini-batch Gradient Descent with Backpropagation.
 
 For classification the cross-entropy (log loss) function is often applied.
-As predicted probability of the true class gets close to zero, the (negative) log-loss goes towards infinity.
-Figure \ref{fig:log-loss}
+As the predicted probability of the true class gets close to zero, the log-loss goes towards infinity.
+This penalizes wrong predictions heavily, see Figure \ref{fig:log-loss}.
+
+{ width=100% }
 
-`TODO: picture of loss in binary cross entropy`
 Categorical cross-entropy is an extension of binary cross-entropy to multiple classes.
 Other loss functions are Logistic Loss, Mean Squared Error and Mean Absolute Error.
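The behaviour of the log-loss near zero can be illustrated in a few lines of numpy. This is a hedged sketch, not code from the report; the `binary_cross_entropy` helper and the `eps` clipping are illustrative choices.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Negative log-likelihood of the true class, averaged over samples.

    Illustrative helper, not part of the report's codebase.
    """
    p = np.clip(p_pred, eps, 1 - eps)  # clip to avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Loss grows without bound as the predicted probability
# of the true class approaches zero
for p in [0.9, 0.5, 0.1, 0.001]:
    print(p, binary_cross_entropy(np.array([1.0]), np.array([p])))
```

Categorical cross-entropy follows the same pattern, summing `-log` of the predicted probability over the one-hot true class across multiple classes.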
@@ -428,10 +429,9 @@ This is computed as the partial derivative of the function.
 <!--
 MAYBE: mention momentum
 [@SaddlePointNeuralNetworks]
--->
-
 `TODO: image of 1-D loss landscape and Gradient Descent`
+-->
 
 The key to calculating the gradients in multi-layer neural networks
 is *backpropagation*[@BackpropagationNeuralNetworks].
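Gradient Descent on a 1-D loss landscape can be sketched as follows. The loss `f(w) = (w - 3)**2` and the learning rate are illustrative assumptions, not taken from the report; backpropagation generalizes this by computing the partial derivatives layer by layer via the chain rule.

```python
# Gradient descent on f(w) = (w - 3)**2, which has its minimum at w = 3.
# The gradient df/dw = 2*(w - 3) points uphill, so we step against it.
w = 0.0
learning_rate = 0.1  # illustrative choice
for step in range(100):
    grad = 2 * (w - 3)        # partial derivative of the loss w.r.t. w
    w -= learning_rate * grad  # step in the direction of steepest descent
print(w)  # converges close to 3.0
```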
@@ -754,7 +754,7 @@ It however adds compression artifacts, and is best avoided for machine learning
 Recordings can have multiple channels of audio but for machine learning on audio
 single-channel data (mono-aural) is still common.
 
-### Spectrograms
+### Spectrogram
 
 Sounds of interest often have characteristic patterns not just in time (temporal signature)
 but also in frequency content (spectral signature).
@@ -780,8 +780,10 @@ For speech a typical choice of window length is 20 ms.
 Similar frame lengths are often adopted for acoustic events.
 The STFT returns complex numbers describing phase and magnitude of each frequency bin.
 A spectrogram squares the absolute value of the magnitude, and discards the phase information.
+This is called a *linear spectrogram* or sometimes just spectrogram.
 The lack of phase information means that the spectrogram is not strictly invertible,
 though estimations exist[@GriffinLimSpectrogramInversion][@MCNNSpectrogramInversion].
+A linear spectrogram can be seen at the top of Figure \ref{fig:spectrograms}.
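The computation can be sketched with `scipy.signal.stft`. The sample rate, tone frequency and 20 ms window length below are illustrative assumptions, not parameters from the report.

```python
import numpy as np
from scipy.signal import stft

sr = 16000                       # sample rate in Hz (illustrative)
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)  # one second of a 1 kHz tone

# 20 ms analysis windows, as is typical for speech
f, times, Zxx = stft(x, fs=sr, nperseg=int(0.020 * sr))
S = np.abs(Zxx) ** 2             # squared magnitude: the linear spectrogram
print(f[np.argmax(S.mean(axis=1))])  # dominant frequency bin, near 1000 Hz
```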
### Mel-spectrogram
@@ -800,7 +802,7 @@ See Figure \ref{figure:filterbanks}.
 The Mel-scaled filter-bank is commonly used for audio classification. <!-- TODO: reference -->
 The spectrogram that results from applying a Mel-scale filter-bank is often called a Mel-spectrogram.
 
-`TODO: image of mel-spectrogram`
+{short-caption="Different spectrograms" width=100%}
 
 Mel-Frequency Cepstral Coefficients (MFCC) is a feature representation
 computed by performing a Discrete Cosine Transform (DCT) on a mel-spectrogram.
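The DCT step can be sketched as follows. This is a hedged sketch: the mel-spectrogram `M` is a random stand-in rather than real audio features, and the choice of 40 mel bands and 13 retained coefficients is illustrative.

```python
import numpy as np
from scipy.fft import dct

# Stand-in for a mel-spectrogram: 40 mel bands x 100 frames of
# strictly positive band energies (random, for illustration only).
rng = np.random.default_rng(0)
M = rng.random((40, 100)) + 1e-6

log_M = np.log(M)  # log compresses the dynamic range before the DCT
# DCT-II along the frequency axis; keep the first 13 coefficients (MFCCs)
mfcc = dct(log_M, type=2, axis=0, norm='ortho')[:13]
print(mfcc.shape)  # (13, 100): 13 coefficients per frame
```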