|
13 | 13 | "\n", |
14 | 14 | "# Data Driven Audio Signal Processing - A Tutorial with Computational Examples\n", |
15 | 15 | "\n", |
16 | | - "Winter Semester 2023/24 (Master Course #24512)\n", |
| 16 | + "Winter Semester 2025/26 (Master Course #24512)\n", |
17 | 17 | "\n", |
18 | 18 | "- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture\n", |
19 | 19 | "- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise\n", |
|
30 | 30 | }, |
31 | 31 | { |
32 | 32 | "cell_type": "markdown", |
33 | | - "metadata": {}, |
| 33 | + "metadata": { |
| 34 | + "vscode": { |
| 35 | + "languageId": "plaintext" |
| 36 | + } |
| 37 | + }, |
34 | 38 | "source": [ |
35 | | - "We introduce the topic and set general objectives for this tutorial. We have some thoughts on best engineering practices and discuss the established procedure for structured development of data-driven methods. Useful Python packages are stated. Exemplary machine learning based audio applications are briefly outlined." |
| 39 | + "## Mindset\n", |
| 40 | + "\n", |
| 41 | + "When and why machine learning?!\n", |
| 42 | + "\n", |
| 43 | + "ChatGPT Text Synthesis vs. Prediction Model for Exam Grades?!" |
36 | 44 | ] |
37 | 45 | }, |
38 | 46 | { |
39 | 47 | "cell_type": "markdown", |
40 | 48 | "metadata": {}, |
41 | 49 | "source": [ |
42 | | - "## General Objective\n", |
| 50 | + "## Motivation: Binary Classification with a Non-Linear Model\n", |
43 | 51 | "\n", |
44 | | - "- For engineers **understanding the essence** of a concept is more important than a strict math proof\n", |
45 | | - " - as engineers we can leave proofs to mathematicians\n", |
46 | | - " - *example*: understanding the 4 matrix subspaces and the matrix (pseudo)-inverse based on the SVD is essential and need to know, in-depth proofs on this fundamental topic is nice to have\n", |
47 | | - "- We should \n", |
48 | | - " - understand building blocks of machine learning for audio data processing\n", |
49 | | - " - create simple tool chains from these building blocks\n", |
50 | | - " - create simple applications from these tool chains\n", |
51 | | - " - get an impression about real industrial applications and their algorithmic and data effort\n", |
52 | | - " - get in touch with scientific literature\n", |
53 | | - " - where to find, how to read\n", |
54 | | - " - there we will find latest tool chain inventions (if published at all, a lot of stuff is either unavailable due to company secrets, or only patent specifications exist, which usually omit heavy math and important details)\n", |
55 | | - " - interpretation of results\n", |
56 | | - " - reproducibility\n", |
57 | | - " - re-inventing a tool chain\n", |
58 | | - " - get in touch with major software libraries (in Python), see below" |
| 52 | + "<img src=\"BinaryClassification.png\" width=\"1200\">\n", |
| 53 | + "\n", |
| 54 | + "Binary Logistic Regression\n", |
| 55 | + "- [binary_logistic_regression_manual.ipynb](binary_logistic_regression_manual.ipynb)\n", |
| 56 | + "- [binary_logistic_regression_torch.ipynb](binary_logistic_regression_torch.ipynb)\n", |
| 57 | + "- [binary_logistic_regression_tensorflow.ipynb](binary_logistic_regression_tensorflow.ipynb)\n", |
| 58 | + "\n", |
| 59 | + "Binary Classification with Non-Linear Models\n", |
| 60 | + "- [binary_logistic_regression_torch_with_hidden_layers.ipynb](binary_logistic_regression_torch_with_hidden_layers.ipynb) (above plot is created with this code)\n", |
| 61 | + "- [binary_logistic_regression_tf_with_hidden_layers.ipynb](binary_logistic_regression_tf_with_hidden_layers.ipynb)" |
59 | 62 | ] |
60 | 63 | }, |
61 | 64 | { |
62 | 65 | "cell_type": "markdown", |
63 | 66 | "metadata": {}, |
64 | 67 | "source": [ |
| 68 | + "## TensorFlow Playground\n", |
| 69 | + "- https://playground.tensorflow.org" |
| 70 | + ] |
| 71 | + }, |
| 72 | + { |
| 73 | + "cell_type": "markdown", |
| 74 | + "metadata": {}, |
| 75 | + "source": [ |
| 76 | + "## Machine Learning Ingredients\n", |
| 77 | + "- **Human Intelligence and Creativity**\n", |
| 78 | + "- Vector Calculus / Analysis\n", |
| 79 | + "- Matrix Calculus / Linear Algebra\n", |
| 80 | + "- Statistics\n", |
| 81 | + "- Signal Processing\n", |
| 82 | + "- Optimisation\n", |
| 83 | + "- Programming (Python!)\n", |
| 84 | + "- Data Handling" |
| 85 | + ] |
| 86 | + }, |
| 87 | + { |
| 88 | + "cell_type": "markdown", |
| 89 | + "metadata": {}, |
| 90 | + "source": [ |
| 91 | + "## IEF BinderHub\n", |
| 92 | + "- virtual machine storage is lost when virtual machine is abandoned\n", |
| 93 | + "- persistent storage only at `mnt/home`\n", |
| 94 | + "- File -> New Terminal\n", |
| 95 | + "- `cd mnt/home`\n", |
| 96 | + "- we can clone the tutorial material into persistent storage by:\n", |
| 97 | + "- `git clone https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise.git`\n", |
| 98 | + "- the same is possible for the lecture material:\n", |
| 99 | + "- `git clone https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture`\n", |
| 100 | + "\n", |
65 | 101 | "## Useful Python Packages\n", |
66 | 102 | "\n", |
67 | | - "- `numpy` for matrix/tensor algebra\n", |
68 | | - "- `scipy` for important science math stuff\n", |
| 103 | + "- `numpy` for matrix / tensor linear algebra\n", |
| 104 | + "- `scipy` for important scientific math stuff\n", |
69 | 105 | "- `matplotlib` for plotting\n", |
70 | 106 | "- `scikit-learn` for predictive data analysis, machine learning\n", |
71 | 107 | "- `statsmodels` for statistical models, i.e. machine learning driven by the statistics community\n", |
72 | 108 | "- `tensorflow` deep learning with DNNs, CNNs...\n", |
73 | | - "- `keras-tuner` for convenient hyper parameter tuning\n", |
74 | | - "- `pytorch` deep learning with DNNs, CNNs...audio handling\n", |
| 109 | + "- `keras-tuner` for convenient hyperparameter tuning in `tensorflow`\n", |
| 110 | + "- `torch` deep learning with DNNs, CNNs...audio handling\n", |
75 | 111 | "- `pandas` for data handling\n", |
76 | 112 | "\n", |
77 | 113 | "audio related packages that we might use here and there\n", |
78 | 114 | "- `librosa`+`ffmpeg` music/audio analysis + en-/decoding/stream support\n", |
79 | | - "- pip:\n", |
80 | | - " - sounddevice\n", |
81 | | - " - soundfile\n", |
82 | | - " - pyloudnorm" |
| 115 | + "- `soundfile` for reading and writing audio files\n", |
| 116 | + "- `pyloudnorm` to calculate a technical loudness measure" |
| 117 | + ] |
| 118 | + }, |
| 119 | + { |
| 120 | + "cell_type": "markdown", |
| 121 | + "metadata": {}, |
| 122 | + "source": [ |
| 123 | + "## Most Recommended Books\n", |
| 124 | + "- Gilbert Strang *Linear Algebra and Learning From Data*, Wellesley, 2019\n", |
| 125 | + "- Kevin P. Murphy *Probabilistic Machine Learning: An Introduction*, MIT Press, 2022, free draft of most current version at https://probml.github.io/pml-book/book1.html\n", |
| 126 | + "- Sebastian Raschka *Machine Learning with PyTorch and Scikit-Learn*, Packt, 2022, https://www.packtpub.com/en-us/product/machine-learning-with-pytorch-and-scikit-learn-9781801819312\n", |
| 127 | + "\n", |
| 128 | + "Please do not learn from AI-written books! There are more textbook recommendations at the end of [index.ipynb](index.ipynb)." |
| 129 | + ] |
| 130 | + }, |
| 131 | + { |
| 132 | + "cell_type": "markdown", |
| 133 | + "metadata": {}, |
| 134 | + "source": [ |
| 135 | + "## Homework Assignment\n", |
| 136 | + "Learning, and thus improving our own skills, requires doing things ourselves and manually.\n", |
| 137 | + "We should read textbooks, we need to use our brain!\n", |
| 138 | + "Consuming ChatGPT or comparable tools is the wrong approach to learn and comprehend, because we never know if these models tell the truth.\n", |
| 139 | + "\n", |
| 140 | + "We go for a manual, human solution on these two tasks\n", |
| 141 | + "\n", |
| 142 | + "1. Matrix Fundamentals\n", |
| 143 | + "- in StudIP `MatrixFundamentals.pdf`\n", |
| 144 | + "2. Regression with a Neural Network Model\n", |
| 145 | + "- in StudIP `RegressionWithNonLinearModel_Task.pdf`\n", |
| 146 | + "- hopefully helpful template to start with [homework/homework_template.ipynb](homework/homework_template.ipynb)\n", |
| 147 | + "\n", |
| 148 | + "It might be more painful in the beginning, but it is rewarding by orders of magnitude compared to a cheated ChatGPT solution." |
| 149 | + ] |
| 150 | + }, |
| 151 | + { |
| 152 | + "cell_type": "markdown", |
| 153 | + "metadata": {}, |
| 154 | + "source": [ |
| 155 | + "## Linear Models vs. Non-Linear Models\n", |
| 156 | + "\n", |
| 157 | + "### Linear Model\n", |
| 158 | + "Forward Problem\n", |
| 159 | + "$$\\bm{y} = \\bm{X} \\bm{\\theta}$$\n", |
| 160 | + "Inverse Problem\n", |
| 161 | + "$$\\hat{\\bm{\\theta}} = \\bm{X}^{-1} \\bm{y},\\qquad\n", |
| 162 | + "\\hat{\\bm{\\theta}} = \\bm{X}^{\\dagger} \\bm{y}\n", |
| 163 | + "$$\n", |
| 164 | + "\n", |
| 165 | + "### Non-Linear Model\n", |
| 166 | + "Forward Problem\n", |
| 167 | + "$$\\bm{y} = f_3(f_2(f_1(\\bm{X},\\bm{\\theta}_1),\\bm{\\theta}_2), \\bm{\\theta}_3)$$\n", |
| 168 | + "How to solve the inverse problem???\n" |
| 169 | + ] |
| 170 | + }, |
| 171 | + { |
| 172 | + "cell_type": "markdown", |
| 173 | + "metadata": {}, |
| 174 | + "source": [ |
| 175 | + "## Didactic Story\n", |
| 176 | + "- First, we should familiarise ourselves with all the ingredients of machine learning using only linear models. This requires a fair understanding of matrix calculus and linear algebra.\n", |
| 177 | + "- Then, we can move on to non-linear models, since this extension involves very few changes to key concepts and mindsets.\n", |
| 178 | + "- The binary logistic regression is a perfect model to initially learn how non-linear models work.\n", |
| 179 | + "- Small non-linear models (such as the regression model from the homework task and the small binary classification models in this tutorial) are perfectly suited to manual implementation.\n", |
| 180 | + "\n", |
| 181 | + "Hence:" |
| 182 | + ] |
| 183 | + }, |
| 184 | + { |
| 185 | + "cell_type": "markdown", |
| 186 | + "metadata": {}, |
| 187 | + "source": [ |
| 188 | + "\n", |
| 189 | + "## General Objective\n", |
| 190 | + "\n", |
| 191 | + "- For engineers, **understanding the essence** of a concept is more important than a strict math proof\n", |
| 192 | + " - as engineers we can leave proofs to mathematicians\n", |
| 193 | + " - *example*: understanding the 4 matrix subspaces and the matrix (pseudo)-inverse based on the SVD is essential (need to know), whereas in-depth proofs on this fundamental topic are nice to have\n", |
| 194 | + "- We should \n", |
| 195 | + " - understand building blocks of machine learning for (audio) data processing\n", |
| 196 | + " - create simple tool chains from these building blocks\n", |
| 197 | + " - create simple applications from these tool chains\n", |
| 198 | + " - get an impression about real industrial applications and their algorithmic and data effort\n", |
| 199 | + " - get in touch with scientific literature\n", |
| 200 | + " - where to find, how to read\n", |
| 201 | + " - there we will find the latest tool chain inventions (if published at all; a lot of stuff is either unavailable due to company secrets, or only patent specifications exist, which usually omit heavy math and important details)\n", |
| 202 | + " - interpretation of results\n", |
| 203 | + " - reproducibility\n", |
| 204 | + " - re-inventing a tool chain\n", |
| 205 | + " - get in touch with major software libraries (in Python), see above" |
83 | 206 | ] |
84 | 207 | }, |
85 | 208 | { |
|
116 | 239 | "5. Evaluation and reporting\n", |
117 | 240 | "6. Application\n", |
118 | 241 | "\n", |
119 | | - "If we lack on thinking about 1. and 2., we will almost certainly under-perform in 3. and 4., which directly affects 5. and 6. Thus, we really should take the whole chain seriously. We hopefully do this all the time in the lecture and exercise." |
120 | | - ] |
121 | | - }, |
122 | | - { |
123 | | - "cell_type": "markdown", |
124 | | - "metadata": {}, |
125 | | - "source": [ |
126 | | - "## Applications for Machine Learning in Audio\n", |
127 | | - "\n", |
128 | | - "Some examples for applications are given below. Nowadays industrial applications use a combination of different ML techniques to provide an intended consumer service. \n", |
129 | | - "\n", |
130 | | - "- supervised learning (mostly prediction by clustering / regression)\n", |
131 | | - " - query by humming\n", |
132 | | - " - music/genre recognition & recommendation\n", |
133 | | - " - speech recognition\n", |
134 | | - " - disease prediction by sound analysis of breathing / coughing \n", |
135 | | - " - acoustic surveillance of machines (cd. keyboard noise to text?!)\n", |
136 | | - " - gun shot / alert sound detection\n", |
137 | | - " - beam forming / direction of arrival (DOA)\n", |
138 | | - " - composing (cf. Beethoven Symphony Nr. 10)\n", |
139 | | - " - deep audio fakes (human-made vs. machine-made replica)\n", |
140 | | - " - Auto EQ (mix should sound as reference mix?!)\n", |
141 | | - "- unsupervised learning (mostly clustering, dimensionality reduction)\n", |
142 | | - " - noise reduction\n", |
143 | | - " - echo cancellation\n", |
144 | | - " - feedback cancellation\n", |
145 | | - " - speech / language recognition\n", |
146 | | - " - compression\n", |
147 | | - " - feature creation (typical spectrum of pop music, classical...)\n", |
148 | | - " - feature calculation (perceived loudness, cf. replay gain adaption) \n", |
149 | | - " - key recognition\n", |
150 | | - "- reinforcement learning\n", |
151 | | - " - human tasks: how to compose a hit single, how to mix a hit single" |
152 | | - ] |
153 | | - }, |
154 | | - { |
155 | | - "cell_type": "markdown", |
156 | | - "metadata": {}, |
157 | | - "source": [ |
158 | | - "## Ideas for Student Projects\n", |
| 242 | + "If we neglect steps 1. and 2., we will almost certainly under-perform in 3. and 4., which directly affects 5. and 6.\n", |
159 | 243 | "\n", |
160 | | - "- song recognition (recognize a song out of a data base)\n", |
161 | | - "- key recognition (recognize the key a song is written in)\n", |
162 | | - "- chord recognition (recognize simple chords and chord progressions)\n", |
163 | | - "- de-noising (reduce noise in audio material, for example to improve speech intelligibility)\n", |
164 | | - "- genre classification and recommendation service" |
| 244 | + "Thus, we really should take the whole procedure chain seriously. We hopefully do this all the time in the lecture and exercise." |
165 | 245 | ] |
166 | 246 | }, |
167 | 247 | { |
|