
Commit 46e49f8
Some references in further work
1 parent 2965b72 commit 46e49f8

3 files changed: +64, -24 lines

TODO.md

Lines changed: 22 additions & 16 deletions
```diff
@@ -7,38 +7,44 @@ Feedback needed
 
 ### Draft 3
 
-- Make missing images
-- Materials: Add images of compared models
-
 Results
 
 - Use Strided-DS-24 as chosen model (confusion matrix etc), instead of auto "best"
-- Finish basic Discussion and Conclusion
+- Finish Discussion and Conclusion
 - Make plots a bit prettier
-- Add picture of demo setup
-
-Reprod
-
-- Tag a branch for submitted thesis
-- Upload models to GH
-- Results are from git commit `b49efa5dde48f9fd72a32eff4c751d9d0c0de712`
-- Include perftools Python script in appendix?
 
 Abstract
 
 - Write it!
 - Send to OK for feedback
 
+### Draft 4
+
+Materials
+
+- Materials: Add images of compared models
+
+Background
+
+- Make missing images
+
 Report
 
 - Add short captions to figures, so List of Figures looks nice
 
-### Draft 4
-
-Improve
+Results
 
-- Plot performance of models relative to fold
 - Include error analysis
+- Plot performance of models relative to fold
+
+Reprod
+
+- Tag a branch for submitted thesis
+- Upload models to GH
+- Results are from git commit `b49efa5dde48f9fd72a32eff4c751d9d0c0de712`
+- Include perftools Python script in appendix?
+
+### Draft 5
 
 Add Acknowledgements?
```

report/references.bib

Lines changed: 31 additions & 0 deletions
```diff
@@ -835,6 +835,13 @@ @article{Mixup
   year={2017}
 }
 
+@article{SpecAugment,
+  title={SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition},
+  author={Park, Daniel S and Chan, William and Zhang, Yu and Chiu, Chung-Cheng and Zoph, Barret and Cubuk, Ekin D and Le, Quoc V},
+  journal={arXiv preprint arXiv:1904.08779},
+  year={2019}
+}
+
 @article{Cutout,
   title={Improved regularization of convolutional neural networks with cutout},
   author={DeVries, Terrance and Taylor, Graham W},
@@ -1353,3 +1360,27 @@ @online{ARMHeliumAnnouncement
 url="https://www.arm.com/company/news/2019/02/next-generation-armv8-1-m-architecture",
 }
 
+
+@article{SemiSupervisedActiveLearning,
+  title = {Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments},
+  author = {Han, Wenjing and Coutinho, Eduardo and Ruan, Huabin and Li, Haifeng and Schuller, Björn and Yu, Xiaojie and Zhu, Xuan},
+  journal = {PLOS ONE},
+  publisher = {Public Library of Science},
+  year = {2016},
+  month = {09},
+  volume = {11},
+  number = {9},
+  pages = {1-23},
+  url = {https://doi.org/10.1371/journal.pone.0162075},
+  doi = {10.1371/journal.pone.0162075}
+}
+
+@inproceedings{ActiveLearningSonyc,
+  title={Active Learning for Efficient Audio Annotation and Classification with a Large Amount of Unlabeled Data},
+  author={Wang, Yu and Mendez, Ana Elisa Mendez and Cartwright, Mark and Bello, Juan Pablo},
+  booktitle={ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+  pages={880--884},
+  year={2019},
+  organization={IEEE}
+}
+
```

report/report.md

Lines changed: 11 additions & 8 deletions
```diff
@@ -1518,9 +1518,11 @@ the model performance is evaluated on two simplified variations:
 
 `TODO: table of group membership`
 
+<!--
 Also explored is the consequence of introducing an "unknown" class for low-confidence predictions:
 Predictions where the highest probability is below a certain threshold
 are assigned to the unknown class instead of the original 10 classes.
+-->
 
 The SystemPerformance application skeleton from X-CUBE-AI is used to record the
 average inference time per sample on the STM32L476 microcontroller.
```
```diff
@@ -1666,11 +1668,11 @@ practical challenges with applying on-edge classification of noise in sensor net
 Utilizing larger amounts of training data might
 be able to increase performance of the models shown.
 Possible techniques for this are transfer learning[@PretrainingSpeechCommandRecognition],
-or applying stronger data augmentation techniques (such as Mixup).
+or applying stronger data augmentation techniques (such as Mixup[@Mixup] or SpecAugment[@SpecAugment]).
 
-Applying quantization should make the computations of the models more efficient.
+Applying quantization should speed up the computations of the models.
 A first step would be to make use of the optimized CMSIS-NN library[@CMSIS-NN],
-which utilizes 8-bit integer operations.
+which utilizes 8-bit integer operations and the SIMD unit in the ARM Cortex M4F.
 However there are also promising results showing that CNNs can be
 effectively implemented with as little as 2 bits[@andri2016yodann][@miyashita2016convolutional][@IncrementalNetworkQuantization],
 and without using any multiplications[@leng2018extremely][@cintra2018low].
```
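The change above adds SpecAugment alongside Mixup as a candidate augmentation. As an illustration of the technique only (not code from this repository), here is a minimal sketch of SpecAugment-style frequency and time masking on a log-mel spectrogram stored as a numpy array; the function name and mask parameters are assumed placeholder values.

```python
# Illustrative sketch of SpecAugment-style masking, assuming a log-mel
# spectrogram as a numpy array of shape (mel_bands, frames).
# Function name and parameter values are placeholders, not project code.
import numpy as np

def spec_augment(spec, max_freq_width=8, max_time_width=16, n_masks=2, rng=None):
    """Mask random frequency bands and time ranges in a copy of `spec`."""
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    n_mels, n_frames = out.shape
    fill = out.mean()  # replace masked regions with the mean value
    for _ in range(n_masks):
        f = int(rng.integers(0, max_freq_width + 1))   # frequency mask width
        f0 = int(rng.integers(0, n_mels - f + 1))      # frequency mask start
        out[f0:f0 + f, :] = fill
        t = int(rng.integers(0, max_time_width + 1))   # time mask width
        t0 = int(rng.integers(0, n_frames - t + 1))    # time mask start
        out[:, t0:t0 + t] = fill
    return out
```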
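The same hunk motivates 8-bit quantization via CMSIS-NN. The following is a minimal sketch of symmetric per-tensor int8 quantization, only to illustrate the integer representation such kernels operate on; it does not use the CMSIS-NN or X-CUBE-AI APIs, and the helper names are made up.

```python
# Illustrative sketch of symmetric per-tensor int8 quantization,
# showing the integer representation that 8-bit kernels operate on.
# Not based on CMSIS-NN or X-CUBE-AI tooling; names are placeholders.
import numpy as np

def quantize_int8(w):
    """Approximate float weights as w ~= scale * q, with q stored as int8."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

weights = np.random.randn(32, 3, 3).astype(np.float32)
q, scale = quantize_int8(weights)
max_error = np.abs(weights - dequantize(q, scale)).max()  # quantization error per weight
```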
```diff
@@ -1685,9 +1687,10 @@ since it allows also the filterbank processing to be offloaded from the general
 -->
 
 In a practical deployment of on-sensor classification, it is still desirable to
-be able to collect *some* data for evaluation of performance and further training.
-This could be sampled at random. But could it be more effective to use some sort of
-adaptive sampling, possibly Active Learning?
+collect *some* data for evaluation of performance and further training.
+This could be sampled at random.
+But can an on-sensor implementation of Active Learning[@ActiveLearningSonyc][@SemiSupervisedActiveLearning]
+make this process more efficient?
 
 <!--
 Normally such training and evaluation data is transferred as raw PCM audio,
```
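One way to make the Active Learning question above concrete is uncertainty sampling: keep or upload only the clips the on-sensor model is least confident about. The sketch below illustrates that general idea, assuming stored softmax outputs; it is not the method of the cited papers nor code from this repository.

```python
# Illustrative sketch of uncertainty sampling: pick the clips where the
# on-sensor model is least confident, as candidates for labeling/upload.
# Assumes `probs` holds softmax outputs of shape (n_clips, n_classes).
import numpy as np

def select_for_labeling(probs, budget=10):
    """Return indices of the `budget` clips with the lowest top-class probability."""
    confidence = probs.max(axis=1)            # model confidence per clip
    return np.argsort(confidence)[:budget]    # least confident first

# Example: 100 clips, 10 classes
probs = np.random.dirichlet(np.ones(10), size=100)
to_label = select_for_labeling(probs, budget=5)
```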
```diff
@@ -1696,8 +1699,8 @@ Could low-power audio coding be applied to compress the data,
 while still enabling reliable human labeling and use as evaluation/training data?
 -->
 
-It is also very desirable to reduce how often classification is needed.
-Could this benefit from an adaptive sampling strategy?
+It is critical for power consumption to reduce how often on-sensor classification is performed.
+This should also benefit from an adaptive sampling strategy.
 For example to primarily do classification for time-periods which exceed
 a sound level threshold, or to sample less often when the sound source changes slowly.
 
```
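To illustrate the sound level gating suggested in the last hunk: compute the RMS level of each analysis window and only run the comparatively expensive classifier when it exceeds a threshold. The threshold value and the `classify` callable below are assumed placeholders, not project code.

```python
# Illustrative sketch of gating on-sensor classification on sound level:
# run the classifier only for windows whose RMS level exceeds a threshold.
import numpy as np

def rms_level_db(window, full_scale=1.0):
    """RMS level of an audio window in dB relative to full scale."""
    rms = np.sqrt(np.mean(np.square(window.astype(np.float64)))) + 1e-12
    return 20.0 * np.log10(rms / full_scale)

def maybe_classify(window, classify, threshold_db=-40.0):
    """Skip inference (saving power) when the window is below the level threshold."""
    if rms_level_db(window) < threshold_db:
        return None
    return classify(window)
```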