
Commit 46e49f8
Some references in further work
1 parent 2965b72 commit 46e49f8

3 files changed: +64, -24 lines

TODO.md

Lines changed: 22 additions & 16 deletions
```diff
@@ -7,38 +7,44 @@ Feedback needed
 
 ### Draft 3
 
-- Make missing images
-- Materials: Add images of compared models
-
 Results
 
 - Use Strided-DS-24 as chosen model (confusion matrix etc), instead of auto "best"
-- Finish basic Discussion and Conclusion
+- Finish Discussion and Conclusion
 - Make plots a bit prettier
-- Add picture of demo setup
-
-Reprod
-
-- Tag a branch for submitted thesis
-- Upload models to GH
-- Results are from git commit `b49efa5dde48f9fd72a32eff4c751d9d0c0de712`
-- Include perftools Python script in appendix?
 
 Abstract
 
 - Write it!
 - Send to OK for feedback
 
+### Draft 4
+
+Materials
+
+- Materials: Add images of compared models
+
+Background
+
+- Make missing images
+
 Report
 
 - Add short captions to figures, so List of Figures looks nice
 
-### Draft 4
-
-Improve
+Results
 
-- Plot performance of models relative to fold
 - Include error analysis
+- Plot performance of models relative to fold
+
+Reprod
+
+- Tag a branch for submitted thesis
+- Upload models to GH
+- Results are from git commit `b49efa5dde48f9fd72a32eff4c751d9d0c0de712`
+- Include perftools Python script in appendix?
+
+### Draft 5
 
 Add Acknowledgements?
```

report/references.bib

Lines changed: 31 additions & 0 deletions
```diff
@@ -835,6 +835,13 @@ @article{Mixup
   year={2017}
 }
 
+@article{SpecAugment,
+  title={SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition},
+  author={Park, Daniel S and Chan, William and Zhang, Yu and Chiu, Chung-Cheng and Zoph, Barret and Cubuk, Ekin D and Le, Quoc V},
+  journal={arXiv preprint arXiv:1904.08779},
+  year={2019}
+}
+
 @article{Cutout,
   title={Improved regularization of convolutional neural networks with cutout},
   author={DeVries, Terrance and Taylor, Graham W},
@@ -1353,3 +1360,27 @@ @online{ARMHeliumAnnouncement
 url="https://www.arm.com/company/news/2019/02/next-generation-armv8-1-m-architecture",
 }
 
+
+@article{SemiSupervisedActiveLearning,
+  title = {Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments},
+  author = {Han, Wenjing and Coutinho, Eduardo and Ruan, Huabin and Li, Haifeng and Schuller, Björn and Yu, Xiaojie and Zhu, Xuan},
+  journal = {PLOS ONE},
+  publisher = {Public Library of Science},
+  year = {2016},
+  month = {09},
+  volume = {11},
+  number = {9},
+  pages = {1-23},
+  url = {https://doi.org/10.1371/journal.pone.0162075},
+  doi = {10.1371/journal.pone.0162075}
+}
+
+@inproceedings{ActiveLearningSonyc,
+  title={Active Learning for Efficient Audio Annotation and Classification with a Large Amount of Unlabeled Data},
+  author={Wang, Yu and Mendez, Ana Elisa Mendez and Cartwright, Mark and Bello, Juan Pablo},
+  booktitle={ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+  pages={880--884},
+  year={2019},
+  organization={IEEE}
+}
+
```

report/report.md

Lines changed: 11 additions & 8 deletions
```diff
@@ -1518,9 +1518,11 @@ the model performance is evaluated on two simplified variations:
 
 `TODO: table of group membership`
 
+<!--
 Also explored is the consequence of introducing an "unknown" class for low-confidence predictions:
 Predictions where the highest probability is below a certain threshold
 are assigned to the unknown class instead of the original 10 classes.
+-->
 
 The SystemPerformance application skeleton from X-CUBE-AI is used to record the
 average inference time per sample on the STM32L476 microcontroller.
```
```diff
@@ -1666,11 +1668,11 @@ practical challenges with applying on-edge classification of noise in sensor net
 Utilizing larger amounts of training data might
 be able to increase performance of the models shown.
 Possible techniques for this are transfer learning[@PretrainingSpeechCommandRecognition],
-or applying stronger data augmentation techniques (such as Mixup).
+or applying stronger data augmentation techniques (such as Mixup[@Mixup] or SpecAugment[@SpecAugment]).
 
-Applying quantization should make the computations of the models more efficient.
+Applying quantization should speed up the computations of the models.
 A first step would be to make use of the optimized CMSIS-NN library[@CMSIS-NN],
-which utilizes 8-bit integer operations.
+which utilizes 8-bit integer operations and the SIMD unit in the ARM Cortex M4F.
 However there are also promising results showing that CNNs can be
 effectively implemented with as little as 2 bits[@andri2016yodann][@miyashita2016convolutional][@IncrementalNetworkQuantization],
 and without using any multiplications[@leng2018extremely][@cintra2018low].
```
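The change above adds SpecAugment alongside Mixup as a candidate augmentation. As an illustration of the technique only (not code from this repository), here is a minimal sketch of SpecAugment-style frequency and time masking on a log-mel spectrogram stored as a numpy array; the function name and mask parameters are assumed placeholder values.

```python
# Illustrative sketch of SpecAugment-style masking, assuming a log-mel
# spectrogram as a numpy array of shape (mel_bands, frames).
# Function name and parameter values are placeholders, not project code.
import numpy as np

def spec_augment(spec, max_freq_width=8, max_time_width=16, n_masks=2, rng=None):
    """Mask random frequency bands and time ranges in a copy of `spec`."""
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    n_mels, n_frames = out.shape
    fill = out.mean()  # replace masked regions with the mean value
    for _ in range(n_masks):
        f = int(rng.integers(0, max_freq_width + 1))   # frequency mask width
        f0 = int(rng.integers(0, n_mels - f + 1))      # frequency mask start
        out[f0:f0 + f, :] = fill
        t = int(rng.integers(0, max_time_width + 1))   # time mask width
        t0 = int(rng.integers(0, n_frames - t + 1))    # time mask start
        out[:, t0:t0 + t] = fill
    return out
```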
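The same hunk motivates 8-bit quantization via CMSIS-NN. The following is a minimal sketch of symmetric per-tensor int8 quantization, only to illustrate the integer representation such kernels operate on; it does not use the CMSIS-NN or X-CUBE-AI APIs, and the helper names are made up.

```python
# Illustrative sketch of symmetric per-tensor int8 quantization,
# showing the integer representation that 8-bit kernels operate on.
# Not based on CMSIS-NN or X-CUBE-AI tooling; names are placeholders.
import numpy as np

def quantize_int8(w):
    """Approximate float weights as w ~= scale * q, with q stored as int8."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

weights = np.random.randn(32, 3, 3).astype(np.float32)
q, scale = quantize_int8(weights)
max_error = np.abs(weights - dequantize(q, scale)).max()  # quantization error per weight
```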
```diff
@@ -1685,9 +1687,10 @@ since it allows also the filterbank processing to be offloaded from the general
 -->
 
 In a practical deployment of on-sensor classification, it is still desirable to
-be able to collect *some* data for evaluation of performance and further training.
-This could be sampled at random. But could it be more effective to use some sort of
-adaptive sampling, possibly Active Learning?
+collect *some* data for evaluation of performance and further training.
+This could be sampled at random.
+But can an on-sensor implementation of Active Learning[@ActiveLearningSonyc][@SemiSupervisedActiveLearning]
+make this process more efficient?
 
 <!--
 Normally such training and evaluation data is transferred as raw PCM audio,
```
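One way to make the Active Learning question above concrete is uncertainty sampling: keep or upload only the clips the on-sensor model is least confident about. The sketch below illustrates that general idea, assuming stored softmax outputs; it is not the method of the cited papers nor code from this repository.

```python
# Illustrative sketch of uncertainty sampling: pick the clips where the
# on-sensor model is least confident, as candidates for labeling/upload.
# Assumes `probs` holds softmax outputs of shape (n_clips, n_classes).
import numpy as np

def select_for_labeling(probs, budget=10):
    """Return indices of the `budget` clips with the lowest top-class probability."""
    confidence = probs.max(axis=1)            # model confidence per clip
    return np.argsort(confidence)[:budget]    # least confident first

# Example: 100 clips, 10 classes
probs = np.random.dirichlet(np.ones(10), size=100)
to_label = select_for_labeling(probs, budget=5)
```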
```diff
@@ -1696,8 +1699,8 @@ Could low-power audio coding be applied to compress the data,
 while still enabling reliable human labeling and use as evaluation/training data?
 -->
 
-It is also very desirable to reduce how often classification is needed.
-Could this benefit from an adaptive sampling strategy?
+It is critical for power consumption to reduce how often on-sensor classification is performed.
+This should also benefit from an adaptive sampling strategy.
 For example to primarily do classification for time-periods which exceed
 a sound level threshold, or to sample less often when the sound source changes slowly.
 
```
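To illustrate the sound level gating suggested in the last hunk: compute the RMS level of each analysis window and only run the comparatively expensive classifier when it exceeds a threshold. The threshold value and the `classify` callable below are assumed placeholders, not project code.

```python
# Illustrative sketch of gating on-sensor classification on sound level:
# run the classifier only for windows whose RMS level exceeds a threshold.
import numpy as np

def rms_level_db(window, full_scale=1.0):
    """RMS level of an audio window in dB relative to full scale."""
    rms = np.sqrt(np.mean(np.square(window.astype(np.float64)))) + 1e-12
    return 20.0 * np.log10(rms / full_scale)

def maybe_classify(window, classify, threshold_db=-40.0):
    """Skip inference (saving power) when the window is below the level threshold."""
    if rms_level_db(window) < threshold_db:
        return None
    return classify(window)
```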