Commit c07931b

Merge pull request #120 from foss-for-synopsys-dwc-arc-processors/kws_example
Kws example
2 parents 4b6c6ee + bdf970b commit c07931b

19 files changed: +3409 −0 lines
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
```make
#
# Copyright 2019, Synopsys, Inc.
# All rights reserved.
#
# This source code is licensed under the BSD-3-Clause license found in
# the LICENSE file in the root directory of this source tree.
#

# Default toolchain
TOOLCHAIN ?= mwdt

# Default hardware config file
TCF_FILE ?= ../../hw/em9d.tcf

# Directories and files
EMBARC_MLI_DIR ?= ../..
SRC_DIRS = $(EMBARC_MLI_DIR)/examples/example_kws_speech \
           $(EMBARC_MLI_DIR)/examples/example_kws_speech/audio_features \
           $(EMBARC_MLI_DIR)/examples/example_kws_speech/kws \
           $(EMBARC_MLI_DIR)/examples/example_kws_speech/kws/dsconv_lstm_nn \
           $(EMBARC_MLI_DIR)/examples/auxiliary

INC_DIRS = $(EMBARC_MLI_DIR)/examples/example_kws_speech \
           $(EMBARC_MLI_DIR)/examples/example_kws_speech/audio_features \
           $(EMBARC_MLI_DIR)/examples/example_kws_speech/kws \
           $(EMBARC_MLI_DIR)/examples/example_kws_speech/kws/dsconv_lstm_nn \
           $(EMBARC_MLI_DIR)/include \
           $(EMBARC_MLI_DIR)/examples/auxiliary

EXT_LIBS_DIR ?= $(EMBARC_MLI_DIR)/bin
EXT_LIBS ?= $(EXT_LIBS_DIR)/libmli.a
OUT_DIR ?= ./bin
BUILD_DIR ?= ./obj
OUT_NAME ?= example_kws_stream

ifeq ($(TOOLCHAIN),mwdt)
# MWDT specific options
CFLAGS = -Hnocopyr -Hpurge -Hheap=4K -Hstack=4K -Hdsplib -Hfxapi -e_start -Bgrouplib -Hldopt=-q -Hsdata0 -Xdsp_ctrl=postshift,guard,convergent -Hdense_prologue

ifneq ($(RT_LIB),)
CFLAGS += -Hlib=$(RT_LIB)
endif

else
PREBUILT_LIB ?= $(EMBARC_MLI_DIR)/examples/prebuilt/libmli.a

# GNU toolchain specific options - correct them according to your target platform settings (see build_configuration.txt for input)
# IoT DevKit config
CFLAGS = -mcpu=em4_fpuda -mlittle-endian -mcode-density -mdiv-rem -mswap -mnorm -mmpy-option=6 -mbarrel-shifter -mxy

# embARC MLI Library specific options - set them according to your target platform settings
# (EM5D or EM7D platform)
#CFLAGS += -DV2DSP
# (EM9D or EM11D platform)
CFLAGS += -DV2DSP_XY
# (HS45D or HS47D platform)
#CFLAGS += -DV2DSP_WIDE

# GNU toolchain linker specific options
LDFLAGS = --defsym=__DEFAULT_HEAP_SIZE=8k
LDFLAGS += --defsym=__DEFAULT_STACK_SIZE=1k
LDFLAGS += -Map $(OUT_DIR)/$(OUT_NAME).map

# Specific options to run the example with the MetaWare Debugger on the nSim simulator
DBG_OPTS = -cmd="read mdb_com_gnu"
endif

.PHONY: clean all lib cleanall app
.DEFAULT_GOAL := all

all: lib app

$(EXT_LIBS): $(EXT_LIBS_DIR)
	@echo Copy prebuilt library $(PREBUILT_LIB) to $@
	@$(CP) $(call fix_platform_path,$(PREBUILT_LIB)) $(call fix_platform_path,$@)

$(EXT_LIBS_DIR):
	$(MKDIR) $(call fix_platform_path,$@)

include $(EMBARC_MLI_DIR)/build/rules.mk

ifeq ($(TOOLCHAIN),mwdt)
lib:
	@ $(MAKE) generic_lib -C $(EMBARC_MLI_DIR)$(PS)lib$(PS)make$(PS) TCF_FILE="$(TCF_FILE)"
else
lib: $(EXT_LIBS)
endif

app: generic_app

run: generic_run

clean:
	@echo Cleaning application $(OUT_NAME)...
	-@$(RM) $(call fix_platform_path,$(OBJS))

cleanall: clean
	@echo Cleaning all files...
	-@$(RM) $(call fix_platform_path,$(OUT_DIR)/$(OUT_NAME).elf)
	-@$(RM) $(call fix_platform_path,$(OUT_DIR)/$(OUT_NAME).map)
	+$(MAKE) clean -C $(EMBARC_MLI_DIR)$(PS)lib$(PS)make$(PS)
```
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
Key Word Spotting Example
==============================================
This example shows an implementation of a small speech recognition use case for Key Word Spotting (KWS). The [TensorFlow speech commands tutorial](https://www.tensorflow.org/tutorials/sequences/audio_recognition) was used as the basis for neural network model training, with the following most notable changes:
1) Input features: FBANK features instead of MFCC
2) Model architecture: depthwise separable convolutions + an LSTM cell.

The KWS modules are designed to process an audio stream. The rest of the example code wraps the modules to process a single WAV file and output performance measurements.

Quick Start
--------------

The example supports building with the [MetaWare Development tools](https://www.synopsys.com/dw/ipdir.php?ds=sw_metaware) and running with the MetaWare Debugger on the [nSim simulator](https://www.synopsys.com/dw/ipdir.php?ds=sim_nSIM). Building with the ARC GNU Toolchain isn't supported yet.

### Build with MetaWare Development tools

Here we consider building for the [/hw/em9d.tcf](/hw/em9d.tcf) template, which is the default template for this example. Other templates can also be used.

0. The embARC MLI Library must be built for the required hardware configuration first. See [embARC MLI Library building and quick start](/README.md#building-and-quick-start).

1. Open a command line and change the working directory to `./examples/example_kws_speech`.

2. The example uses functions from the DSP library and floating-point calculations for audio processing. For optimal performance, the C runtime libraries should be rebuilt for the target hardware template. Enter the following command in the console (building may take 10-15 minutes):

       buildlib em9d_rt_libs -bd ./ -tcf="../../hw/em9d.tcf"

3. Clean previous example build artifacts (optional):

       gmake clean

4. Build the example, pointing to the runtime libraries built in step 2:

       gmake TCF_FILE=../../hw/em9d.tcf RT_LIB=./em9d_rt_libs
### Run example with MetaWare Debugger on nSim simulator

The example application requires a path to a WAV file (16 kHz sample rate, 16-bit samples, mono). The test sample inside the example directory can be used:

    gmake run TCF_FILE=../../hw/em9d.tcf RUN_ARGS=./test.wav

Expected console output:

    0.240 1.215 "on"(99.994%)
    1.440 2.415 "stop"(99.997%)
    3.600 4.575 "_unknown_"(99.960%)

The first and second columns are the start and end timestamps in seconds, respectively. The third column is the detected command with its probability in brackets. For the full list of commands, see the corresponding [module explanation](#kws-modules-explanation).
### Build with ARC GNU toolchain

Building with the ARC GNU Toolchain isn't supported yet.

Example Structure
--------------------
The structure of the example application may be logically divided into the following parts:

* **Application.** Implements allocation of fast memory and module initialization. Reading the input WAV file and feeding the KWS module with samples also belong to this part.
  * main.cc
  * wav_file_guard.cc
  * wav_file_guard.h
* **KWS results postprocessor.** Assuming the KWS module outputs an array of probabilities every Nth input frame, this code analyzes the series of results and provides the final decision.
  * simple_kws_postprocessor.cc
  * simple_kws_postprocessor.h
* **Audio features extractor.** Transforms the input audio waveform into a more compact feature set that is easier for further analysis.
  * audio_features/audio_features.cc
  * audio_features/audio_features.h
* **Common interface for KWS modules.** Common interface declaration for KWS modules (including data types and a module builder).
  * kws/kws_factory.h
  * kws/kws_module.h
  * kws/kws_types.h
* **KWS module implementation.** Implements the logic of the common KWS interface for a specific NN graph. Uses the embARC MLI Library to perform the NN part of the processing.
  * dsconv_lstm_nn/*

The example also contains a test WAV file consisting of 3 samples from the Google speech dataset:
* test.wav

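The actual decision logic lives in simple_kws_postprocessor.cc, which is not shown here. As a rough sketch of the idea (smoothing a series of per-inference probability vectors and thresholding the result; the class name, window size, threshold, and command ordering below are our illustrative assumptions, not taken from the example code):

```python
from collections import deque

# Hypothetical output ordering; the real module defines its own label order.
COMMANDS = ["silence", "unknown", "yes", "no", "up", "down",
            "left", "right", "on", "off", "stop", "go"]

class SimplePostprocessor:
    """Average the last `window` probability vectors and report a command
    once its smoothed probability exceeds `threshold`."""

    def __init__(self, window=3, threshold=0.7):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def process(self, probs):
        """probs: one probability vector per NN inference.
        Returns the detected command name, or None if nothing is confident."""
        self.history.append(probs)
        n = len(self.history)
        smoothed = [sum(h[i] for h in self.history) / n
                    for i in range(len(probs))]
        best = max(range(len(smoothed)), key=smoothed.__getitem__)
        if smoothed[best] >= self.threshold and COMMANDS[best] != "silence":
            return COMMANDS[best]
        return None
```

Smoothing over several consecutive inferences suppresses one-off spikes, so a command is only reported once it stays dominant across the window.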
More Options on Building and Running
---------------------------------------
To see profiling measurements in the console output (cycle counts and memory requirements), pass the "-info" option after the WAV file path:

    gmake run TCF_FILE=../../hw/em9d.tcf RUN_ARGS="./test.wav -info"


KWS Modules explanation
----------------------------
### Depthwise-Separable convolution with LSTM backend layer (kws/dsconv_lstm_nn)

In short:
* The module distinguishes 10 key words (the standard TF tutorial set, including yes, no, up, etc.) plus "silence" and "unknown"
* Filter bank (FBANK) features extracted from input audio frames serve as input for the neural network
* Layer 1: Convolution with Leaky ReLU activation (slope = 0.2)
* Layers 2-4: Depthwise + pointwise convolution with Leaky ReLU activation (slope = 0.2)
* Average pooling across the frequency dimension after layer 4
* Layer 5: LSTM sequential processing across the time dimension
* Layer 6: Fully connected + softmax.

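Layers 2-4 pair a per-channel (depthwise) filter with a 1x1 (pointwise) channel-mixing convolution. A minimal NumPy sketch of one such layer under the description above (the Leaky ReLU slope matches the text; function and variable names, 'valid' padding, and unit stride are our illustrative assumptions):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights, alpha=0.2):
    """Depthwise + pointwise convolution with Leaky ReLU, 'valid' padding.

    x:          input feature map, shape (H, W, C_in)
    dw_kernels: one kH x kW filter per input channel, shape (kH, kW, C_in)
    pw_weights: 1x1 channel-mixing matrix, shape (C_in, C_out)
    """
    kH, kW, C_in = dw_kernels.shape
    H, W, _ = x.shape
    oH, oW = H - kH + 1, W - kW + 1

    # Depthwise stage: each channel is filtered independently.
    dw_out = np.zeros((oH, oW, C_in))
    for c in range(C_in):
        for i in range(oH):
            for j in range(oW):
                dw_out[i, j, c] = np.sum(x[i:i+kH, j:j+kW, c] * dw_kernels[:, :, c])

    # Pointwise stage: a 1x1 convolution mixes channels.
    pw_out = dw_out @ pw_weights  # shape (oH, oW, C_out)

    # Leaky ReLU with slope 0.2, as in layers 1-4 above.
    return np.where(pw_out > 0, pw_out, alpha * pw_out)
```

Compared with a full convolution, the depthwise + pointwise split needs roughly kH*kW*C_in + C_in*C_out weights instead of kH*kW*C_in*C_out, which is why it is popular for compact embedded models.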
List of commands:
+ "yes"
+ "no"
+ "up"
+ "down"
+ "left"
+ "right"
+ "on"
+ "off"
+ "stop"
+ "go"

All other commands beside this list are marked as "\_unknown\_".

The module takes a mono, 16-bit PCM input audio stream with a 16 kHz sample rate in frames of 15 ms length (240 samples - the stride size). FBANK features are calculated over a 30 ms window, extracting 13 values from each frame.

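The framing arithmetic above can be sketched as follows (a minimal illustration; the constant and helper names are ours, not from the example code):

```python
SAMPLE_RATE_HZ = 16000
STRIDE_MS = 15         # hop between consecutive frames
WINDOW_MS = 30         # each FBANK window is 30 ms long
FEATURES_PER_FRAME = 13
FRAMES_PER_INFERENCE = 65

stride_samples = SAMPLE_RATE_HZ * STRIDE_MS // 1000   # 240 samples per hop
window_samples = SAMPLE_RATE_HZ * WINDOW_MS // 1000   # 480 samples per window

def frame_starts(total_samples):
    """Start offsets of all complete 30 ms windows in a sample buffer."""
    return list(range(0, total_samples - window_samples + 1, stride_samples))

# One full NN inference consumes 65 frames of 13 FBANK values each.
feature_matrix_size = FRAMES_PER_INFERENCE * FEATURES_PER_FRAME  # 845 values
```

With these numbers, one second of audio (16000 samples) yields exactly 65 complete windows, i.e. one full feature matrix for inference.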
When the input feature sequence is long enough to perform a complete NN inference (features for 65 frames), the module invokes the MLI-based NN implementation of the following architecture:
![DSCNN_LSTM_NN](kws/dsconv_lstm_nn/DSCONV_LSTM_NN.png)

The following table provides the accuracy of the trained model and its MLI-based version for various bit depths of quantized data:

| | TF Baseline (Float) | MLI FX16 | MLI FX8 | MLI <br/> DSCNN: FX8 <br/> LSTM: FX8w16d |
| :----------------------------------------------------: | :-------------------: | :-----------------: | :--------------------: | :----------------------------------------: |
| Test dataset accuracy <br/> [diff with baseline] | 95.89% | 95.89% <br/> [==] | 94.826% <br/> [-1.06%] | 95.481% <br/> [-0.41%] |

The version corresponding to the last column is implemented in the module. To keep the trained weights compact, all of them have been quantized to 8-bit signed values. To minimize the buffers for intermediate results, the inputs and outputs of the first 4 layers (the convolutions) are also of 8-bit depth. The activations of the last convolution are transformed to 16-bit depth to reduce the effect of error accumulation in the recurrent LSTM cell.
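FX8 and FX16 are signed fixed-point formats with 8- and 16-bit containers. A minimal sketch of how float weights map to such a representation (the choice of 7 fractional bits and the function names are illustrative only and are not taken from the library):

```python
def quantize_fx(values, total_bits, frac_bits):
    """Quantize floats to signed fixed-point with `frac_bits` fractional bits.

    A value v is stored as round(v * 2**frac_bits), saturated to the
    representable range of a signed `total_bits`-bit integer.
    """
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]

def dequantize_fx(ints, frac_bits):
    """Recover the (approximate) float values from fixed-point integers."""
    return [q / (1 << frac_bits) for q in ints]

weights = [0.73, -0.21, 0.05, -0.99]
q8 = quantize_fx(weights, total_bits=8, frac_bits=7)   # 8-bit container
restored = dequantize_fx(q8, frac_bits=7)
```

The quantization error per value is bounded by half of one quantization step (here 2^-7 ≈ 0.0078), which is why wider 16-bit activations help the LSTM, where such errors would otherwise accumulate across time steps.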

References
----------------------------
Simple Audio Recognition Tutorial:
> TensorFlow - Simple Audio Recognition. https://www.tensorflow.org/tutorials/sequences/audio_recognition

Speech commands dataset:
> Pete Warden. Speech Commands: A dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209, 2018.

Explanation of FBANK and MFCC features:
> Haytham Fayek. Speech Processing for Machine Learning: Filter Banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between. Blog post. https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html