This repository was archived by the owner on Jul 18, 2025. It is now read-only.

Commit 61e9a31

Author: idan-arm
Commit title: tiny-wav2letter
1 parent b9e26e6, commit 61e9a31

63 files changed: +4724 -1 lines changed

README.md

Lines changed: 21 additions & 1 deletion
@@ -361,9 +361,29 @@
 <td align="center">:heavy_check_mark: </td>
 <td align="center">0.0783</td>
 </tr>
+<tr>
+<td><a href="models/speech_recognition/tiny_wav2letter/tflite_int8">Tiny Wav2letter INT8 *</a></td>
+<td align="center">INT8</td>
+<td align="center">TensorFlow Lite</td>
+<td align="center">:heavy_check_mark: </td>
+<td align="center">:heavy_check_mark: </td>
+<td align="center">:heavy_multiplication_x: </td>
+<td align="center">:heavy_check_mark: </td>
+<td align="center">0.0348</td>
+</tr>
+<tr>
+<td><a href="models/speech_recognition/tiny_wav2letter/tflite_pruned_int8">Tiny Wav2letter Pruned INT8 *</a></td>
+<td align="center">INT8</td>
+<td align="center">TensorFlow Lite</td>
+<td align="center">:heavy_check_mark: </td>
+<td align="center">:heavy_check_mark: </td>
+<td align="center">:heavy_multiplication_x: </td>
+<td align="center">:heavy_check_mark: </td>
+<td align="center">0.0283</td>
+</tr>
 </table>

-**Dataset**: LibriSpeech
+**Dataset**: LibriSpeech, Fluent Speech

 ## Superresolution

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Tiny Wav2letter INT8

## Description
Tiny Wav2letter is a tiny version of the original Wav2Letter model. It is a convolutional speech recognition neural network. This implementation was created by Arm, pruned to 50% sparsity, fine-tuned and quantized using the TensorFlow Model Optimization Toolkit.

## License
[Apache-2.0](https://spdx.org/licenses/Apache-2.0.html)

## Network Information
| Network Information | Value |
|---------------------|----------------|
| Framework | TensorFlow Lite |
| SHA-1 Hash | 13ca2294ba4bbb1f1c6c5e663cb532d58cd76a6b |
| Size (Bytes) | 3997112 |
| Provenance | https://github.com/ARM-software/ML-zoo/tree/master/models/speech_recognition/wav2letter |
| Paper | https://arxiv.org/abs/1609.03193 |
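
The SHA-1 hash and size listed above can be used to check that a downloaded copy of the model is intact. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of this repository.

```python
import hashlib
import os

# Values copied from the Network Information table.
EXPECTED_SHA1 = "13ca2294ba4bbb1f1c6c5e663cb532d58cd76a6b"
EXPECTED_SIZE_BYTES = 3997112


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_values(path, expected_sha1=EXPECTED_SHA1,
                             expected_size=EXPECTED_SIZE_BYTES):
    """Check a local file against an expected size and SHA-1 hash."""
    return (os.path.getsize(path) == expected_size
            and sha1_of_file(path) == expected_sha1)
```

Calling `matches_published_values("tiny_wav2letter_int8.tflite")` on a local copy (the local path is an assumption) should return `True` if the file matches the table.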

## Performance

| Platform | Optimized |
|----------|:---------:|
| Cortex-A |:heavy_check_mark: |
| Cortex-M |:heavy_check_mark: |
| Mali GPU |:heavy_multiplication_x: |
| Ethos U |:heavy_check_mark: |

### Key
* :heavy_check_mark: - Will run on this platform.
* :heavy_multiplication_x: - Will not run on this platform.

## Accuracy
Dataset: Fluent Speech (trained on LibriSpeech, Mini LibriSpeech, Fluent Speech)
<br />
Please note that the Fluent Speech dataset hosted on Kaggle is a licensed dataset.

| Metric | Value |
|--------|-------|
| LER | 0.0348 |
| WER | 0.112 |
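
For readers unfamiliar with the metrics, LER (letter error rate) and WER (word error rate) are commonly computed as the edit distance between reference and predicted transcripts, divided by the reference length. The sketch below shows one such computation. It assumes corpus-level aggregation (total errors over total reference length), which this page does not confirm as the method behind the reported figures.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, tokenize):
    """Total edit distance divided by total reference length."""
    total_errors = 0
    total_length = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        total_errors += edit_distance(ref_tokens, hyp_tokens)
        total_length += len(ref_tokens)
    return total_errors / total_length


def letter_error_rate(references, hypotheses):
    # Tokens are individual characters, spaces included.
    return error_rate(references, hypotheses, list)


def word_error_rate(references, hypotheses):
    # Tokens are whitespace-separated words.
    return error_rate(references, hypotheses, str.split)
```

Because a single wrong letter makes the whole word wrong, WER is typically higher than LER for the same set of predictions.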

## Optimizations
| Optimization | Value |
|--------------|---------|
| Quantization | INT8 |

## Network Inputs
<table>
<tr>
<th width="200">Input Node Name</th>
<th width="100">Shape</th>
<th width="300">Description</th>
</tr>
<tr>
<td>input_1_int8</td>
<td>(1, 296, 39)</td>
<td>Speech converted to MFCCs and quantized to INT8</td>
</tr>
</table>
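
The input description says the MFCC features are quantized to INT8 before being fed to the network. A minimal sketch of the usual affine quantization scheme follows. The `scale` and `zero_point` arguments are placeholders: the real values are stored in the model file (the TensorFlow Lite Python interpreter exposes them through the `quantization` entry of `get_input_details()`), and they are not stated on this page.

```python
def quantize_to_int8(values, scale, zero_point):
    """Affine quantization: q = round(x / scale) + zero_point, clamped to int8."""
    quantized = []
    for x in values:
        # Note: Python's round() rounds exact halves to the nearest even
        # integer, which may differ from a given runtime's rounding mode.
        q = int(round(x / scale)) + zero_point
        quantized.append(max(-128, min(127, q)))
    return quantized


def dequantize_from_int8(values, scale, zero_point):
    """Inverse mapping: x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in values]
```

The same inverse mapping applies to the INT8 output tensor when real-valued scores are needed, using the output tensor's own scale and zero point.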

## Network Outputs
<table>
<tr>
<th width="200">Output Node Name</th>
<th width="100">Shape</th>
<th width="300">Description</th>
</tr>
<tr>
<td>Identity_int8</td>
<td>(1, 1, 148, 29)</td>
<td>A tensor of time and class probabilities, that represents the probability of each class at each timestep. Should be passed to a decoder. For example ctc_beam_search_decoder.</td>
</tr>
</table>
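
To illustrate what the decoder step does, here is a sketch of greedy (best-path) CTC decoding. This is a simpler stand-in for the beam search decoder mentioned above, not the decoder used to produce the reported accuracy. The label ordering (26 letters, apostrophe, space, with the blank as the last class) is an assumption: the page gives 29 classes but does not list them.

```python
# Hypothetical label set: 26 letters, apostrophe, space, then the CTC blank.
# The ordering is an assumption and must be replaced with the model's own.
ALPHABET = "abcdefghijklmnopqrstuvwxyz' "
BLANK_INDEX = len(ALPHABET)  # 28, giving 29 classes in total


def greedy_ctc_decode(timestep_scores, alphabet=ALPHABET, blank_index=BLANK_INDEX):
    """Best-path decoding: argmax per timestep, merge repeats, drop blanks."""
    decoded = []
    previous = None
    for scores in timestep_scores:
        # Index of the highest-scoring class at this timestep.
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != previous and best != blank_index:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)
```

Here `timestep_scores` would be the 148 rows of 29 values from the output tensor after dropping the leading singleton dimensions. Since dequantization with a positive scale preserves ordering, the argmax can be taken directly on the INT8 values.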
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
author_notes: null
benchmark:
  benchmark_description: please note that fluent-speech-corpus dataset hosted on Kaggle
    is a licensed dataset.
  benchmark_link: https://www.kaggle.com/tommyngx/fluent-speech-corpus
  benchmark_metrics:
    LER: '0.0348'
    WER: '0.1123'
  benchmark_name: Fluent speech
description: "Tiny Wav2letter is a tiny version of the original Wav2Letter model.\
  \ It is a convolutional speech recognition neural network. This implementation was\
  \ created by Arm, pruned to 50% sparsity, fine-tuned and quantized using the TensorFlow\
  \ Model Optimization Toolkit.\r\n\r\n"
license:
- Apache-2.0
network:
  datatype: int8
  file_size_bytes: 3997112
  filename: tiny_wav2letter_int8.tflite
  framework: TensorFlow Lite
  framework_version: 2.4.1
  hash:
    algorithm: sha1
    value: 13ca2294ba4bbb1f1c6c5e663cb532d58cd76a6b
  provenance: https://github.com/ARM-software/ML-zoo/tree/master/models/speech_recognition/wav2letter
  training: LibriSpeech,Mini LibrySpeech,fluent speech
network_parameters:
  input_nodes:
  - description: Speech converted to MFCCs and quantized to INT8
    example_input:
      path: models/speech_recognition/tiny_wav2letter/tflite_int8/testing_input/input_1_int8
    input_datatype: int8
    name: input_1_int8
    shape:
    - 1
    - 296
    - 39
  output_nodes:
  - description: A tensor of time and class probabilities, that represents the probability
      of each class at each timestep. Should be passed to a decoder. For example ctc_beam_search_decoder.
    example_output:
      path: models/speech_recognition/tiny_wav2letter/tflite_int8/testing_output/Identity_int8
    name: Identity_int8
    output_datatype: int8
    shape:
    - 1
    - 1
    - 148
    - 29
network_quality:
  quality_level: Deployable
  quality_level_hero_hw: null
operators:
  TensorFlow Lite:
  - CONV_2D
  - DEQUANTIZE
  - LEAKY_RELU
  - QUANTIZE
  - RESHAPE
paper: https://arxiv.org/abs/1609.03193
Binary file not shown.
