Commit 7a11845

authored
Update index.html
1 parent 8390e4a commit 7a11845

1 file changed

+4 -6 lines changed

spoken_numerals/index.html

Lines changed: 4 additions & 6 deletions
@@ -172,22 +172,20 @@ <h3 style="font-weight:600; font-family: sans-serif;"> About Dataset <div style=
 <font size="4">Speech recognition has improved dramatically over the past years due to advances in machine learning and the availability of speech data. Speech recognition is nowadays powering a multitude of applications, from home virtual assistants to call centers, and it is expected to be integrated in many more systems, some of which might be critical for inclusivity.
 </font>
 
-
-
+<br />
 <br />
 
 <font size="4">
-Machine learning solutions are however constrained by the quality of the data they are trained on. If our data does not represent our target population well, we can only aspire for our solution to work well on the sub-population that our data represents. In other words, solutions from non-representative data are inevitably biased towards a sub-population. In the context of speech recognition, machine learning solutions trained on non-representative datasets will not perform well on any sub-population that is not represented well, which can have a detrimental impact on inclusivity.
+Machine learning solutions are however constrained by the quality of the data they are trained on. If our data does not represent our target population well, we can only aspire for our solution to work well on the sub-population that our data represents. In other words, solutions from non-representative data are inevitably biased towards a sub-population. In the context of speech recognition, machine learning solutions trained on non-representative datasets will not perform well on any sub-population that is not represented well, and this can have a detrimental impact on inclusivity.
 </font>
 
 <br />
 <br />
 
 <font size="4">
-The MLEnd Spoken Numerals dataset is a collection of more than <b>32k audio recordings</b> produced by <b>154 speakers</b>. Each audio recording corresponds to one <b>English numeral (from "zero" to "billion")</b> that is read using different intonations <b>("neutral", "bored", "excited" and "question")</b>. Our participants have a diverse background: <b>31 nationalities</b> and <b>42 unique languages</b> are represented in the MLEnd Spoken Numerals dataset. This dataset comes with additional demographic information about our participants.
+The MLEnd Spoken Numerals dataset is a collection of more than <b>32k audio recordings</b> produced by <b>154 speakers</b>. Each audio recording corresponds to one <b>English numeral</b> (from "zero" to "billion") that is read using different <b>intonations</b> ("neutral", "bored", "excited" and "question"). Our participants have a diverse background: <b>31 nationalities</b> and <b>42 mother languages</b> are represented in the MLEnd Spoken Numerals dataset. This dataset comes with additional demographic information about our participants.
 </font><font size="4">
-The MLEnd datasets have been created by students at the School of Electronic Engineering and Computer Science, Queen Mary University of London. Other datasets include the MLEnd Hums and Whistles dataset, also available on Kaggle. Do not hesitate to reach out if you want to know more about how we did it.
-
+The MLEnd datasets have been created by students at the School of Electronic Engineering and Computer Science, Queen Mary University of London.
 
 </font><font size="4">
 <br />

0 commit comments
