A repository to collect Speech Recognition resources and evaluate them

corvusMidnight/DutchSpeechDetectionResources

Dutch Speech Detection Resources

This repository collects resources for developing and testing speech detection, recognition, and analysis tools for Dutch and Flemish. It covers models, data, and evaluation results.

Models

| Title | Type | Description | Size | Link |
|---|---|---|---|---|
| Wav2Vec2 | | | | |
| XLSR Wav2Vec2 Dutch by Jonatas Grosman | | "Fine-tuned facebook/wav2vec2-large-xlsr-53 on Dutch using the train and validation splits of Common Voice 6.1 and CSS10." | - | HuggingFace |
| Dutch XLSR Wav2Vec2 Large 53 by Wietse de Vries | | "Fine-tuned facebook/wav2vec2-large-xlsr-53 on Dutch using the Common Voice dataset. When using this model, make sure that your speech input is sampled at 16kHz." | - | HuggingFace |
| wav2vec2-large-xls-r-300m-nl | | "This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset." | - | HuggingFace |
| Wav2Vec2-Large-XLSR-53-Dutch | | "Fine-tuned facebook/wav2vec2-large-xlsr-53 on Dutch using the Common Voice. When using this model, make sure that your speech input is sampled at 16kHz." | - | HuggingFace |
| wav2vec2-large-xlsr-53-Dutch by Mehdi Hosseini Moghadam | | "Fine-tuned facebook/wav2vec2-large-xlsr-53 in Dutch using the Common Voice. When using this model, make sure that your speech input is sampled at 16kHz." | - | HuggingFace |
| simonsr wav2vec2-large-xlsr-dutch | | "Fine-tuned facebook/wav2vec2-large-xlsr-53 on Dutch using the Common Voice. When using this model, make sure that your speech input is sampled at 16kHz." | - | HuggingFace |
| facebook wav2vec2 large xlsr-53-dutch model | | "The model facebook wav2vec2 large xlsr-53-dutch is a Natural Language Processing (NLP) Model implemented in Transformer library, generally using the Python programming language." | - | HuggingFace |
| GroNLP/wav2vec2-dutch-large-ft-cgn | | "A Dutch Wav2Vec2 model. This model is created by further pre-training the original English facebook/wav2vec2-large model on Dutch speech from Het Corpus Gesproken Nederlands. Subsequently, the model is fine-tuned on the same Dutch speech using CTC." | - | HuggingFace |
| Wav2Vec2-Large-XLSR-53-ft-CGN | | "This model is created by fine-tuning the facebook/wav2vec2-large-xlsr-53 model on Dutch speech from Het Corpus Gesproken Nederlands using CTC." | - | HuggingFace |
| openai | | | | |
| openai/whisper-large | | "The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever." | - | HuggingFace |
| s2t | | | | |
| facebook/s2t-medium-mustc-multilingual-st | | "s2t-medium-mustc-multilingual-st is a Speech to Text Transformer (S2T) model trained for end-to-end Multilingual Speech Translation (ST)." | - | HuggingFace |
| speechbrain | | | | |
| speechbrain/lang-id-commonlanguage_ecapa | | "This repository provides all the necessary tools to perform language identification from speech recordings with SpeechBrain. The system uses a model pretrained on the CommonLanguage dataset (45 languages)." | - | HuggingFace |
Coming soon...
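
Several of the model descriptions above ask for speech input sampled at 16 kHz. The sketch below illustrates what that conversion involves, using plain linear interpolation on a mono signal held as a list of floats. The function name `resample_linear` and its parameters are illustrative assumptions, not part of any listed model or library, and the sketch applies no anti-aliasing filter, so a dedicated library resampler would normally be preferred for real recordings.

```python
def resample_linear(samples, orig_rate, target_rate=16_000):
    """Resample a mono signal by linear interpolation (illustrative only).

    No low-pass filter is applied, so downsampling real audio this way
    can introduce aliasing.
    """
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if orig_rate == target_rate or len(samples) < 2:
        return list(samples)

    # Number of output samples that covers the same duration.
    n_out = max(1, round(len(samples) * target_rate / orig_rate))
    # How far we move through the input for each output sample.
    step = orig_rate / target_rate
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    one_second_at_48k = [0.0] * 48_000
    print(len(resample_linear(one_second_at_48k, 48_000)))
```

One second of 48 kHz audio comes out as 16,000 samples, which is the length the models above would expect for one second of input.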

Data

| Title | Type | Description | Size | Link |
|---|---|---|---|---|
| Common Voice NL | | "The Common Voice dataset consists of a unique MP3 and corresponding text file. Many of the 9,283 recorded hours in the dataset also include demographic metadata like age, sex, and accent that can help train the accuracy of speech recognition engines. The dataset currently consists of 7,335 validated hours in 60 languages" | 63 hours | HuggingFace |
| LibriVox | | "Free public domain audiobooks" | 295 books | LibriVox |
| MuST-C | | "Created by Di Gangi et al. at 2019, the MuST-C Dataset is a speech translation corpus containing 385 hours from Ted talks for speech translation from English into several languages: Dutch, French, German, Italian, Portuguese, Romanian, Russian, & Spanish. Requires filling request form., in Multi-Lingual language." | 385 hours | Fondazione Bruno Kessler |
| dutch-vl-tts | | "This dataset contains 15.000 audio fragments of a male Dutch Flemish voice, the sentences read are extracted from the Mozilla Common Voice project." | 15,000 audio recordings | GitHub |
| Corpus Gesproken Nederlands | | "Between 1998 and 2004, the Corpus Gesproken Nederlands (CGN) project worked on building a database of contemporary Dutch as spoken by adult speakers in the Netherlands and Flanders. The results of this project became available in March 2004." (translated from Dutch) | - | CGN |
| IFA Spoken Language Corpus | | "The IFA Spoken Language corpus is a free (GPL) database of hand-segmented Dutch speech. It was constructed with off-the-shelf software using speech from 8 speakers in a variety of speaking styles. For a total of 50,000 words (41 minutes/speaker), speech acquisition and preparation took around 3 person-weeks per speaker." | 4 hours | IFA |
| CSS10 | | "CSS10 is a collection of single speaker speech datasets for 10 languages. Each of them consists of audio files recorded by a single volunteer and their aligned text sourced from LibriVox." | - | kaggle |
| Spoken Wikipedia Corpus (Dutch) | | "The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are – for one reason or another – unable or unwilling to consume the written version of the article." | - | kaggle |
| Corpus Gesproken Nederlands (CGN) | | "The Corpus Gesproken Nederlands (CGN) is a collection of 900 hours (almost 9 million words) of contemporary Dutch speech from Flemish and Dutch speakers." (translated from Dutch) | 900 hours | Instituut voor de Nederlandse taal |
Coming soon...

Evaluation

  • by Jonatas Grosman on (100 samples)

evaluation

  • by OpenAI on (100 samples)

evaluation
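
The text above does not say which metric the evaluations report. Word error rate (WER) is a common choice for comparing speech recognition output against reference transcripts, so the sketch below shows one way it could be computed. The function name `word_error_rate`, the lowercasing, and the whitespace tokenisation are assumptions made for illustration, not this repository's evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes simple lowercasing and whitespace tokenisation; real
    evaluations often normalise punctuation and numbers as well.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("de kat zit op de mat", "de kat zit op mat"))
```

Averaging this score over a fixed sample of utterances, such as the 100 samples mentioned above, would give one comparable number per model. The choice of normalisation affects the result, so it should be kept identical across models.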
