
Audio transcription

cesine edited this page May 24, 2011 · 23 revisions

Table of Contents

Google Speech Recognition

Google's speech recognition appears to offer the best quality of all "available" systems. Its search language model is based on billions of Google searches. Its free-form language models are based on transcriptions of Google Voice voicemail messages and YouTube videos (YouTube generates closed captions automatically, and users can upload corrected versions to get accurate captions), among other unconfirmed data sources.

Open Source Clients

Closed Source Clients

  • Relevant packages that could be tweaked to provide a GUI-free solution
    • com.google.android.voicesearch
    • com.google.android.voicesearch.speechservice

Sphinx

A classic, long-standing project that now hosts Google Summer of Code students. CMUSphinx is a speaker-independent, large-vocabulary continuous speech recognizer released under a BSD-style license. It is also a collection of open source tools and resources that allow researchers and developers to build speech recognition systems.

Language independent phonetic transcription

  • Our goal is to support a bit of bootstrapping, even for non-standard languages, so that experiments on any language provide at least a bit of audio analysis.

Related Requirements

Audio Chunking based on Silence

  • The MARF project has some libraries for audio analysis. It is unclear how complete they are and which of the project's goals have been realized so far.
    MARF is an open-source research platform and a collection of voice/sound/speech/text and natural language processing (NLP) algorithms written in Java, arranged into a modular and extensible framework that facilitates the addition of new algorithms.
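
To make the silence-based chunking requirement concrete, here is a minimal sketch in Python. It is not MARF's API and not an implementation from this project. It assumes the audio is already decoded into a list of float samples in the range -1.0 to 1.0, and the function name `split_on_silence` and its parameters (`frame_len`, `threshold`, `min_silence_frames`) are hypothetical names chosen for illustration. A frame counts as silent when its RMS energy falls below the threshold, and a chunk is closed once enough consecutive silent frames have been seen.

```python
import math


def split_on_silence(samples, frame_len=160, threshold=0.01, min_silence_frames=3):
    """Return (start, end) sample-index pairs for the non-silent chunks.

    All parameter names and defaults are illustrative assumptions:
    frame_len is the analysis window in samples, threshold is the RMS
    level below which a frame is treated as silence, and
    min_silence_frames is how many silent frames in a row end a chunk.
    """
    chunks = []
    start = None        # start index of the chunk currently being built
    last_loud_end = 0   # end index of the most recent non-silent frame
    silent_run = 0      # consecutive silent frames seen inside a chunk

    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))

        if rms >= threshold:
            if start is None:
                start = i
            last_loud_end = i + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # The pause is long enough: close the chunk at the last loud frame.
                chunks.append((start, last_loud_end))
                start = None
                silent_run = 0

    # Flush a chunk that is still open when the audio ends.
    if start is not None:
        chunks.append((start, last_loud_end))
    return chunks


if __name__ == "__main__":
    # Synthetic signal: silence, sound, a long pause, sound, a short tail of silence.
    signal = [0.0] * 480 + [0.5] * 320 + [0.0] * 640 + [0.5] * 160 + [0.0] * 160
    print(split_on_silence(signal))
```

Pauses shorter than `min_silence_frames` frames stay inside a chunk, so brief gaps between words do not split an utterance. The threshold and frame length would need tuning against real recordings, since background noise raises the energy floor.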

Additional References
