Audio transcription

Table of Contents
- Google Speech Recognition
- Sphinx
- Language independent phonetic transcription
- Related Requirements
- Audio Chunking based on Silence
- Additional References

Google Speech Recognition

Google speech recognition is arguably the best in quality of all the "available" systems. Its search language model is based on billions of Google searches. Its free-form language models are based on transcriptions of Google Voice voicemail messages and YouTube videos (YouTube generates closed captions, and users can then upload corrected versions so that viewers get accurate captions), among other unconfirmed data sources.

A Chromium hack by Mike Pultz resulted in a general Perl + HTTP POST approach; others have made it work in PHP and Java.
- Chromium speech code
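
To make the "POST approach" concrete, here is a minimal sketch of what such a request might look like. Everything specific in it is an assumption rather than something confirmed on this page: the endpoint URL is a placeholder, the `lang` query parameter and the `audio/x-flac; rate=...` content type are illustrative, and `build_recognition_request` is a hypothetical helper name. The function only builds the request and does not send it.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption): substitute whatever service you actually target.
PLACEHOLDER_ENDPOINT = "https://speech.example.invalid/recognize"


def build_recognition_request(flac_bytes, sample_rate=16000, lang="en-US",
                              endpoint=PLACEHOLDER_ENDPOINT):
    """Build (but do not send) an HTTP POST carrying FLAC audio in the body.

    The query parameter name and the content-type format are assumptions
    made for illustration, not a documented API.
    """
    query = urllib.parse.urlencode({"lang": lang})
    return urllib.request.Request(
        f"{endpoint}?{query}",
        data=flac_bytes,  # raw audio bytes become the POST body
        headers={"Content-Type": f"audio/x-flac; rate={sample_rate}"},
        method="POST",
    )
```

Sending the request would then be a call to `urllib.request.urlopen(req)`, with the response format depending entirely on the server behind the endpoint.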
Android RecognizerIntent sample (VoiceRecognition.java): the example works well and is very clear, and you can test it in the API Demos sample code in the SDK. However, it only handles short speech samples (until the user pauses), and it is an Intent -> GUI -> Record -> Result use case. There is no GUI-free/eyes-free access yet. There are feature requests for it on the Android Google Code issue tracker.
Android source on GitHub: contains the core code, but not the com.google code.
Relevant packages that could be tweaked to provide a GUI-free solution:
- com.google.android.voicesearch
- com.google.android.voicesearch.speechservice
Android SpeechRecognizer does allow more control than the intent alone, but there is still no way to pre-process and chunk audio in order to send a longer file or sample.
The Voice Recognizer sample in the Android SDK (just create a new project in Eclipse and select "create from existing source") shows a skeleton example of how to build a new speech recognizer that is automatically registered on the device and can even be configured under Android Preferences > Voice Input and Output. This could be a direction to follow if one were to implement a new speech recognition service backed by another server, such as a machine running Sphinx or the semi-exposed Google service discussed in Mike Pultz's Chromium investigation. Beware: it causes android.process.acore to force close unexpectedly, probably because the Android core is heavily tied to the (exact) implementation of RecognizerIntent/SpeechRecognizer from previous development. This will likely change in the future.

Sphinx

A classic and long-standing project, now hosting Google Summer of Code students. CMUSphinx is a speaker-independent large vocabulary continuous speech recognizer released under a BSD-style license. It is also a collection of open source tools and resources that allow researchers and developers to build speech recognition systems.

http://cmusphinx.sourceforge.net/

Language independent phonetic transcription

Our goal is to support a bit of bootstrapping, even for non-standard languages, so that experiments on any language provide at least some audio analysis.

Related Requirements

Audio Chunking based on Silence

The MARF project has some libraries for audio analysis. It is unclear how complete they are and which of the project's goals have been realized so far.

 MARF is an open-source research platform and a collection of voice/sound/speech/text and natural language processing (NLP) algorithms written in Java and arranged into a modular and extensible framework facilitating addition of new algorithms.
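
As an illustration of what silence-based chunking could look like, here is a minimal sketch that does not use MARF. It assumes the audio is already decoded into a list of float samples in the range -1.0 to 1.0; the function names, the frame length, the RMS threshold, and the minimum pause length are all hypothetical choices that would need tuning on real recordings.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square energy of each consecutive frame of frame_len samples."""
    rms = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return rms


def split_on_silence(samples, frame_len=160, threshold=0.02, min_silence_frames=5):
    """Return (start, end) sample indices of the non-silent chunks.

    A chunk is closed once min_silence_frames consecutive frames fall below
    threshold; shorter pauses stay inside the chunk.
    """
    rms = frame_rms(samples, frame_len)
    chunks = []
    chunk_start = None  # frame index where the current chunk began
    silent_run = 0      # consecutive silent frames seen inside the chunk
    for i, energy in enumerate(rms):
        if energy >= threshold:
            if chunk_start is None:
                chunk_start = i
            silent_run = 0
        elif chunk_start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end_frame = i - silent_run + 1  # first frame of the pause
                chunks.append((chunk_start * frame_len, end_frame * frame_len))
                chunk_start = None
                silent_run = 0
    if chunk_start is not None:
        # Close the last chunk, trimming any short trailing silence.
        end_frame = len(rms) - silent_run
        chunks.append((chunk_start * frame_len,
                       min(end_frame * frame_len, len(samples))))
    return chunks
```

Each returned pair can be used to slice the sample list (`samples[start:end]`), so that every chunk is short enough to send to a recognizer that only accepts brief utterances.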
