Commit 2ec749b

removing SpeechToText.recognizeElement method
There were a number of issues with this method:
* Audio was transcoded and resampled twice, leading to lower transcription quality
* The audio output does not seem to be 100% the same through consecutive replays of a given file, leading to slight variations in the transcription
* If the source was paused/stopped/rewound/fast-forwarded, this would affect the transcription

The last issue could have been handled with significant effort, but the first two could not.
1 parent a635a3c commit 2ec749b

8 files changed: +3 -415 lines changed

README.md

Lines changed: 3 additions & 27 deletions
@@ -64,32 +64,7 @@ Also note that Chrome requires https (with a few exceptions for localhost and su

 Pipes results through a `{FormatStream}` by default, set `options.format=false` to disable.

-Known issue: Firefox continues to display a microphone icon in the address bar even after recording has ceased. This is a browser bug.
-
-### `.recognizeElement({element, token})` -> `RecognizeStream`
-
-Extract audio from an `<audio>` or `<video>` element and transcribe speech.
-
-This method has some limitations:
-* the audio is run through two lossy conversions: first from the source format to WebAudio, and second to l16 (raw wav) for Watson
-* the WebAudio API does not guarantee the same exact output for the same file played twice, so it's possible to receive slight different transcriptions for the same file played repeatedly
-* it transcribes the audio as it is heard, so pausing or skipping will affect the transcription
-* audio that is paused for too long will cause the socket to time out and disconnect, preventing further transcription (without setting things up again)
-
-Because of these limitations, there are two alternative methods that may be preferable in some situations:
-* fetch the audio via ajax and then pass it to `recognizeFile()` - this resolves the issues around lossy conversion and inexact transcription, but the audio playback, if enabled, cannot be paused, rewound, etc.
-* Pre-process the audio and generate a [WebVTT](https://developer.mozilla.org/en-US/docs/Web/API/Web_Video_Text_Tracks_Format) subtitles file to insert in `<track>`, completely bypassing this SDK. This resolves all of the above issues, and gives you an opportunity to review and/or edit the subtitles if desired.
-
-Options:
-* `element`: an `<audio>` or `<video>` element (could be generated pragmatically, e.g. `new Audio()`)
-* Other options passed to MediaElementAudioStream and RecognizeStream
-* Other options passed to WritableElementStream if `options.outputElement` is set
-
-Requires that the browser support MediaElement and whatever audio codec is used in your media file.
-
-Will automatically call `.play()` the `element`, set `options.autoPlay=false` to disable. Calling `.stop()` on the returned stream will automatically call `.stop()` on the `element`.
-
-Pipes results through a `{FormatStream}` by default, set `options.format=false` to disable.
+Known issue: Firefox continues to display a microphone icon in the address bar after recording has ceased. This is a browser bug.

 ### `.recognizeFile({data, token})` -> `RecognizeStream`

@@ -161,7 +136,8 @@ Accepts input from `RecognizeStream()` and friends, writes text to supplied `out

 ## Changelog

-### v next
+### v0.15
+* Removed `SpeechToText.recognizeElement()` due to quality issues
 * Added `options.element` to TextToSpeech.synthesize() to support playing through exiting elements

 ### v0.14

examples/static/audio-element-programmatic.html

Lines changed: 0 additions & 51 deletions
This file was deleted.
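
For pages that relied on this deleted example, the first alternative named in the removed README text (fetch the audio via ajax, then pass it to `recognizeFile()`) might look roughly like the sketch below. The helper name `transcribeFromUrl` and the injected `deps` object are hypothetical and exist only so the snippet runs without the SDK or a browser. It also assumes that `recognizeFile()` accepts a `Blob` as `data`.

```javascript
/**
 * Hypothetical helper (not part of the SDK): download an audio file and hand
 * the bytes to recognizeFile() instead of streaming from a media element.
 *
 * @param {string} audioUrl - URL of the audio file (assumed same-origin or CORS-enabled)
 * @param {string} token - auth token, as in the README's `{data, token}` signature
 * @param {object} deps - injected dependencies (an assumption made for this sketch)
 * @param {Function} deps.fetch - a fetch-compatible function, e.g. window.fetch
 * @param {Function} deps.recognizeFile - e.g. require('watson-speech/speech-to-text').recognizeFile
 * @returns {Promise<*>} resolves to whatever recognizeFile returns (a RecognizeStream per the README)
 */
async function transcribeFromUrl(audioUrl, token, deps) {
  const response = await deps.fetch(audioUrl);
  if (!response.ok) {
    throw new Error('Failed to fetch audio: HTTP ' + response.status);
  }
  // The downloaded file is passed along as-is, so it is not routed through
  // a media element and WebAudio before being sent for transcription.
  const data = await response.blob();
  return deps.recognizeFile({ data: data, token: token });
}
```

As the removed README text notes, this trades away playback control: audio played this way cannot be paused or rewound.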

examples/static/audio-element.html

Lines changed: 0 additions & 50 deletions
This file was deleted.
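
The second alternative from the removed README text is to pre-process the audio and generate a WebVTT subtitles file for `<track>`. It can be sketched as below, assuming a transcript is already available as a list of timed cues. The `{start, end, text}` cue shape and both function names are illustrative assumptions, not part of this SDK.

```javascript
// Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatVttTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return pad(h, 2) + ':' + pad(m, 2) + ':' + pad(s, 2) + '.' + pad(ms, 3);
}

// Build a WebVTT document from cues shaped like {start, end, text}, where
// start/end are in seconds. This cue shape is an assumption for illustration.
// Cue text is written verbatim, so it must not contain blank lines or "-->".
function cuesToWebVtt(cues) {
  const blocks = cues.map((cue) =>
    formatVttTime(cue.start) + ' --> ' + formatVttTime(cue.end) + '\n' + cue.text
  );
  return 'WEBVTT\n\n' + blocks.join('\n\n') + '\n';
}
```

The resulting text could be saved as a `.vtt` file and referenced from a `<track kind="subtitles">` child of the `<audio>` or `<video>` element, which leaves room to review or edit the subtitles first.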

examples/static/index.html

Lines changed: 0 additions & 2 deletions
@@ -16,8 +16,6 @@ <h2>Speech to Text</h2>
 <li><a href="file-realtime-vs-no-realtime.html">Transcribe from file, Comparing <code>{realtime: true}</code> to <code>{realtime: false}</code></a></li>
 <li><a href="file-promise.html">Transcribe from file, Promise</a></li>
 <li><a href="file-ajax.html">Transcribe from file loaded over AJAX</a></li>
-<li><a href="audio-element.html">Transcribe from HTML5 &lt;audio&gt; element, Streaming</a></li>
-<li><a href="audio-element-programmatic.html">Transcribe from <code>new Audio()</code>, Streaming</a></li>
 </ul>

 <h2>Text to Speech</h2>

speech-to-text/index.js

Lines changed: 0 additions & 10 deletions
@@ -26,23 +26,13 @@ module.exports = {
    */
   recognizeFile: require('./recognize-file'),

-  /**
-   * @see module:watson-speech/speech-to-text/recognize-element
-   */
-  recognizeElement: require('./recognize-element'),
-

   // individual components to build more customized solutions
   /**
    * @see WebAudioL16Stream
    */
   WebAudioL16Stream: require('./webaudio-l16-stream'),

-  /**
-   * @see MediaElementAudioStream
-   */
-  MediaElementAudioStream: require('./media-element-audio-stream'),
-
   /**
    * @see RecognizeStream
    */

speech-to-text/media-element-audio-stream.js

Lines changed: 0 additions & 170 deletions
This file was deleted.
