Commit 43f6bfc

Readme clarifications and improvements
1 parent d10a7fa commit 43f6bfc

README.md

Lines changed: 25 additions & 15 deletions
@@ -10,7 +10,7 @@ Allows you to easily add voice recognition and synthesis to any web app with min
1010
This library is primarily intended for use in web browsers.
1111
Check out [watson-developer-cloud](https://www.npmjs.com/package/watson-developer-cloud) to use Watson services (speech and others) from Node.js.
1212

13-
However, a server-side component is required to generate auth tokens.
13+
However, a **server-side component is required to generate auth tokens**.
1414
The examples/ folder includes example Node.js and Python servers, and SDKs are available for [Node.js](https://github.com/watson-developer-cloud/node-sdk#authorization),
1515
[Java](https://github.com/watson-developer-cloud/java-sdk),
1616
[Python](https://github.com/watson-developer-cloud/python-sdk/blob/master/examples/authorization_v1.py),
@@ -32,18 +32,20 @@ Breaking change for v0.22.0
3232
----------------------------
3333

3434
The format of objects emitted in objectMode has changed from `{alternatives: [...], index: 1}` to `{results: [{alternatives: [...]}], result_index: 1}`.
35-
This was done to enable the new `speaker_labels` feature.
36-
There is a new `ResultExtractor` class and `recognizeMicrophone()` and `recognizeFile()` both accept a new `extract_results` option to restore the old behavior.
3735

38-
The format now exactly matches what the Watson Speech to Text service returns and shouldn't change again unless the Watson service changes.
36+
There is a new `ResultExtractor` class that restores the old behavior; `recognizeMicrophone()` and `recognizeFile()` both accept a new `extract_results` option to enable it.
37+
38+
This was done to enable the new `speaker_labels` feature. The format now exactly matches what the Watson Speech to Text service returns and shouldn't change again unless the Watson service changes.
3939

4040

4141
API & Examples
4242
--------------
4343

4444
The basic API is outlined below, see complete API docs at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/
4545

46-
See several examples at https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/static/
46+
See code for several basic examples at https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/static/
47+
48+
See a live example at https://speech-to-text-demo.mybluemix.net/
4749

4850
All API methods require an auth token that must be [generated server-side](https://github.com/watson-developer-cloud/node-sdk#authorization).
4951
(See https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/ for a couple of basic examples in Node.js and Python.)
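
Review note on the "Breaking change for v0.22.0" hunk above: the relationship between the two object shapes can be sketched as below. `toLegacyResults` is a hypothetical helper written for this note, not part of the SDK, and it assumes each entry in `results` maps to one old-style object whose `index` counts up from `result_index`. The SDK's own `ResultExtractor` may differ in detail.

```javascript
// Hypothetical helper (not an SDK function): converts a new-style message
//   {results: [{alternatives: [...]}], result_index: n}
// into a list of old-style objects
//   {alternatives: [...], index: n}
// Assumption: the i-th result gets index result_index + i.
function toLegacyResults(message) {
  const base = message.result_index || 0;
  const results = message.results || [];
  return results.map((result, offset) =>
    // Copy the result so the original message is left untouched.
    Object.assign({}, result, { index: base + offset })
  );
}

const legacy = toLegacyResults({
  results: [{ alternatives: [{ transcript: 'hello world' }], final: true }],
  result_index: 1
});

console.log(legacy[0].index, legacy[0].alternatives[0].transcript);
// prints: 1 hello world
```

Code that was written against the old format should only need a conversion step like this, or the `extract_results` option the README describes.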
@@ -57,46 +59,52 @@ Currently limited to text that can fit within a GET URL (this is particularly an
5759
where the max length is around 1000 characters after the token is accounted for.)
5860

5961
Options:
60-
* text - the text to transcribe // todo: list supported languages
62+
* text - the text to speak
6163
* voice - the desired playback voice's name - see .getVoices(). Note that the voices are language-specific.
6264
* autoPlay - set to false to prevent the audio from automatically playing
6365

6466

6567
## [`WatsonSpeech.SpeechToText`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text.html)
6668

69+
The `recognizeMicrophone()` and `recognizeFile()` helper methods are recommended for most use-cases. They set up the streams in the appropriate order and enable common options. These two methods are documented below.
70+
71+
The core of the library is the [RecognizeStream] that performs the actual transcription, and a collection of other Node.js-style streams that manipulate the data in various ways. For less common use-cases, the core components may be used directly with the helper methods serving as optional templates to follow. The full library is documented at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text.html
6772

6873
### [`.recognizeMicrophone({token})`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text_recognize-microphone.html) -> Stream
6974

7075
Options:
7176
* `keepMic`: if true, preserves the MicrophoneStream for subsequent calls, preventing additional permissions requests in Firefox
7277
* Other options passed to [RecognizeStream]
78+
* Other options passed to [SpeakerStream] if `options.resultsbySpeaker` is set to true
79+
* Other options passed to [FormatStream] if `options.format` is not set to false
7380
* Other options passed to [WritableElementStream] if `options.outputElement` is set
7481

7582
Requires the `getUserMedia` API, so limited browser compatibility (see http://caniuse.com/#search=getusermedia)
7683
Also note that Chrome requires https (with a few exceptions for localhost and such) - see https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features
7784

78-
Pipes results through a [FormatStream] by default, set `options.format=false` to disable.
85+
No more data will be sent after `.stop()` is called on the returned stream, but additional results may be received for already-sent data.
7986

8087

8188
### [`.recognizeFile({data, token})`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text_recognize-file.html) -> Stream
8289

83-
Can recognize and optionally attempt to play a [File](https://developer.mozilla.org/en-US/docs/Web/API/File) or [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob)
90+
Can recognize and optionally attempt to play a URL, [File](https://developer.mozilla.org/en-US/docs/Web/API/File) or [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob)
8491
(such as from an `<input type="file"/>` or from an ajax request.)
8592

8693
Options:
87-
* `file`: a String URL or a `Blob` or `File` instance.
94+
* `file`: a String URL or a `Blob` or `File` instance. Note that [CORS] restrictions apply to URLs.
8895
* `play`: (optional, default=`false`) Attempt to also play the file locally while uploading it for transcription
8996
* Other options passed to [RecognizeStream]
97+
* Other options passed to [TimingStream] if `options.realtime` is true, or unset and `options.play` is true
98+
* Other options passed to [SpeakerStream] if `options.resultsbySpeaker` is set to true
99+
* Other options passed to [FormatStream] if `options.format` is not set to false
90100
* Other options passed to [WritableElementStream] if `options.outputElement` is set
91101

92102
`play` requires that the browser support the format; most browsers support wav and ogg/opus, but not flac.
93-
Will emit an `UNSUPPORTED_FORMAT` error on the RecognizeStream if playback fails.
94-
Playback will automatically stop when `.stop()` is called on the returned stream.
95-
For Mobile Safari compatibility, a URL must be provided, and `recognizeFile()` must be called in direct response to a user interaction (so the token must be pre-loaded).
103+
Will emit an `UNSUPPORTED_FORMAT` error on the RecognizeStream if playback fails. This error is special in that it does not stop the streaming of results.
96104

97-
Pipes results through a [TimingStream] by if `options.play=true`, set `options.realtime=false` to disable.
105+
Playback will automatically stop when `.stop()` is called on the returned stream.
98106

99-
Pipes results through a [FormatStream] by default, set `options.format=false` to disable.
107+
For Mobile Safari compatibility, a URL must be provided, and `recognizeFile()` must be called in direct response to a user interaction (so the token must be pre-loaded).
100108

101109

102110
## Changes
@@ -107,7 +115,7 @@ There have been a few breaking changes in recent releases:
107115
* renamed `recognizeBlob` to `recognizeFile` to make the primary usage more apparent
108116
* Changed `playFile` option of `recognizeBlob()` to just `play`, corrected default
109117
* Changed format of objects emitted in objectMode to exactly match what service sends. Added `ResultStrean` class and `extract_results` option to enable older behavior.
110-
* Changed `playback-error` event to just `error` when recognizing and playing a file. Check for `error.name == 'UNSUPPORTED_FORMAT'` to identify playback errors
118+
* Changed `playback-error` event to just `error` when recognizing and playing a file. Check for `error.name == 'UNSUPPORTED_FORMAT'` to identify playback errors. This error is special in that it does not stop the streaming of results.
111119
* Renamed `recognizeFile()`'s `data` option to `file` because it now may be a URL. Using a URL enables faster playback and mobile Safari support
112120

113121
See [CHANGELOG.md](CHANGELOG.md) for a complete list of changes.
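
Review note on the `UNSUPPORTED_FORMAT` change above: since playback failures now arrive on the ordinary `error` event but do not stop the streaming of results, a handler probably wants to tell the two cases apart. A minimal sketch follows. `classifyStreamError` and its return values are hypothetical names chosen for this note; only the `error.name == 'UNSUPPORTED_FORMAT'` check comes from the README.

```javascript
// Hypothetical classifier (not an SDK function).
// Returns 'playback' for the non-fatal playback error described in the
// README, and 'fatal' for any other error.
function classifyStreamError(err) {
  if (err && err.name === 'UNSUPPORTED_FORMAT') {
    return 'playback'; // transcription results keep streaming
  }
  return 'fatal';
}

// Simulated errors, standing in for what an 'error' listener would receive.
const playbackError = new Error('browser cannot play this format');
playbackError.name = 'UNSUPPORTED_FORMAT';

console.log(classifyStreamError(playbackError));              // prints: playback
console.log(classifyStreamError(new Error('socket closed'))); // prints: fatal
```

In a page, this would sit inside the `error` listener attached to the stream returned by `recognizeFile()`: warn the user and carry on for `'playback'`, tear down the UI for `'fatal'`.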
@@ -131,3 +139,5 @@ See [CHANGELOG.md](CHANGELOG.md) for a complete list of changes.
131139
[TimingStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/TimingStream.html
132140
[FormatStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/FormatStream.html
133141
[WritableElementStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/WritableElementStream.html
142+
[SpeakerStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/SpeakerStream.html
143+
[CORS]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS
