Commit 892304c

docs: organize documentation
1 parent 1ad98a0 commit 892304c

7 files changed: +110 −117 lines changed

README.md

Lines changed: 9 additions & 117 deletions
@@ -36,107 +36,21 @@ var recognizeMic = require('watson-speech/speech-to-text/recognize-microphone');

Breaking change for v0.22.0
---------------------------

The format of objects emitted in objectMode has changed from `{alternatives: [...], index: 1}` to `{results: [{alternatives: [...]}], result_index: 1}`.

There is a new `ResultExtractor` class that restores the old behavior; `recognizeMicrophone()` and `recognizeFile()` both accept a new `extract_results` option to enable it.

This was done to enable the new `speaker_labels` feature. The format now exactly matches what the Watson Speech to Text service returns and should not change again unless the Watson service changes.

API & Examples
--------------

The basic API is outlined below; see the complete API docs at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/

See several basic examples at http://watson-speech.mybluemix.net/ ([source](https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/))

See a more advanced example at https://speech-to-text-demo.mybluemix.net/

All API methods require an auth token that must be [generated server-side](https://github.com/watson-developer-cloud/node-sdk#authorization).
(See https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/ for a couple of basic examples in Node.js and Python.)

_NOTE_: The `token` parameter only works for CF instances of services. For RC services using IAM for authentication, the `access_token` parameter must be used.

## [`WatsonSpeech.TextToSpeech`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_text-to-speech.html)

### [`.synthesize({text, token||access_token})`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_text-to-speech_synthesize.html) -> `<audio>`

Speaks the supplied text through an automatically-created `<audio>` element.
Currently limited to text that can fit within a GET URL (this is particularly an issue on [Internet Explorer before Windows 10](http://stackoverflow.com/questions/32267442/url-length-limitation-of-microsoft-edge), where the max length is around 1000 characters after the token is accounted for).

Options:
* text - the text to speak
* url - the Watson Text to Speech API URL (defaults to https://stream.watsonplatform.net/text-to-speech/api)
* voice - the desired playback voice's name - see .getVoices(). Note that the voices are language-specific.
* customization_id - GUID of a custom voice model - omit to use the voice with no customization.
* autoPlay - set to false to prevent the audio from automatically playing

Relies on browser audio support: should work reliably in Chrome and Firefox on desktop and Android. Edge works with a little help. Safari and all iOS browsers do not seem to work yet.

## [`WatsonSpeech.SpeechToText`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text.html)

The `recognizeMicrophone()` and `recognizeFile()` helper methods are recommended for most use-cases. They set up the streams in the appropriate order and enable common options. These two methods are documented below.

The core of the library is the [RecognizeStream] that performs the actual transcription, plus a collection of other Node.js-style streams that manipulate the data in various ways. For less common use-cases, the core components may be used directly, with the helper methods serving as optional templates to follow. The full library is documented at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text.html

### [`.recognizeMicrophone({token||access_token})`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text_recognize-microphone.html) -> Stream

Options:
* `keepMicrophone`: if true, preserves the MicrophoneStream for subsequent calls, preventing additional permissions requests in Firefox
* `mediaStream`: optionally pass in an existing media stream rather than prompting the user for microphone access
* Other options passed to [RecognizeStream]
* Other options passed to [SpeakerStream] if `options.resultsbySpeaker` is set to true
* Other options passed to [FormatStream] if `options.format` is not set to false
* Other options passed to [WritableElementStream] if `options.outputElement` is set

Requires the `getUserMedia` API, so browser compatibility is limited (see http://caniuse.com/#search=getusermedia).
Also note that Chrome requires https (with a few exceptions for localhost and such) - see https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features

No more data will be sent after `.stop()` is called on the returned stream, but additional results may be received for already-sent data.

### [`.recognizeFile({data, token||access_token})`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text_recognize-file.html) -> Stream

Can recognize and optionally attempt to play a URL, [File](https://developer.mozilla.org/en-US/docs/Web/API/File) or [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob)
(such as from an `<input type="file"/>` or from an ajax request).

Options:
* `file`: a String URL or a `Blob` or `File` instance. Note that [CORS] restrictions apply to URLs.
* `play`: (optional, default=`false`) attempt to also play the file locally while uploading it for transcription
* Other options passed to [RecognizeStream]
* Other options passed to [TimingStream] if `options.realtime` is true, or unset and `options.play` is true
* Other options passed to [SpeakerStream] if `options.resultsbySpeaker` is set to true
* Other options passed to [FormatStream] if `options.format` is not set to false
* Other options passed to [WritableElementStream] if `options.outputElement` is set

`play` requires that the browser support the format; most browsers support wav and ogg/opus, but not flac.
Will emit an `UNSUPPORTED_FORMAT` error on the RecognizeStream if playback fails. This error is special in that it does not stop the streaming of results.

Playback will automatically stop when `.stop()` is called on the returned stream.

For Mobile Safari compatibility, a URL must be provided, and `recognizeFile()` must be called in direct response to a user interaction (so the token must be pre-loaded).

## Changes

There have been a few breaking changes in recent releases:

* Removed `SpeechToText.recognizeElement()` due to quality issues. The code is [available in an (unsupported) example](https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/static/audio-video-deprecated) if you wish to use it with current releases of the SDK.
* Renamed `recognizeBlob` to `recognizeFile` to make the primary usage more apparent
* Changed the `playFile` option of `recognizeBlob()` to just `play`, and corrected its default
* Changed the format of objects emitted in objectMode to exactly match what the service sends. Added a `ResultStream` class and an `extract_results` option to enable the older behavior.
* Changed the `playback-error` event to just `error` when recognizing and playing a file. Check for `error.name == 'UNSUPPORTED_FORMAT'` to identify playback errors. This error is special in that it does not stop the streaming of results.
* Renamed `recognizeFile()`'s `data` option to `file` because it may now be a URL. Using a URL enables faster playback and mobile Safari support.
* Removed the continuous flag from OPENING_MESSAGE_PARAMS_ALLOWED

See [CHANGELOG.md](CHANGELOG.md) for a complete list of changes.

## Development

### Use examples for development

The provided examples can be used to test development code in action:
* `cd examples/`
* `npm run dev`

This will build the local code, move the new bundle into the `examples/` directory, and start a server at `localhost:3000` where the examples will be running.

Note: this requires valid service credentials.

### Testing
The test suite is broken up into offline unit tests and integration tests that run against actual service instances.
* `npm test` will run the linter and the offline tests
@@ -146,25 +60,3 @@ The test suite is broken up into offline unit tests and integration tests that t
To run the integration tests, a file with service credentials is required. This file must be called `stt-auth.json` and must be located in `/test/resources/`. There are tests for usage of both CF and RC service instances. For testing CF, the required keys in this configuration file are `username` and `password`. For testing RC, a key of either `iam_access_token` or `iam_apikey` is required. Optionally, a service URL for an RC instance can be provided under the key `rc_service_url` if the service is available under a URL other than `https://stream.watsonplatform.net/speech-to-text/api`.

For an example, see `test/resources/stt-auth-example.json`.
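
A sketch of what such a credentials file might look like; all values here are placeholders, and only the keys relevant to the instance type under test are needed:

```json
{
  "username": "cf-instance-username",
  "password": "cf-instance-password",
  "iam_apikey": "rc-instance-api-key",
  "rc_service_url": "https://stream.watsonplatform.net/speech-to-text/api"
}
```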

## todo

* Further solidify the API
* Break components into standalone npm modules where it makes sense
* Run integration tests on Travis (fall back to the offline server for pull requests)
* Add even more tests
* Better cross-browser testing (IE, Safari, mobile browsers - maybe Sauce Labs?)
* Update the node-sdk to use the current version of this lib's RecognizeStream (and also provide the FormatStream plus anything else that might be handy)
* Move the `result` and `results` events to the node wrapper (along with the deprecation notice)
* Improve docs
* Consider a wrapper to match https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
* Support a "hard" stop that prevents any further data events, even for already-uploaded audio; ensure the timing stream also implements this
* Look for a bug where single-word final results may omit word confidence (possibly due to FormatStream?)
* Fix a bug where the TimingStream shows words slightly before they're spoken

[RecognizeStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/RecognizeStream.html
[TimingStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/TimingStream.html
[FormatStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/FormatStream.html
[WritableElementStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/WritableElementStream.html
[SpeakerStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/SpeakerStream.html
[CORS]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS

docs/README.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
API & Examples
--------------

The basic API is outlined below; see the complete API docs at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/

See several basic examples at http://watson-speech.mybluemix.net/ ([source](https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/))

See a more advanced example at https://speech-to-text-demo.mybluemix.net/

All API methods require an auth token that must be [generated server-side](https://github.com/watson-developer-cloud/node-sdk#authorization).
(See https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/ for a couple of basic examples in Node.js and Python.)

_NOTE_: The `token` parameter only works for CF instances of services. For RC services using IAM for authentication, the `access_token` parameter must be used.

docs/SPEECH-TO-TEXT.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Speech to Text

## [`WatsonSpeech.SpeechToText`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text.html)

The `recognizeMicrophone()` and `recognizeFile()` helper methods are recommended for most use-cases. They set up the streams in the appropriate order and enable common options. These two methods are documented below.

The core of the library is the [RecognizeStream] that performs the actual transcription, plus a collection of other Node.js-style streams that manipulate the data in various ways. For less common use-cases, the core components may be used directly, with the helper methods serving as optional templates to follow. The full library is documented at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text.html

_NOTE_: The RecognizeStream class lives in the Watson Node SDK. Any option available on that class can be passed into the following methods. These parameters are documented at http://watson-developer-cloud.github.io/node-sdk/master/classes/recognizestream.html

### [`.recognizeMicrophone({token||access_token})`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text_recognize-microphone.html) -> Stream

Options:
* `keepMicrophone`: if true, preserves the MicrophoneStream for subsequent calls, preventing additional permissions requests in Firefox
* `mediaStream`: optionally pass in an existing media stream rather than prompting the user for microphone access
* Other options passed to [RecognizeStream]
* Other options passed to [SpeakerStream] if `options.resultsbySpeaker` is set to true
* Other options passed to [FormatStream] if `options.format` is not set to false
* Other options passed to [WritableElementStream] if `options.outputElement` is set

Requires the `getUserMedia` API, so browser compatibility is limited (see http://caniuse.com/#search=getusermedia).
Also note that Chrome requires https (with a few exceptions for localhost and such) - see https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features

No more data will be sent after `.stop()` is called on the returned stream, but additional results may be received for already-sent data.
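
As a minimal sketch of this flow (the `/api/token` endpoint and the `#transcript` element are illustrative assumptions; the token itself must be generated server-side as noted in API & Examples):

```javascript
// Fetch a pre-generated token from a hypothetical server-side endpoint,
// then stream microphone audio to the service and render the transcript.
fetch('/api/token')
  .then(function (response) { return response.text(); })
  .then(function (accessToken) {
    var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
      access_token: accessToken,   // use `token` instead for CF instances
      outputElement: '#transcript' // pipe formatted text into this element
    });

    stream.on('error', function (err) { console.error(err); });

    // Stop sending audio after 15 seconds; results for already-sent
    // audio may still arrive afterwards.
    setTimeout(function () { stream.stop(); }, 15000);
  });
```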

### [`.recognizeFile({data, token||access_token})`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text_recognize-file.html) -> Stream

Can recognize and optionally attempt to play a URL, [File](https://developer.mozilla.org/en-US/docs/Web/API/File) or [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob)
(such as from an `<input type="file"/>` or from an ajax request).

Options:
* `file`: a String URL or a `Blob` or `File` instance. Note that [CORS] restrictions apply to URLs.
* `play`: (optional, default=`false`) attempt to also play the file locally while uploading it for transcription
* Other options passed to [RecognizeStream]
* Other options passed to [TimingStream] if `options.realtime` is true, or unset and `options.play` is true
* Other options passed to [SpeakerStream] if `options.resultsbySpeaker` is set to true
* Other options passed to [FormatStream] if `options.format` is not set to false
* Other options passed to [WritableElementStream] if `options.outputElement` is set

`play` requires that the browser support the format; most browsers support wav and ogg/opus, but not flac.
Will emit an `UNSUPPORTED_FORMAT` error on the RecognizeStream if playback fails. This error is special in that it does not stop the streaming of results.

Playback will automatically stop when `.stop()` is called on the returned stream.

For Mobile Safari compatibility, a URL must be provided, and `recognizeFile()` must be called in direct response to a user interaction (so the token must be pre-loaded).
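
A minimal sketch of transcribing a file by URL while playing it; the file path and the `accessToken` variable are placeholders for illustration:

```javascript
// Transcribe a server-hosted audio file (CORS restrictions apply to URLs)
// and attempt local playback while results stream in.
var stream = WatsonSpeech.SpeechToText.recognizeFile({
  access_token: accessToken, // pre-loaded, as mobile Safari requires
  file: 'audio/sample.wav',  // a String URL, File, or Blob
  play: true,                // also play the file while uploading it
  realtime: true             // pace the results to match playback timing
});

stream.on('data', function (text) { console.log(text); });
stream.on('error', function (err) {
  if (err.name === 'UNSUPPORTED_FORMAT') {
    // Playback failed, but transcription results continue to stream.
    console.warn('Browser cannot play this format locally');
  } else {
    console.error(err);
  }
});
```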

[RecognizeStream]: http://watson-developer-cloud.github.io/node-sdk/master/classes/recognizestream.html
[TimingStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/TimingStream.html
[FormatStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/FormatStream.html
[WritableElementStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/WritableElementStream.html
[SpeakerStream]: http://watson-developer-cloud.github.io/speech-javascript-sdk/master/SpeakerStream.html
[CORS]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS

docs/TEXT-TO-SPEECH.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Text to Speech

## [`WatsonSpeech.TextToSpeech`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_text-to-speech.html)

### [`.synthesize({text, token||access_token})`](http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_text-to-speech_synthesize.html) -> `<audio>`

Speaks the supplied text through an automatically-created `<audio>` element.
Currently limited to text that can fit within a GET URL (this is particularly an issue on [Internet Explorer before Windows 10](http://stackoverflow.com/questions/32267442/url-length-limitation-of-microsoft-edge), where the max length is around 1000 characters after the token is accounted for).

Options:
* text - the text to speak
* url - the Watson Text to Speech API URL (defaults to https://stream.watsonplatform.net/text-to-speech/api)
* voice - the desired playback voice's name - see .getVoices(). Note that the voices are language-specific.
* customization_id - GUID of a custom voice model - omit to use the voice with no customization.
* autoPlay - set to false to prevent the audio from automatically playing

Relies on browser audio support: should work reliably in Chrome and Firefox on desktop and Android. Edge works with a little help. Safari and all iOS browsers do not seem to work yet.
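
A minimal sketch of the call described above; the text, voice name, and the `accessToken` variable are illustrative, and the token must be generated server-side:

```javascript
// Speak a short phrase through an automatically-created <audio> element.
var audio = WatsonSpeech.TextToSpeech.synthesize({
  text: 'Hello from the browser',
  voice: 'en-US_AllisonVoice', // see .getVoices(); voices are language-specific
  access_token: accessToken,   // or `token` for CF instances
  autoPlay: true               // set to false to start playback yourself
});

// synthesize() returns the <audio> element, so normal media events apply.
audio.addEventListener('ended', function () {
  console.log('done speaking');
});
```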

docs/TODO.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Todo

* Further solidify the API
* Break components into standalone npm modules where it makes sense
* Run integration tests on Travis (fall back to the offline server for pull requests)
* Add even more tests
* Better cross-browser testing (IE, Safari, mobile browsers - maybe Sauce Labs?)
* Update the node-sdk to use the current version of this lib's RecognizeStream (and also provide the FormatStream plus anything else that might be handy)
* Move the `result` and `results` events to the node wrapper (along with the deprecation notice)
* Improve docs
* Consider a wrapper to match https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
* Support a "hard" stop that prevents any further data events, even for already-uploaded audio; ensure the timing stream also implements this
* Look for a bug where single-word final results may omit word confidence (possibly due to FormatStream?)
* Fix a bug where the TimingStream shows words slightly before they're spoken

speech-to-text/recognize-file.js

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ var fetch = require('nodeify-fetch'); // like regular fetch, but with an extra m
 * (e.g. from a file <input>, a dragdrop target, or an ajax request)
 *
 * @param {Object} options - Also passed to {MediaElementAudioStream} and to {RecognizeStream}
 * @param {String} [options.url='wss://stream.watsonplatform.net/speech-to-text/api'] - Base URL for a service instance
 * @param {String} options.token - Auth Token for CF services - see https://github.com/watson-developer-cloud/node-sdk#authorization
 * @param {String} options.access_token - IAM Access Token for RC services - see https://github.com/watson-developer-cloud/node-sdk#authorization
 * @param {Blob|File|String} options.file - String URL or the raw audio data as a Blob or File instance to be transcribed (and optionally played). Playback may not work with a Blob or File on mobile Safari.

speech-to-text/recognize-microphone.js

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ var bitBucket = new Writable({
 * @param {Object} options - Also passed to {RecognizeStream}, and {FormatStream} when applicable
 * @param {String} options.token - Auth Token for CF services - see https://github.com/watson-developer-cloud/node-sdk#authorization
 * @param {String} options.access_token - IAM Access Token for RC services - see https://github.com/watson-developer-cloud/node-sdk#authorization
 * @param {String} [options.url='wss://stream.watsonplatform.net/speech-to-text/api'] - Base URL for a service instance
 * @param {Boolean} [options.format=true] - pipe the text through a FormatStream which performs light formatting; also controls the smart_formatting option unless explicitly set
 * @param {Boolean} [options.keepMicrophone=false] - keeps an internal reference to the microphone stream to reuse in subsequent calls (prevents multiple permissions dialogs in Firefox)
 * @param {String|DOMElement} [options.outputElement] - pipe the text to a [WritableElementStream](WritableElementStream.html) targeting the specified element; also defaults objectMode to true to enable interim results
