speech-to-text/recognize-stream.js
Lines changed: 19 additions & 68 deletions

@@ -49,85 +49,20 @@ var QUERY_PARAMS_ALLOWED = [
 
 
 /**
- * pipe()-able Node.js Readable/Writeable stream - accepts binary audio and emits text/objects in it's `data` events.
+ * pipe()-able Node.js Duplex stream - accepts binary audio and emits text/objects in its `data` events.
  *
  * Uses WebSockets under the hood. For audio with no recognizable speech, no `data` events are emitted.
  *
  * By default, only finalized text is emitted in the data events, however when `objectMode`/`readableObjectMode` and `interim_results` are enabled, both interim and final results objects are emitted.
  * WriteableElementStream uses this, for example, to live-update the DOM with word-by-word transcriptions.
  *
- * An interim result looks like this:
-```js
-{ alternatives:
-[ { timestamps:
-[ [ 'it', 20.9, 21.04 ],
-[ 'is', 21.04, 21.17 ],
-[ 'a', 21.17, 21.25 ],
-[ 'site', 21.25, 21.56 ],
-[ 'that', 21.56, 21.7 ],
-[ 'hardly', 21.7, 22.06 ],
-[ 'anyone', 22.06, 22.49 ],
-[ 'can', 22.49, 22.67 ],
-[ 'behold', 22.67, 23.13 ],
-[ 'without', 23.13, 23.46 ],
-[ 'some', 23.46, 23.67 ],
-[ 'sort', 23.67, 23.91 ],
-[ 'of', 23.91, 24 ],
-[ 'unwanted', 24, 24.58 ],
-[ 'emotion', 24.58, 25.1 ] ],
-transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotion ' } ],
-final: false,
-result_index: 3 }
-```
-
-While a final result looks like this (some features only appear in final results):
-```js
-{ alternatives:
-[ { word_confidence:
-[ [ 'it', 1 ],
-[ 'is', 0.956286624429304 ],
-[ 'a', 0.8105753725270362 ],
-[ 'site', 1 ],
-[ 'that', 1 ],
-[ 'hardly', 1 ],
-[ 'anyone', 1 ],
-[ 'can', 1 ],
-[ 'behold', 0.5273598005406737 ],
-[ 'without', 1 ],
-[ 'some', 1 ],
-[ 'sort', 1 ],
-[ 'of', 1 ],
-[ 'unwanted', 1 ],
-[ 'emotion', 0.49401837076320887 ] ],
-confidence: 0.881,
-transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotion ',
-timestamps:
-[ [ 'it', 20.9, 21.04 ],
-[ 'is', 21.04, 21.17 ],
-[ 'a', 21.17, 21.25 ],
-[ 'site', 21.25, 21.56 ],
-[ 'that', 21.56, 21.7 ],
-[ 'hardly', 21.7, 22.06 ],
-[ 'anyone', 22.06, 22.49 ],
-[ 'can', 22.49, 22.67 ],
-[ 'behold', 22.67, 23.13 ],
-[ 'without', 23.13, 23.46 ],
-[ 'some', 23.46, 23.67 ],
-[ 'sort', 23.67, 23.91 ],
-[ 'of', 23.91, 24 ],
-[ 'unwanted', 24, 24.58 ],
-[ 'emotion', 24.58, 25.1 ] ] },
-{ transcript: 'it is a sight that hardly anyone can behold without some sort of unwanted emotion ' },
-{ transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotions ' } ],
-final: true,
-result_index: 3 }
-```
-
+ * Note that the WebSocket connection is not established until the first chunk of data is received. This allows for auto-detection of content type (for wav/flac/opus audio).
  *
  * @param {Object} options
  * @param {String} [options.model='en-US_BroadbandModel'] - voice model to use. Microphone streaming only supports broadband models.
  * @param {String} [options.url='wss://stream.watsonplatform.net/speech-to-text/api'] base URL for service
  * @param {String} [options.token] - Auth token
+ * @param {Object} [options.headers] - Only works in Node.js, not in browsers. Allows for custom headers to be set, including an Authorization header (preventing the need for auth tokens)
  * @param {String} [options.content-type='audio/wav'] - content type of audio; can be automatically determined from file header in most cases. only wav, flac, and ogg/opus are supported
  * @param {Boolean} [options.interim_results=true] - Send back non-final previews of each "sentence" as it is being processed. These results are ignored in text mode.
  * @param {Boolean} [options.continuous=true] - set to false to automatically stop the transcription after the first "sentence"
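
The new note in the doc comment says the WebSocket connection is deferred until the first chunk of audio arrives, so that the content type can be detected from the file header. A minimal sketch of that idea might look like the following. It is an illustration under stated assumptions, not the code from this commit: `detectContentType`, `LazyConnectStream`, and the `connect` option are hypothetical names, and a stand-in connection object replaces a real WebSocket. The `RIFF`, `fLaC`, and `OggS` checks are the standard file signatures for WAV, FLAC, and Ogg; treating every Ogg file as Opus is an assumption.

```javascript
'use strict';
const { Duplex } = require('stream');

// Guess a content type from the first four bytes of the audio.
// Returns null when the signature is not recognized.
function detectContentType(chunk) {
  const magic = chunk.subarray(0, 4).toString('latin1');
  if (magic === 'RIFF') return 'audio/wav';
  if (magic === 'fLaC') return 'audio/flac';
  if (magic === 'OggS') return 'audio/ogg;codecs=opus'; // assumption: Opus inside Ogg
  return null;
}

// Hypothetical Duplex stream that opens its connection lazily.
// options.connect(contentType) is an assumed factory that returns
// an object with a send(chunk) method (a WebSocket stand-in).
class LazyConnectStream extends Duplex {
  constructor(options) {
    super();
    this.options = Object.assign({}, options);
    this.connection = null;
  }

  _write(chunk, encoding, callback) {
    if (!this.connection) {
      // First chunk: an explicit content-type wins, then the sniffed
      // one, then the documented default of audio/wav.
      const contentType =
        this.options['content-type'] || detectContentType(chunk) || 'audio/wav';
      this.connection = this.options.connect(contentType);
    }
    this.connection.send(chunk);
    callback();
  }

  // Results would be pushed from the connection's message handler;
  // this sketch never produces any.
  _read() {}

  _final(callback) {
    this.push(null); // end the readable side once all audio is written
    callback();
  }
}

// Usage with a stand-in connection that only logs what it receives.
const demo = new LazyConnectStream({
  connect: (contentType) => {
    console.log('connecting with', contentType);
    return { send: (chunk) => console.log('sent', chunk.length, 'bytes') };
  },
});
demo.write(Buffer.from('RIFF....WAVEfmt ', 'latin1'));
demo.end();
```

Deferring the connection this way means a caller who pipes a file into the stream does not have to set `content-type` up front, at the cost of slightly later connection setup.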