SPEECH RECOGNITION

Browsers were all at their latest versions as of 2018-06-28, except:
- macOS was 10.13.1 (2017-10-31) instead of 10.13.5
   - Since Safari does not support the Web Speech API, the test matrix remains the same
- Xbox was tested on an Insider build (1806) with a Kinect sensor connected
   - The latest Insider build supports neither WebRTC nor the Web Speech API, so we suspect the production build does not support them either
Quick grab:
- Web Speech API
   - Works on most popular platforms, except iOS. Some platforms require a non-default browser.
   - iOS: None of the popular browsers support the Web Speech API
   - Windows: requires Chrome
- Cognitive Services Speech-to-Text
   - Works on default browsers on all popular platforms
   - iOS: Chrome and Edge do not support Cognitive Services (WebRTC)
| Platform | OS | Browser | Cognitive Services (WebRTC) | Web Speech API |
|---|---|---|---|---|
| PC | Windows 10 (1803) | Chrome 67.0.3396.99 | Yes | Yes |
| PC | Windows 10 (1803) | Edge 42.17134.1.0 | Yes | No, SpeechRecognition not implemented |
| PC | Windows 10 (1803) | Firefox 61.0 | Yes | No, SpeechRecognition not implemented |
| MacBook Pro | macOS High Sierra 10.13.1 | Chrome 67.0.3396.99 | Yes | Yes |
| MacBook Pro | macOS High Sierra 10.13.1 | Safari 11.0.1 | Yes | No, SpeechRecognition not implemented |
| Apple iPhone X | iOS 11.4 | Chrome 67.0.3396.87 | No, AudioSourceError | No, SpeechRecognition not implemented |
| Apple iPhone X | iOS 11.4 | Edge 42.2.2.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
| Apple iPhone X | iOS 11.4 | Safari | Yes | No, SpeechRecognition not implemented |
| Apple iPod (6th gen) | iOS 11.4 | Chrome 67.0.3396.87 | No, AudioSourceError | No, SpeechRecognition not implemented |
| Apple iPod (6th gen) | iOS 11.4 | Edge 42.2.2.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
| Apple iPod (6th gen) | iOS 11.4 | Safari | No, AudioSourceError | No, SpeechRecognition not implemented |
| Google Pixel 2 | Android 8.1.0 | Chrome 67.0.3396.87 | Yes | Yes |
| Google Pixel 2 | Android 8.1.0 | Edge 42.0.0.2057 | Yes | Yes |
| Google Pixel 2 | Android 8.1.0 | Firefox 60.1.0 | Yes | Yes |
| Microsoft Lumia 950 | Windows 10 (1709) | Edge 40.15254.489.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
| Microsoft Xbox One | Windows 10 (1806) 17134.4054 | Edge 42.17134.4054.0 | No, AudioSourceError | No, SpeechRecognition not implemented |

Interactive mode means `continuous` is set to `false`. In the Cognitive Services Speech SDK, this translates to `recognizeOnceAsync`.

Continuous mode means `continuous` is set to `true`, which corresponds to `startContinuousRecognitionAsync` in the Cognitive Services Speech SDK.
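
The mapping between the two modes can be sketched as follows. This is only an illustration: `RecognizerLike` and `startRecognition` are hypothetical names introduced here, and the callback parameters that the real SDK methods accept are left out.

```typescript
// Hypothetical structural type: only the two SDK methods named above.
// The callback parameters of the real methods are omitted in this sketch.
interface RecognizerLike {
  recognizeOnceAsync(): void;
  startContinuousRecognitionAsync(): void;
}

// Picks the SDK call that corresponds to the Web Speech API `continuous` flag.
function startRecognition(recognizer: RecognizerLike, continuous: boolean): void {
  if (continuous) {
    // Continuous mode: keeps recognizing until explicitly stopped.
    recognizer.startContinuousRecognitionAsync();
  } else {
    // Interactive mode: recognizes a single utterance.
    recognizer.recognizeOnceAsync();
  }
}
```
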
- Interactive mode (with interim results)
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `soundstart`
      4. `speechstart`
      5. One or more `result` events, if `interimResults` is set to `true`
      6. `speechend`
      7. `soundend`
      8. `audioend`
      9. `result`
         - `results === [{ isFinal: true }]`
      10. `end`
   - Cognitive Services Speech Services
      1. Call `recognizeOnceAsync()`
      2. Receive zero or more `recognizing` events
         - With notable text in `result.text`
         - `result.json` is similar to `{"Text":"text","Offset":200000,"Duration":32400000}`
      3. Receive a final `recognized` event
         - `result.json` is similar to `{"RecognitionStatus":"Success","Offset":1800000,"Duration":48100000,"NBest":[{"Confidence":0.2331869,"Lexical":"no","ITN":"no","MaskedITN":"no","Display":"No."}]}`
      4. `onSuccess(result)` callback from `recognizeOnceAsync()`
         - `result` is similar to or the same as the `event.result` object received from `recognized(event)`
- Continuous mode
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `soundstart`
      4. `speechstart`
      5. One or more `result` events, if `interimResults` is set to `true`
         - `results === [{ isFinal: true }, { isFinal: true }]`
         - All with `isFinal === true`
      6. (When `stop()` is called)
      7. `speechend`
      8. `soundend`
      9. `audioend`
      10. `end`
   - Cognitive Services Speech Services
      - TBD
      1. Call `startContinuousRecognitionAsync()`
      2. Receive `start` event
      3. Receive multiple `recognizing` events
         - ❗ When speaking slowly with a significant delay between sentences, the SDK is only able to recognize the first sentence
      4. Call `stopContinuousRecognitionAsync()`
         - Observed that the microphone stops recording
      5. Receive `stop` event
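
To relate the final `recognized` payload to a Web Speech-style result, the sample JSON above can be converted into a list of transcript/confidence pairs. This is a rough sketch: the `Alternative` type and `toAlternatives` helper are hypothetical, and using `Display` as the transcript is an assumption made for illustration.

```typescript
// Shape of the final `recognized` payload, taken from the sample JSON above.
interface NBestEntry {
  Confidence: number;
  Lexical: string;
  ITN: string;
  MaskedITN: string;
  Display: string;
}

interface RecognizedPayload {
  RecognitionStatus: string;
  Offset: number;
  Duration: number;
  NBest?: NBestEntry[];
}

// Hypothetical Web Speech-like alternative (transcript plus confidence).
interface Alternative {
  transcript: string;
  confidence: number;
}

// Parses `result.json` and returns alternatives, highest confidence first.
// Returns an empty list when the status is not "Success" or NBest is missing.
function toAlternatives(json: string): Alternative[] {
  const payload = JSON.parse(json) as RecognizedPayload;
  if (payload.RecognitionStatus !== 'Success' || !payload.NBest) {
    return [];
  }
  return payload.NBest
    // Assumption: `Display` is the text shown to the user.
    .map(entry => ({ transcript: entry.Display, confidence: entry.Confidence }))
    .sort((a, b) => b.confidence - a.confidence);
}
```
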
`stop()` is a supported feature in the Web Speech API for push-to-talk operation.

❗ Cognitive Services does not support push-to-talk natively, so we mimic the behavior by hiding the output after `stop()` is called.
- We take the latest interim results as the final results
   - Lexical ("one two three") is not converted into ITN ("123") for interim results
   - Cognitive Services does not return confidence for interim results, so we assume it is `0.5`
- The microphone will not stop recording immediately
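
The workaround described above could look roughly like this. It is a minimal sketch under stated assumptions: `InterimResult`, `FinalResult`, and `finalizeOnStop` are hypothetical names, and the only facts taken from the notes are that the latest interim result is promoted to final and that its confidence is assumed to be `0.5`.

```typescript
// Hypothetical shape of an interim result (text from a `recognizing` event).
interface InterimResult {
  text: string;
}

// Hypothetical Web Speech-like final result.
interface FinalResult {
  transcript: string;
  confidence: number;
  isFinal: true;
}

// Interim results carry no confidence, so a fixed value is assumed.
const ASSUMED_INTERIM_CONFIDENCE = 0.5;

// Called when `stop()` is invoked: promotes the latest interim result to final.
// Returns undefined when no interim result has been received yet.
function finalizeOnStop(interims: InterimResult[]): FinalResult | undefined {
  if (interims.length === 0) {
    return undefined;
  }
  const latest = interims[interims.length - 1];
  // The text stays in lexical form ("one two three"); no ITN conversion happens here.
  return { transcript: latest.text, confidence: ASSUMED_INTERIM_CONFIDENCE, isFinal: true };
}
```
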
- Interactive mode (with interim results)
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. Optional, `soundstart`
      4. Optional, `speechstart`
      5. Optional, `speechend`
      6. Optional, `soundend`
      7. `audioend`
      8. `end`
   - Cognitive Services
      - `recognizeOnceAsync` does not support stop or cancellation, so we need to mimic the behavior by ignoring some `recognizing` events and the final `recognized` event
      1. Call `recognizeOnceAsync()`
      2. (`stop()` is called)
      3. Receive a final `recognized` event
      4. `onSuccess(result)` callback from `recognizeOnceAsync()`
- Continuous mode
   - W3C Web Speech API
      - TBD
   - Cognitive Services
      - TBD
- Interactive mode (with interim results)
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `soundstart`
      4. `speechstart`
      5. One or more `result` events, if `interimResults` is set to `true`
      6. `speechend`
      7. `soundend`
      8. `audioend`
      9. ❓ One or more `result` with `results === [{ isFinal: false }]`
      10. `result`
         - `results === [{ isFinal: true }]`
      11. `end`
   - Cognitive Services
      - `recognizeOnceAsync` does not support stop or cancellation, so we need to mimic the behavior by ignoring some `recognizing` events and the final `recognized` event
      1. Call `recognizeOnceAsync()`
      2. Receive zero or more `recognizing` events
         - With notable text in `result.text`
         - `result.json` is similar to `{"Text":"text","Offset":200000,"Duration":32400000}`
      3. Receive a final `recognized` event
         - `result.json` is similar to `{"RecognitionStatus":"Success","Offset":1800000,"Duration":48100000,"NBest":[{"Confidence":0.2331869,"Lexical":"no","ITN":"no","MaskedITN":"no","Display":"No."}]}`
      4. `onSuccess(result)` callback from `recognizeOnceAsync()`
- Continuous mode
   - W3C Web Speech API
      - TBD
   - Cognitive Services
      - TBD
- Interactive mode (with interim results)
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `audioend`
      4. `error`
         - `error === 'aborted'`
      5. `end`
   - Cognitive Services
      - There is no `abort()` equivalent for `recognizeOnceAsync()`, so the microphone will not stop recording immediately
- Continuous mode
   - W3C Web Speech API
      - TBD
   - Cognitive Services
      - TBD
- Interactive mode
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `soundstart`
      4. `speechstart`
      5. One or more `result` events, if `interimResults` is set to `true`
      6. `speechend`
      7. `soundend`
      8. `audioend`
      9. `error`
         - `error === 'aborted'`
      10. `end`
   - Cognitive Services
      - TBD
- Continuous mode
   - W3C Web Speech API
      - TBD
   - Cognitive Services
      - TBD
Turn on airplane mode.
- Interactive mode
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `audioend`
      4. `error`
         - `error === 'network'`
      5. `end`
   - Cognitive Services Speech Services
      1. Received `canceled` event
         - `errorDetails === 'Unable to contact server. StatusCode: 1006, Reason: '`
      2. `error` callback is received
         - `errorDetails === 'Unable to contact server. StatusCode: 1006, Reason: '`
      - (Microphone was not turned on, or was on too briefly to detect)
- Continuous mode
   - W3C Web Speech API
      - TBD
   - Cognitive Services Speech Services
      - TBD
Since browser speech does not require a subscription key, we assume this flow should be the same as airplane mode.
- Interactive mode
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `audioend`
      4. `error`
         - `error === 'network'`
      5. `end`
   - Cognitive Services Speech Services
      1. Console (on Chrome) logged `WebSocket connection to 'wss://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?language=en-US&format=detailed&Ocp-Apim-Subscription-Key=...&X-ConnectionId=...' failed: HTTP Authentication failed; no valid credentials available.`
      2. Received `canceled` event
         - `errorDetails === 'Unable to contact server. StatusCode: 1006, Reason: '`
         - `reason === 0`
      3. `error` callback is received
         - `errorDetails === 'Unable to contact server. StatusCode: 1006, Reason: '`
      - (Microphone was not turned on, or was on too briefly to detect)
- Continuous mode
   - W3C Web Speech API
      - TBD
   - Cognitive Services Speech Services
      - TBD
The microphone is muted and the record level is at zero. This should be distinguishable by the absence of a `soundstart` event on the Web Speech API.
- Interactive mode
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `audioend`
      4. `error`
         - `error === 'no-speech'`
      5. `end`
   - Cognitive Services Speech Services
      1. After 5 seconds of silence, `recognized`
         - `result.json.RecognitionStatus === 'InitialSilenceTimeout'`
         - `result.offset === 50000000`
         - Microphone is off after this event
- Continuous mode
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `audioend`
      4. `error`
         - `error === 'no-speech'`
         - Even in continuous mode, the browser will time out with `no-speech` after 5 seconds
      5. `end`
   - Cognitive Services Speech Services
      1. `start`
      2. After 15 seconds of silence, `recognized`
         - `json.RecognitionStatus === 'InitialSilenceTimeout'`
         - `offset === 150000000`
      3. (When `stop()`), `stop`
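
The offsets above suggest that the service reports time in 100-nanosecond units (`50000000` for 5 seconds, `150000000` for 15 seconds). A small sketch of that arithmetic, with the hypothetical helper names `ticksToSeconds` and `isInitialSilenceTimeout`:

```typescript
// Inferred from the figures above: 50000000 ticks correspond to 5 seconds,
// so one second is 10,000,000 ticks (100-nanosecond units).
const TICKS_PER_SECOND = 10000000;

// Converts an offset or duration reported by the service into seconds.
function ticksToSeconds(ticks: number): number {
  return ticks / TICKS_PER_SECOND;
}

// True when the `recognized` payload reports that no speech was heard in time.
function isInitialSilenceTimeout(recognitionStatus: string): boolean {
  return recognitionStatus === 'InitialSilenceTimeout';
}
```
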
Some sounds are heard, but they cannot be recognized as text. There could be some interim results with recognized text, but their confidence is so low that they are dropped from the final result.
- Interactive mode
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `soundstart`
      4. `speechstart`
      5. `speechend`
      6. `soundend`
      7. `audioend`
      8. `end`
   - Cognitive Services Speech Services
      - TBD
      1. After 5 seconds of unrecognizable sound, `recognized`
         - `json.RecognitionStatus === 'InitialSilenceTimeout'`
         - `offset === 50000000`
         - Microphone is off after this event
- Continuous mode
   - W3C Web Speech API
      1. `start`
      2. `audiostart`
      3. `soundstart`
      4. `speechstart`
      5. (When `stop()`)
      6. `speechend`
      7. `soundend`
      8. `audioend`
      9. `end`
   - Cognitive Services Speech Services
      1. `start`
      2. After 15 seconds of unrecognizable sound, `recognized`
         - `json.RecognitionStatus === 'InitialSilenceTimeout'`
         - `offset === 150000000`
      3. (When `stop()`), `stop`
- Interactive mode
   - W3C Web Speech API
      1. (No `start` event was received)
      2. `error`
         - `error === 'not-allowed'`
      3. `end`
   - Cognitive Services Speech Services
      1. `recognizeOnceAsync(success, error)` returned with `error` callback
         - `"Runtime error: 'Error handler for error Error occurred during microphone initialization: NotAllowedError: Permission denied threw error Error: Error occurred during microphone initialization: NotAllowedError: Permission denied'"`
- Continuous mode
   - W3C Web Speech API
      1. `error`
         - `error === 'not-allowed'`
      2. `end`
   - Cognitive Services Speech Services
      - TBD
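
The error messages observed above could be translated into Web Speech API error codes along these lines. This is a sketch rather than the actual implementation: `toWebSpeechErrorCode` is a hypothetical helper, and matching on message substrings is an assumption based only on the strings recorded in these notes.

```typescript
// Subset of Web Speech API error codes that appear in the notes above.
type WebSpeechErrorCode = 'network' | 'not-allowed';

// Maps an error message from Cognitive Services to a Web Speech error code.
// Returns undefined when the message matches none of the observed patterns.
function toWebSpeechErrorCode(message: string): WebSpeechErrorCode | undefined {
  // Permission denied during microphone initialization.
  if (message.indexOf('NotAllowedError') !== -1) {
    return 'not-allowed';
  }
  // Seen both in airplane mode and with an invalid subscription key.
  if (message.indexOf('Unable to contact server') !== -1) {
    return 'network';
  }
  return undefined;
}
```
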