general idea

The idea is to combine speech recognition and image processing:

speech recognition: processes audio and tries to guess the spoken words from the perceived frequencies. We get decent performance with convolutional NNs combined with an RNN for classification, but it is not robust to background noise.
image processing: processes a video (a sequence of images) and tries to guess the spoken words from mouth movements. Performance is only okay on its own, but it can complement audio processing because it is unaffected by background noise. It uses more bandwidth and processing power, though.

=> Use speech recognition by default because of its lower energy cost; when its performance drops because of noise, mix in some image processing to increase robustness to bad audio.
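A minimal sketch of how this gating could look, not the project's actual implementation. It assumes both models output a probability per word class, uses the highest audio probability as a stand-in for "audio performance", and only runs the (more expensive) video branch when that confidence falls below a threshold. The names `fuse_predictions`, `get_video_probs`, `confidence_threshold`, and `video_weight` are all hypothetical, and the default values are arbitrary.

```python
from typing import Callable, Sequence, Tuple


def fuse_predictions(
    audio_probs: Sequence[float],
    get_video_probs: Callable[[], Sequence[float]],
    confidence_threshold: float = 0.6,  # arbitrary value, would need tuning
    video_weight: float = 0.5,          # arbitrary value, would need tuning
) -> Tuple[int, bool]:
    """Return (predicted class index, whether the video branch was used)."""
    # Assumption: the top audio probability is a usable proxy for audio quality.
    audio_confidence = max(audio_probs)

    if audio_confidence >= confidence_threshold:
        # Audio is trusted: skip the video branch entirely to save energy.
        best = max(range(len(audio_probs)), key=lambda i: audio_probs[i])
        return best, False

    # Audio looks unreliable: only now pay for the video (lip-reading) branch.
    video_probs = get_video_probs()
    if len(video_probs) != len(audio_probs):
        raise ValueError("audio and video models must share the same classes")

    # Weighted average of the two distributions.
    mixed = [
        (1.0 - video_weight) * a + video_weight * v
        for a, v in zip(audio_probs, video_probs)
    ]
    best = max(range(len(mixed)), key=lambda i: mixed[i])
    return best, True


if __name__ == "__main__":
    # Noisy audio (low confidence), so the video branch is consulted.
    print(fuse_predictions([0.40, 0.35, 0.25], lambda: [0.1, 0.8, 0.1]))
```

Passing the video model as a callable keeps it lazy: when the audio is clean, the video frames are never processed. A real system would probably use a better noise estimate than the top probability, for example a measured signal-to-noise ratio.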

If this works reasonably well, we can start mapping it to dedicated HW.
