# bevy-deepgram

This is essentially a tech demo showing how one could integrate Deepgram Automatic Speech Recognition (ASR)
with the Bevy game engine. You can control the Bevy icon by saying "up", "down", "left", or "right" to jump
in that direction. There is an "enemy" that moves back and forth and that you can collide with. If you fall
off the bottom of the screen, you "die" and are "respawned" in the center of the screen, vertically.
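
One way to turn a transcript into these jump commands is to scan its words for the four direction keywords. This is a hypothetical sketch: the `Direction` enum and `parse_commands` function are illustrative names, not taken from this project's code, and it assumes the transcript arrives as a plain `&str`.

```rust
/// The four jump directions the demo responds to (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Up,
    Down,
    Left,
    Right,
}

/// Extracts direction commands from a transcript, in spoken order.
/// Punctuation is stripped and matching is case-insensitive.
pub fn parse_commands(transcript: &str) -> Vec<Direction> {
    transcript
        .split_whitespace()
        .filter_map(|word| {
            // Keep only alphabetic characters, lowercased.
            let cleaned: String = word
                .chars()
                .filter(|c| c.is_alphabetic())
                .flat_map(|c| c.to_lowercase())
                .collect();
            match cleaned.as_str() {
                "up" => Some(Direction::Up),
                "down" => Some(Direction::Down),
                "left" => Some(Direction::Left),
                "right" => Some(Direction::Right),
                _ => None,
            }
        })
        .collect()
}

fn main() {
    let commands = parse_commands("Up! Then left, please.");
    println!("{:?}", commands); // prints [Up, Left]
}
```

Matching whole words rather than substrings means a word like "upward" is not mistaken for "up".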

As a tech demo, this is fairly complete, but many TODOs are noted in comments throughout the code. To run it,
set the `DEEPGRAM_API_KEY` environment variable to your Deepgram API key and run:

```
cargo run
```
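
A minimal sketch of reading the key and building the header value, assuming the key is sent in Deepgram's `Authorization: Token <key>` format. The helper name `deepgram_auth_header` is hypothetical and not from this project; the project may construct its request differently.

```rust
use std::env;

/// Builds the value of the HTTP `Authorization` header for Deepgram
/// (hypothetical helper). Surrounding whitespace in the key is trimmed.
fn deepgram_auth_header(api_key: &str) -> String {
    format!("Token {}", api_key.trim())
}

fn main() {
    // Fail early with a readable message instead of a failed websocket handshake.
    match env::var("DEEPGRAM_API_KEY") {
        Ok(key) if !key.trim().is_empty() => {
            let header = deepgram_auth_header(&key);
            println!("Authorization header ready ({} chars)", header.len());
        }
        _ => eprintln!("DEEPGRAM_API_KEY is not set; the ASR connection will fail."),
    }
}
```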

If the ASR isn't working, your microphone's audio format may differ from the hardcoded values: this demo
expects 44100 Hz floating-point PCM audio from the microphone. Choosing the audio format dynamically is one of
the big TODOs. The game also requires a large 1920x1080 window to work correctly; reasonable asset and window
scaling is another big TODO. Judging from the Bevy docs, scaling should in principle work as it does in other
engines (Unity, Godot, etc.), but I have not gotten it working yet.
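
If the microphone delivers 44100 Hz `f32` samples but you would rather stream 16-bit signed little-endian PCM (Deepgram's `linear16` encoding), the conversion is a clamp, a scale, and a byte dump. This is a sketch under that assumption, not what the demo currently sends. `f32_to_linear16` and `stream_query` are hypothetical names. The query string assumes the format is advertised with Deepgram's `encoding`, `sample_rate`, and `channels` parameters.

```rust
/// Converts normalized f32 samples (nominally in [-1.0, 1.0]) into
/// 16-bit signed little-endian PCM bytes. Hypothetical helper.
fn f32_to_linear16(samples: &[f32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        // Clamp first so loud input saturates instead of wrapping around.
        let clamped = s.clamp(-1.0, 1.0);
        let value = (clamped * i16::MAX as f32).round() as i16;
        bytes.extend_from_slice(&value.to_le_bytes());
    }
    bytes
}

/// Query string describing the stream, assuming linear16 audio.
fn stream_query(sample_rate: u32, channels: u16) -> String {
    format!(
        "encoding=linear16&sample_rate={}&channels={}",
        sample_rate, channels
    )
}

fn main() {
    let pcm = f32_to_linear16(&[0.0, 1.0, -1.0, 2.0]);
    println!("{:?}", pcm);
    println!("{}", stream_query(44100, 1));
}
```

Sending `linear16` uses 2 bytes per sample instead of 4 for `f32`, at the cost of one conversion pass per buffer.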

## A Word on Dependencies

First of all, I needed to install some development libraries that I was not expecting
(the command below is for Debian/Ubuntu):

```
sudo apt-get install libasound2-dev libudev-dev
```

With that out of the way, these are the main Rust/Cargo dependencies:

* `bevy`: the game engine
* `heron`: a physics engine and wrapper around `bevy_rapier` providing a simpler API
* `portaudio`: used for microphone input
* `tokio_tungstenite`/`tungstenite`: used to connect to Deepgram via websockets
* `tokio`: used to create an async runtime for the websocket handling

I chose `heron` for the physics engine because it was easier to set up and get working than `bevy_rapier`,
and it felt much more intuitive. It certainly has limitations: I see no way to directly apply forces and
impulses, but the same effect can be achieved by modifying velocities and accelerations directly. Overall,
the components `heron` introduces map very well onto those of the physics engines used in Unity, Godot, etc.
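
Emulating an impulse by editing velocity follows from the impulse form of Newton's second law: an impulse J changes a body's velocity by Δv = J / m. The sketch below uses a hypothetical stand-in `Velocity` struct, not `heron`'s own component, so it runs without the engine. In the game, the same arithmetic would be applied to the body's linear velocity when a jump command arrives.

```rust
/// Minimal stand-in for a physics engine's linear velocity
/// (hypothetical; not heron's `Velocity` type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Velocity {
    x: f32,
    y: f32,
}

/// Emulates applying an impulse (jx, jy) to a body of mass `mass`:
/// delta_v = J / m, so the velocity changes instantly, like a jump.
fn apply_impulse(velocity: &mut Velocity, jx: f32, jy: f32, mass: f32) {
    assert!(mass > 0.0, "mass must be positive");
    velocity.x += jx / mass;
    velocity.y += jy / mass;
}

fn main() {
    // A body falling at 3 units/s receives an upward impulse of 10 on mass 2.
    let mut v = Velocity { x: 0.0, y: -3.0 };
    apply_impulse(&mut v, 0.0, 10.0, 2.0);
    println!("{:?}", v); // prints Velocity { x: 0.0, y: 2.0 }
}
```

A common alternative for jumps is to overwrite the vertical velocity instead of adding to it, which gives the same jump height regardless of how fast the body is currently falling.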

`portaudio` was a clear choice for the microphone input, and I followed a nice guide for this part
(the guide is linked in the code comments).

For the websockets, things got a bit tricky. I did not want to introduce an async runtime, and
even got a prototype working without one, but it had severe limitations (namely lag and the potential
to block ASR indefinitely). These limitations stemmed from the fact that `socket.read_message()`
is a blocking call. This bugs me, because standard library channels (`std::sync::mpsc`) and `crossbeam`
channels have a non-blocking `try_recv()` method, and similar functionality for vanilla `tungstenite`
websockets would allow this whole project to work without any async runtime. However, here we are!
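
The non-blocking pattern described above can be illustrated with the standard library alone. A background thread, standing in here for a blocking websocket reader, forwards transcripts over a `std::sync::mpsc` channel. The game loop drains that channel with `try_recv()` once per frame, so a slow or silent socket never stalls a frame. This is a sketch of the general pattern, not this project's `tokio`-based code; `spawn_fake_reader` and `drain_transcripts` are hypothetical names.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Stands in for a thread blocked on a websocket read: it may block as long
/// as it likes, because only this thread waits on it.
fn spawn_fake_reader(messages: Vec<String>) -> Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for msg in messages {
            thread::sleep(Duration::from_millis(10)); // simulated network delay
            if tx.send(msg).is_err() {
                break; // receiver dropped; stop reading
            }
        }
    });
    rx
}

/// Called once per frame: collects whatever has arrived without blocking.
/// Returns the transcripts and whether the reader thread has finished.
fn drain_transcripts(rx: &Receiver<String>) -> (Vec<String>, bool) {
    let mut received = Vec::new();
    loop {
        match rx.try_recv() {
            Ok(text) => received.push(text),
            Err(TryRecvError::Empty) => return (received, false),
            Err(TryRecvError::Disconnected) => return (received, true),
        }
    }
}

fn main() {
    let rx = spawn_fake_reader(vec!["up".to_string(), "left".to_string()]);
    let mut all = Vec::new();
    // Simulated game loop: each iteration is one frame of roughly 16 ms.
    loop {
        let (new, done) = drain_transcripts(&rx);
        all.extend(new);
        if done {
            break;
        }
        thread::sleep(Duration::from_millis(16));
    }
    println!("{:?}", all); // prints ["up", "left"]
}
```

In the game, the drain step would run in a per-frame system, leaving the reading side free to block without affecting frame time.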