Refactor SoundDevice audio playback to callback mode with prebuffer smoothing (fix choppy output on Windows/ReSpeaker) #362
Audio Update Notes (Fixes Windows, Improves streaming)
Went down a bit of a rabbit hole on this one. The PR is dense but I thought the details could provide some insight.
Although it's working nicely, and seems to be an improvement, I have no idea whether or not you want to go in this direction. Guessing some internal discussions will be necessary to determine how you want to handle this, but figured this could be helpful regardless since it does address some Windows specific issues.
Claude makes me sound smarter than I am, here's the breakdown...
TL;DR
Added a small (~120 ms, tunable 80–200 ms) prebuffer to absorb bursty realtime chunks and USB/WASAPI scheduling delays.
Device Notes
Worth highlighting that I needed @show0000's PR #316 to correctly identify the mic array input/output on Windows. I confirmed that these were not showing up as the default device, which I think was intended, and was a question that @FabienDanieau asked.
This mattered because when I ran the reachy_mini_conversation_app (ty Fabien for posting, super helpful) I didn't get any sound at all.
A clue was when I ran the app in Gradio I could hear it through my laptop audio (my default device). Another clue was that the server wake up/shut down sounds played fine.
How I Got Here
After some exploration I found that play_sound() uses a callback-based stream that works, but push_audio_sample() uses a write-based stream. The .write() method on sounddevice streams is known to have issues on Windows, especially with USB audio devices like the ReSpeaker. Technically .write() can work, but it tends to glitch with small, irregular chunks (as in TTS), so in most cases callback mode is preferred and recommended by PortAudio.
I'm also including tests/test_audio_methods.py, which is what I used to confirm that callback-based streams work but write-based streams don't. It's not really necessary to include, but it might be helpful for confirming the approach on other devices.
What Changed
The fix was to modify start_playing() and push_audio_sample() to use a callback-based approach instead of the blocking .write() method. That worked, but then I found that play_loop() in src/reachy_mini_conversation_app/console.py pushes audio frames one at a time as they arrive from OpenAI, which causes choppy playback.
The solution for that was to modify audio_sounddevice.py to buffer incoming audio in a queue. The current approach adds a small lock-free ring via deque + tail pointer in the output callback so we can drain multiple chunks per callback without blocking.
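To make the deque + tail idea concrete, here is a minimal sketch of how such a drain could look. It is not the code from this PR: the class name ChunkDrain and its method names are hypothetical, and mono float32 input is assumed. The callback takes its arguments in the same order as a sounddevice OutputStream callback, but the sketch runs without an audio device.

```python
from collections import deque

import numpy as np


class ChunkDrain:
    """Hypothetical helper: a producer pushes chunks, the audio callback drains them."""

    def __init__(self):
        self._chunks = deque()  # append/popleft are thread-safe in CPython
        self._tail = np.zeros(0, dtype=np.float32)  # leftover from the last callback

    def push(self, samples):
        # Producer side: accept (n,) or (n, 1) and store flat mono float32.
        self._chunks.append(np.asarray(samples, dtype=np.float32).reshape(-1))

    def callback(self, outdata, frames, time_info, status):
        # Consumer side: same argument order as a sounddevice output callback.
        filled = 0
        while filled < frames:
            if self._tail.size == 0:
                try:
                    self._tail = self._chunks.popleft()
                except IndexError:
                    break  # nothing queued: underflow
            n = min(frames - filled, self._tail.size)
            outdata[filled:filled + n, 0] = self._tail[:n]
            self._tail = self._tail[n:]  # keep the remainder for the next callback
            filled += n
        outdata[filled:, 0] = 0.0  # pad whatever is left with silence


# Simulate two callbacks without opening a real stream.
drain = ChunkDrain()
drain.push(np.ones(3))
drain.push(np.full((4, 1), 2.0))
first = np.empty((5, 1), dtype=np.float32)
second = np.empty((4, 1), dtype=np.float32)
drain.callback(first, 5, None, None)
drain.callback(second, 4, None, None)
```

In this run the first callback consumes all of the first chunk plus two samples of the second, and the second callback starts from the two leftover samples before padding with silence. With a real device, drain.callback would be passed as the callback argument of sd.OutputStream together with channels=1 and dtype="float32".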
A carry-over (“tail”) is kept between callbacks, so there is no re-enqueue overhead and there are fewer gaps. Went with explicit dtype="float32" and channels=1 on both streams to avoid conversions. Added a gentle prebuffer: streaming is allowed immediately, but a configurable target_buffer_samples is kept; the callback pads silence only until that watermark is reached. Also added underflow/overflow counters in the logs. For shape safety, input is accepted as (n,) or (n,1) and coerced to mono float32.
Deque append and pop operations are atomic in CPython, so the producer/consumer threads are safe. The _queued_samples counter is best-effort only; if we want strict accuracy, we could wrap it in a small lock or compute depth opportunistically.
Miscellaneous
FYI to @alozowski and @RemiFabre to confirm that the reachy_mini_conversation_app works on Windows with nothing but this change.
Also, I wasn't sure how anyone else was managing this, but I noticed that the conversation app had its own local version.
To make keeping track of things a bit easier I referenced my local forked version to make sure I kept in sync with my latest changes: