@brainwavecoder9

Audio Update Notes (Fixes Windows, Improves streaming)

Went down a bit of a rabbit hole on this one. The PR is dense but I thought the details could provide some insight.

Although it's working nicely, and seems to be an improvement, I have no idea whether or not you want to go in this direction. Guessing some internal discussions will be necessary to determine how you want to handle this, but figured this could be helpful regardless since it does address some Windows specific issues.

Claude makes me sound smarter than I am, here's the breakdown...


TL;DR

  • Switched playback to callback mode and added a small prebuffer (~120 ms by default, tunable 80–200 ms) to smooth out voice streaming.
    The prebuffer absorbs bursty realtime chunks and USB/WASAPI scheduling delays.
  • Explicit mono/float32 across I/O.
  • Known trade-offs: ~120 ms default output latency (tunable), input buffer may grow under heavy load (to be capped in a follow-up).
  • If audio latency slowly increases after network stalls, add FIFO backpressure.
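
For anyone tuning these numbers, here is a rough sketch of the two knobs above: converting the prebuffer target from milliseconds to samples, and a capped FIFO as one possible form of backpressure. The 16 kHz sample rate, the drop-oldest policy, and the names ms_to_samples and BoundedFifo are assumptions for illustration, not what this PR implements.

```python
from collections import deque


def ms_to_samples(ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * sample_rate / 1000))


class BoundedFifo:
    """FIFO of audio chunks that drops the oldest chunks once a cap is exceeded.

    Single-threaded sketch only: the counters are not protected by a lock.
    """

    def __init__(self, max_samples: int) -> None:
        self.max_samples = max_samples
        self.chunks = deque()
        self.queued_samples = 0
        self.dropped_samples = 0

    def push(self, chunk) -> None:
        self.chunks.append(chunk)
        self.queued_samples += len(chunk)
        # Backpressure (assumed policy): discard the oldest audio so latency
        # cannot grow without bound, but always keep the newest chunk.
        while self.queued_samples > self.max_samples and len(self.chunks) > 1:
            old = self.chunks.popleft()
            self.queued_samples -= len(old)
            self.dropped_samples += len(old)


# Assumed 16 kHz mono stream; 120 ms is the default prebuffer from the TL;DR.
target_buffer_samples = ms_to_samples(120, 16_000)
```

Dropping the oldest audio trades a short audible skip for bounded latency; blocking the producer instead would be the other obvious choice.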

Device Notes

Worth highlighting that I needed @show0000's PR #316 to correctly identify the mic array input/output on Windows. I confirmed that these devices do not show up as the default device, which I think is intended; this was a question that @FabienDanieau asked.

This mattered because when I ran the reachy_mini_conversation_app (ty Fabien for posting, super helpful) I didn't get any sound at all.

One clue was that when I ran the app in Gradio, I could hear it through my laptop audio (my default device). Another clue was that the server wake-up/shutdown sounds played fine.


How I Got Here

After some exploration I found that play_sound() uses a callback-based stream that works, but push_audio_sample() uses a write-based stream. The .write() method on sounddevice streams is known to have issues on Windows, especially with USB audio devices like the ReSpeaker. Technically .write() can work, but it tends to glitch with small, irregular chunks (as with TTS), so in most cases the callback approach is preferred, and it is what PortAudio recommends.
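
To make the callback style concrete, here is a minimal sketch of a callback-driven output stream. It is not the code in this PR: the tone generator, the 16 kHz rate, and the names make_tone_callback and play_tone are placeholders. The sounddevice calls used (OutputStream with a callback, sd.sleep) are the standard ones, and the import is done lazily so the callback logic can be exercised without audio hardware.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed rate, for illustration only


def make_tone_callback(freq_hz=440.0, sample_rate=SAMPLE_RATE):
    """Return an output callback that synthesizes a continuous sine tone."""
    phase = 0  # running sample index, kept across callbacks

    def callback(outdata, frames, time, status):
        nonlocal phase
        if status:
            print(status)  # under/overflow flags reported by PortAudio
        t = (phase + np.arange(frames)) / sample_rate
        # PortAudio asks for exactly `frames` samples; fill the mono column.
        outdata[:, 0] = 0.2 * np.sin(2 * np.pi * freq_hz * t)
        phase += frames

    return callback


def play_tone(seconds=1.0, device=None):
    """Open a callback-driven mono float32 stream and play a test tone."""
    import sounddevice as sd  # lazy import: needs PortAudio and a device

    with sd.OutputStream(
        samplerate=SAMPLE_RATE,
        channels=1,
        dtype="float32",
        device=device,
        callback=make_tone_callback(),
    ):
        sd.sleep(int(seconds * 1000))  # audio is produced on PortAudio's thread
```

The key difference from .write() is that PortAudio pulls exactly the number of frames it needs on its own schedule, instead of the application pushing whatever chunk size happens to arrive.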

I'm also including tests/test_audio_methods.py which is what I used to confirm callback-based streams work, but write-based streams don't. It's not really necessary to include but might be helpful to confirm the approach on other devices.


What Changed

The fix was to modify start_playing() and push_audio_sample() to use a callback-based approach instead of the blocking .write() method. That worked, but then I found that play_loop() in src/reachy_mini_conversation_app/console.py pushes audio frames one at a time as they arrive from OpenAI, which causes choppy playback.
The solution for that was to modify audio_sounddevice.py to buffer the incoming chunks in a queue.

The current approach adds a small lock-free ring-style buffer (a deque plus a tail pointer) that the output callback reads from, so it can drain multiple chunks per callback without blocking.

  • A carry-over ("tail") is kept between callbacks, so there is no re-enqueue overhead and there are fewer gaps.
  • Both streams use explicit dtype="float32" and channels=1 to avoid conversions.
  • A gentle prebuffer: streaming is allowed immediately, but a configurable target_buffer_samples is kept; the callback pads silence only until that watermark is reached.
  • Underflow/overflow counters are written to the logs.
  • Shape safety: input of shape (n,) or (n,1) is accepted and coerced to mono float32.
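
A simplified sketch of how a deque, a carry-over tail, and a prebuffer watermark can fit together in an output callback. This is an illustration rather than the actual code in audio_sounddevice.py: the class name ChunkPlayer, the helper to_mono_float32, and the choice to stay "primed" after the first watermark are assumptions.

```python
from collections import deque

import numpy as np


def to_mono_float32(chunk):
    """Accept (n,) or (n, 1) input and return a 1-D float32 array."""
    arr = np.asarray(chunk, dtype=np.float32)
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr[:, 0]
    if arr.ndim != 1:
        raise ValueError(f"expected shape (n,) or (n, 1), got {arr.shape}")
    return arr


class ChunkPlayer:
    def __init__(self, target_buffer_samples):
        self.target_buffer_samples = target_buffer_samples
        self._chunks = deque()
        self._tail = np.zeros(0, dtype=np.float32)  # leftover of a partly played chunk
        self._queued_samples = 0  # best-effort depth, updated from both threads
        self._primed = False
        self.underflows = 0

    def push(self, chunk):
        """Producer side: enqueue one chunk of any accepted shape."""
        arr = to_mono_float32(chunk)
        self._chunks.append(arr)
        self._queued_samples += len(arr)

    def callback(self, outdata, frames, time, status):
        """Consumer side: same signature as a sounddevice output callback."""
        outdata.fill(0.0)  # anything we do not overwrite stays silent
        if not self._primed:
            if self._queued_samples < self.target_buffer_samples:
                return  # still prebuffering: emit silence
            self._primed = True
        written = 0
        while written < frames:
            if len(self._tail) == 0:
                try:
                    self._tail = self._chunks.popleft()
                except IndexError:
                    self.underflows += 1  # queue ran dry mid-callback
                    break
            n = min(frames - written, len(self._tail))
            outdata[written:written + n, 0] = self._tail[:n]
            self._tail = self._tail[n:]  # keep the remainder for next time
            written += n
        self._queued_samples -= written
```

Because the leftover part of a chunk is held in _tail rather than pushed back onto the deque, the callback never has to re-enqueue anything, and a single callback can stitch together several small chunks.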

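deque.append() and deque.popleft() are thread-safe in CPython, so one producer thread and one consumer thread can share the deque without a lock. The _queued_samples counter is best-effort only, because it is updated from both threads without synchronization; if we want strict accuracy, we could wrap it in a small lock or compute depth opportunistically.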
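
If strict accuracy ever matters, the "small lock" option could look roughly like this. The class name SampleCounter is made up for illustration; the point is only that the read-modify-write on the counter happens under a lock, since a bare `+=` from two threads is not guaranteed to be atomic.

```python
import threading


class SampleCounter:
    """Queue-depth counter that stays exact when updated from two threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def add(self, n):
        """Called by the producer after enqueueing n samples."""
        with self._lock:
            self._value += n

    def sub(self, n):
        """Called by the audio callback after playing n samples."""
        with self._lock:
            self._value -= n

    @property
    def value(self):
        with self._lock:
            return self._value
```

The lock is held only for a single integer update, so the cost inside the audio callback should be small, but it is still a trade-off against the fully lock-free version.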


Miscellaneous

FYI to @alozowski and @RemiFabre to confirm that the reachy_mini_conversation_app works on Windows with nothing but this change.

Also, I wasn't sure how anyone else was managing this, but I noticed that the conversation app had its own local version.
To make keeping track of things a bit easier I referenced my local forked version to make sure I kept in sync with my latest changes:

uv pip install --force-reinstall -e ../../brainwavecollective/reachy_mini

@FabienDanieau
Contributor

Thank you for the PR! I’ll test it as soon as possible.

Just a quick note: The files in the tests/ directory are designed to be run using pytest. You can see how this is implemented in our CI here. Locally, we adjust the -m not audio|video|... flag based on what we want to test.
If you’d like to keep your file in its current form, consider moving it to examples/debug/. Eventually, we’ll relocate these files to a more appropriate directory.
