Transcription mode is very unreliable #689
Description
Checks
- I have searched existing issues and discussions
- I can reproduce this with the latest main or release
Describe the bug
Problem
Transcription mode is currently unreliable and difficult to use. The root cause appears to be the "type-as-you-speak" implementation. Streaming transcription output into the active text field in real time is an extremely hard problem to get right.
Suggested Fix
Replace the live-typing approach with a simpler push-to-talk model:
- User presses a hotkey to start recording
- User presses it again to stop
- The transcribed text is copied to the clipboard and pasted into the active field (restoring the previous clipboard contents afterward)
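To make the proposed flow concrete, here is a minimal sketch of the toggle and the clipboard save/restore logic. It is not Osaurus code: `InMemoryClipboard`, `PushToTalkSession`, and the `transcribe` and `paste` callbacks are hypothetical names, and the clipboard is an in-memory stand-in for the system pasteboard so the logic can run anywhere.

```python
from typing import Callable, List, Optional


class InMemoryClipboard:
    """Hypothetical stand-in for the system clipboard (a real app would wrap the OS pasteboard)."""

    def __init__(self, contents: Optional[str] = None) -> None:
        self.contents = contents

    def read(self) -> Optional[str]:
        return self.contents

    def write(self, text: Optional[str]) -> None:
        self.contents = text


class PushToTalkSession:
    """Hotkey toggles recording; on stop, transcribe once, paste, then restore the clipboard."""

    def __init__(
        self,
        clipboard: InMemoryClipboard,
        transcribe: Callable[[bytes], str],  # assumed: whole recording in, final text out
        paste: Callable[[], None],           # assumed: triggers a paste into the active field
    ) -> None:
        self.clipboard = clipboard
        self.transcribe = transcribe
        self.paste = paste
        self.recording = False
        self._chunks: List[bytes] = []

    def on_audio(self, chunk: bytes) -> None:
        # Audio is only buffered while recording; nothing is typed live.
        if self.recording:
            self._chunks.append(chunk)

    def on_hotkey(self) -> Optional[str]:
        if not self.recording:
            # First press: start a fresh recording.
            self.recording = True
            self._chunks = []
            return None

        # Second press: stop and transcribe the whole buffer in one pass.
        self.recording = False
        text = self.transcribe(b"".join(self._chunks))
        if not text:
            return text  # nothing to paste, so leave the clipboard untouched

        previous = self.clipboard.read()
        self.clipboard.write(text)
        try:
            self.paste()
        finally:
            # Restore whatever the user had on the clipboard, even if the paste fails.
            self.clipboard.write(previous)
        return text
```

One caveat with this sketch: the paste here is synchronous, whereas a real system paste is likely asynchronous, so an actual implementation would probably need to wait briefly before restoring the previous clipboard contents.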
Prior Art / Reference
SuperWhisper is a macOS app whose entire product is voice transcription, and it uses this exact pattern while deliberately avoiding live typing. Notably, it ships with the same Parakeet model Osaurus uses. If a dedicated transcription app considers live typing out of scope, it's a strong signal that the complexity isn't worth it for a secondary feature here.
Why This Matters
The clipboard approach is simpler to implement, significantly more reliable, and already a proven UX pattern. The current implementation risks making transcription mode feel broken, which reflects poorly on the product overall. Scoping down to the clipboard model would ship something that actually works.
Steps to reproduce
No response
Provider type
Local model (MLX)
Model name
No response
Osaurus version / commit
Latest
macOS version
26.1
Apple Silicon chip
No response
Xcode version
No response
Logs
Screenshots
No response