Transcription mode is very unreliable #689

@thirdnull

Checks

  • I have searched existing issues and discussions
  • I can reproduce this with the latest main or release

Describe the bug

Problem

Transcription mode is currently unreliable and difficult to use. The root cause appears to be the "type-as-you-speak" implementation: streaming transcription output into the active text field in real time is very hard to get right.

Suggested Fix

Replace the live-typing approach with a simpler push-to-talk model:

  1. User presses a hotkey to start recording
  2. User presses it again to stop
  3. The transcribed text is copied to the clipboard and pasted into the active field, and the previous clipboard contents are restored afterward
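
To make the three steps concrete, here is a minimal, language-agnostic sketch of the control flow, written in Python so it can run anywhere. Everything in it is hypothetical and not taken from the Osaurus codebase: `PushToTalk`, `InMemoryClipboard`, and the injected `transcribe` and `paste` callables are stand-ins for the real recorder, the system pasteboard, the speech model, and the synthetic paste keystroke.

```python
from typing import Callable, List, Optional


class InMemoryClipboard:
    """Hypothetical stand-in for the system clipboard."""

    def __init__(self, initial: Optional[str] = None) -> None:
        self._value = initial

    def read(self) -> Optional[str]:
        return self._value

    def write(self, text: Optional[str]) -> None:
        self._value = text


class PushToTalk:
    """Toggle-style recorder: first hotkey press starts, second stops and pastes."""

    def __init__(
        self,
        clipboard: InMemoryClipboard,
        transcribe: Callable[[List[str]], str],  # assumed: audio chunks -> text
        paste: Callable[[], None],               # assumed: triggers paste in the active field
    ) -> None:
        self.clipboard = clipboard
        self.transcribe = transcribe
        self.paste = paste
        self.recording = False
        self._chunks: List[str] = []

    def on_hotkey(self) -> Optional[str]:
        if not self.recording:
            # Step 1: start a fresh recording.
            self._chunks = []
            self.recording = True
            return None
        # Step 2: stop recording and transcribe everything captured so far.
        self.recording = False
        text = self.transcribe(self._chunks)
        self._paste_via_clipboard(text)
        return text

    def on_audio(self, chunk: str) -> None:
        # Audio that arrives while not recording is ignored.
        if self.recording:
            self._chunks.append(chunk)

    def _paste_via_clipboard(self, text: str) -> None:
        # Step 3: save the clipboard, paste the transcript, then restore.
        previous = self.clipboard.read()
        self.clipboard.write(text)
        try:
            self.paste()
        finally:
            # Restore even if the paste fails, so the user's clipboard is never lost.
            self.clipboard.write(previous)


# Usage with fake dependencies: the "audio" chunks are already words.
clipboard = InMemoryClipboard("previous contents")
pasted: List[Optional[str]] = []
ptt = PushToTalk(
    clipboard,
    transcribe=lambda chunks: " ".join(chunks),
    paste=lambda: pasted.append(clipboard.read()),
)
ptt.on_hotkey()          # start recording
ptt.on_audio("hello")
ptt.on_audio("world")
result = ptt.on_hotkey()  # stop, transcribe, paste, restore
```

One caveat for a real implementation: if the paste is delivered to the target app asynchronously (for example as a synthetic keystroke), restoring the clipboard immediately could race with the paste, so a short delay or some form of confirmation before the restore would probably be needed.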

Prior Art / Reference

SuperWhisper is a macOS app whose entire product is voice transcription, and it uses this exact pattern while deliberately avoiding live typing. Notably, it ships with the same Parakeet model Osaurus uses. If a dedicated transcription app considers live typing out of scope, it's a strong signal that the complexity isn't worth it for a secondary feature here.

Why This Matters

The clipboard approach is simpler to implement, significantly more reliable, and already a proven UX pattern. The current implementation risks making transcription mode feel broken, which reflects poorly on the product overall. Scoping down to the clipboard model would ship something that actually works.

Steps to reproduce

No response

Provider type

Local model (MLX)

Model name

No response

Osaurus version / commit

Latest

macOS version

26.1

Apple Silicon chip

No response

Xcode version

No response

Logs

Screenshots

No response

Metadata

Labels

bug: Something isn't working
