Push-to-talk dictation for Android.
Phone Whisper lets you speak into most apps without switching keyboards. Tap the floating button, speak, tap again, and your text is inserted into the currently focused text field when the app exposes a standard Android input field.\
It supports:
- Local on-device transcription with sherpa-onnx
- Cloud transcription with OpenAI Whisper
- Optional cleanup with OpenAI to fix punctuation and grammar
If you try it and it genuinely saves you time, consider sponsoring
- I like SwiftKey and want to keep it as keyboard but...
- Most keyboard dictation felt too inaccurate
- Gemini's voice input auto submits your transcription (which is pretty bad) so you can't edit it before sending
- Post processing yields much better results, specially adding a list of keywords and technical terms you often use
- Inserting text into the field you're already using lets you keep editing it like any other draft.
Grab the latest APK from GitHub Releases.
Open it on your phone, install it, then launch the app once to finish setup.
Requires JDK 17 and Android SDK.
git clone https://github.com/kafkasl/phone-whisper.git && cd phone-whisper
make buildAPK output:
app/build/outputs/apk/debug/app-debug.apkIf you use ADB:
make adb-install- A small overlay button floats on screen
- Tap once to start recording
- Tap again to stop
- Audio is transcribed locally or in the cloud
- The text is inserted into the focused text field
- If insertion fails, the text is copied to the clipboard
- Open Phone Whisper
- Grant the audio recording permission
- Enable the Accessibility Service
- Choose your transcription mode:
- Local: download a model in the app
- Cloud: paste your OpenAI API key
Once setup is done, the floating button is ready.
Phone Whisper uses Android Accessibility Service for one narrow reason: to insert dictated text into the currently focused text field across apps.
It does not replace your keyboard. It does not run background automation. It only acts after you explicitly tap the overlay button.
Phone Whisper supports two modes:
- Local mode: audio stays on-device
- Cloud mode: audio is sent directly from your device to OpenAI's transcription API
- Optional cleanup: transcript text is sent directly from your device to OpenAI's chat API
I don't run a backend for this app. In cloud mode, requests go straight from your phone to OpenAI using your own API key.
Full policy: PRIVACY.md
Models are stored in app storage under:
/data/data/com.kafkasl.phonewhisper/files/models/Current catalog:
| Model | Size | Notes |
|---|---|---|
| Parakeet 110M | 100 MB | Best default |
| Whisper Base | 199 MB | Solid baseline |
| Parakeet 0.6B | 465 MB | Best quality |
| Moonshine Tiny | 103 MB | Fastest |
The app downloads and extracts models directly from the sherpa-onnx release archives.
make build # build debug APK
make test # run unit tests
make adb-install # build + install via ADB
make clean # clean build artifactsPhone Whisper works best in apps that use standard Android text fields. Some apps use custom text surfaces or terminal-style views, which may not support direct accessibility paste. When insertion is not possible, Phone Whisper falls back to copying the transcript to the clipboard.
Termux's main terminal area is not a standard Android text field, so direct insertion may not work there.
To use Phone Whisper in Termux:
- Focus Termux
- Swipe the extra keys row (
ESC,CTRL,ALT, arrows, etc.) left or right - Switch to Termux's native text input box
- Dictate there
Once text is inserted into the native input box, Termux sends it to the terminal normally.
- Accessibility permission is required for cross-app insertion
- Some apps may block paste or text injection
- Some apps use custom input surfaces instead of standard Android text fields
- Local models are large
- Cloud mode requires your own OpenAI API key
If Phone Whisper saves you time, you can sponsor the project on GitHub:
Personal project. Do whatever you want with it.