@rohansjoshi
Contributor

Demo app showcasing Whisper running on-device

Both the audio processing (STFT + Mel filterbank) and the Whisper model itself are exported to .pte files with ExecuTorch and run in the app.

The app does the following:

  1. Press the button to record audio
  2. Save the input as single-channel PCM
  3. Convert it to a .wav file with 2 bytes per sample, writing the .wav header manually
  4. Open the .wav file, read the byte array two bytes at a time, and convert it to a float array (little-endian is the convention for .wav)
  5. Convert the float array to a Tensor using the ET Tensor binding and pass it through the Module that wraps the audio processing .pte
  6. Since the Qualcomm Whisper Runner reads raw bytes as input (maybe this should be changed), convert the output to a byte array, making sure it is in little-endian order (the runner's convention)
  7. Pass the byte array into WhisperModule, which wraps the runner that runs the actual Whisper model .pte (encoder + decoder)
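
The byte-level conversions in steps 3, 4, and 6 can be sketched in plain Kotlin as follows. This is a minimal illustration, not the app's actual code: the function names `wavHeader`, `pcm16ToFloats`, and `floatsToLittleEndianBytes` are hypothetical, the header is the common 44-byte PCM layout, and scaling samples by 32768 to land in [-1, 1) is an assumption about what the audio processing .pte expects.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helpers for illustration; the demo app's own functions may differ.

/** Step 3: build the common 44-byte header for an uncompressed PCM .wav file. */
fun wavHeader(dataSize: Int, sampleRate: Int, channels: Int = 1, bitsPerSample: Int = 16): ByteArray {
    val blockAlign = channels * bitsPerSample / 8   // bytes per sample frame
    val byteRate = sampleRate * blockAlign          // bytes per second of audio
    return ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN).apply {
        put("RIFF".toByteArray(Charsets.US_ASCII))
        putInt(36 + dataSize)                       // bytes remaining after this field
        put("WAVE".toByteArray(Charsets.US_ASCII))
        put("fmt ".toByteArray(Charsets.US_ASCII))
        putInt(16)                                  // size of the PCM fmt chunk
        putShort(1.toShort())                       // audio format 1 = uncompressed PCM
        putShort(channels.toShort())
        putInt(sampleRate)
        putInt(byteRate)
        putShort(blockAlign.toShort())
        putShort(bitsPerSample.toShort())
        put("data".toByteArray(Charsets.US_ASCII))
        putInt(dataSize)                            // number of sample bytes that follow
    }.array()
}

/** Step 4: read little-endian 16-bit samples two bytes at a time into floats. */
fun pcm16ToFloats(pcm: ByteArray): FloatArray {
    val shorts = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
    // Assumption: samples are normalized to [-1, 1) by dividing by 32768.
    return FloatArray(shorts.remaining()) { i -> shorts.get(i) / 32768f }
}

/** Step 6: serialize floats as raw little-endian bytes (4 bytes per value). */
fun floatsToLittleEndianBytes(values: FloatArray): ByteArray {
    val buffer = ByteBuffer.allocate(values.size * Float.SIZE_BYTES).order(ByteOrder.LITTLE_ENDIAN)
    values.forEach { buffer.putFloat(it) }
    return buffer.array()
}
```

For example, `wavHeader(dataSize = pcm.size, sampleRate = 16000)` followed by the PCM bytes would form the file from step 3; the 16 kHz rate here is an assumption based on the input rate Whisper models conventionally use, not something stated in this PR. Setting the byte order explicitly matters because `ByteBuffer` defaults to big-endian.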

To build the app, you need to:

  1. Export both .pte files using the scripts in ExecuTorch. For audio preprocessing, run extension/audio/mel_spectrogram. For QNN Whisper, run the script examples/qualcomm/oss_scripts/whisper. Move both .pte files to /data/local/tmp/whisper on the device.
  2. Check out the Whisper JNI bindings PR, build the ExecuTorch Android library with the Qualcomm backend, and save executorch.aar
  3. Copy executorch.aar into app/libs
  4. Build the app in Android Studio

@meta-cla meta-cla bot added the CLA Signed label Aug 22, 2025

private fun runWhisper() {
// The entire audio flow:
val wavFile = File(getExternalFilesDir(null), "audio_record.wav") // do this better
Contributor

So we bundle the wav file with the apk?

Contributor Author

The wav file is written in the writeWavFile function; runWhisper runs afterward and assumes the file already exists.

@rohansjoshi rohansjoshi merged commit c540332 into main Aug 25, 2025
1 check passed
@rohansjoshi rohansjoshi deleted the whisper-demo branch August 25, 2025 15:31
@mergennachin mergennachin mentioned this pull request Oct 18, 2025