Added feature of transcription of local files (WAV, MP3 and M4A) along with progressbar #381

Signal46 · 2025-11-25T17:19:40Z

Hello! This is my first contribution to an open-source project on GitHub. I’m a big fan of Handy and saw a need for a feature to handle long-form recordings safely.

I realized after working on this that there is an existing PR (#371) for file uploads, but I thought I would make this contribution anyway since this has a progress bar.

Motivation & Context
There is a significant need for this feature in the public sector, specifically for municipalities and government agencies (e.g., here in Sweden).

Compliance & GDPR: Many organizations cannot use cloud-based transcription (which often costs ~$150/user/month) due to strict data compliance laws regarding sensitive meetings.

Local Processing: By keeping the file and transcription 100% local, Handy solves major legal hurdles regarding data transfer.

Efficiency: Estimates in a small Swedish municipality suggest transcription (including sensitive meetings) could save around 5000 administrative hours annually. Yes, 5000 hours, not 500, for a municipality of around 60,000 citizens.

Technical Details
Implementation: This feature was built with the assistance of AI coding agents but has been manually reviewed and tested.

Testing: Validated on a standard work laptop using the Parakeet V3 model. Successfully transcribed 20- and 40-minute files without memory spikes or crashes.

I am happy to make changes or discuss how this might be merged or combined with existing efforts!

Summary

Adds the ability to import existing audio files (MP3, M4A, WAV) into Handy's history with automatic transcription. This feature includes real-time progress tracking, system notifications, and robust handling of long audio files using smart chunking (VAD).

Successfully transcribed files are automatically moved to the "Recordings" folder for organization.

Motivation
Users requested the ability to transcribe existing audio files, not just live recordings. This feature enables batch processing of pre-recorded audio while maintaining the same quality and privacy guarantees as live transcription.

Changes
Core Features
File Import: Native file picker for selecting MP3, M4A, and WAV files.
Smart Chunking: Uses Voice Activity Detection (VAD) to split audio on silence, preventing words from being cut in half and improving transcription quality.
Progress Tracking: Real-time progress bar showing transcription progress (0-100%).
System Notifications: Desktop notifications on completion or failure.
Long File Support: Prevents crashes on files >20 minutes by processing in chunks.
Auto-Model Loading: Automatically loads the transcription model if it's not currently loaded.
File Management: Moves imported files to the recordings folder after successful transcription.
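
The progress tracking described above can be sketched as a per-chunk callback. This is a minimal illustration, assuming progress is reported once per processed chunk; `progress_percent` and `process_with_progress` are hypothetical names, not functions from this PR.

```rust
/// Hypothetical helper: maps "chunks finished so far" onto a 0-100 percentage.
pub fn progress_percent(chunks_done: usize, total_chunks: usize) -> u8 {
    if total_chunks == 0 {
        // Nothing to process: report completion instead of dividing by zero.
        return 100;
    }
    let done = chunks_done.min(total_chunks);
    ((done * 100) / total_chunks) as u8
}

/// Runs `work` on every chunk and calls `on_progress` after each one.
/// In an app, `on_progress` is where a UI event would be emitted.
pub fn process_with_progress<T, F, P>(chunks: &[T], mut work: F, mut on_progress: P)
where
    F: FnMut(&T),
    P: FnMut(u8),
{
    on_progress(0);
    for (index, chunk) in chunks.iter().enumerate() {
        work(chunk);
        on_progress(progress_percent(index + 1, chunks.len()));
    }
    if chunks.is_empty() {
        // Make sure listeners still see a terminal value.
        on_progress(100);
    }
}
```

Reporting after each chunk keeps the bar monotonic and guarantees it ends at 100%, even when the chunk count does not divide 100 evenly.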
Technical Implementation
Backend (src-tauri/)

Audio Processing: Added decode_and_resample() using symphonia for multi-format support.
Smart Chunking: Integrated SileroVad to detect silence windows around the target chunk size (30s) in TranscriptionManager.
Import Workflow: New import_audio_file command handles decoding, resampling, smart chunking, transcription, and file management.
Events: Emits import-status and transcription-progress events for UI updates.
Database: Updated HistoryManager to store audio duration.
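
To illustrate the resampling half of the audio processing step, here is a stdlib-only sketch. It is not the PR's `decode_and_resample()` (which relies on symphonia for decoding and is not shown in this thread); linear interpolation is swapped in for simplicity, and both function names are hypothetical. Speech models such as Whisper expect 16 kHz mono input, so that is the assumed target.

```rust
/// Averages interleaved multi-channel samples into a single mono channel.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels <= 1 {
        return interleaved.to_vec();
    }
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resamples mono audio from `from_hz` to `to_hz` using linear interpolation.
/// This is a simple stand-in; production resamplers usually apply a
/// low-pass filter to avoid aliasing when downsampling.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    if from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = ((input.len() as u64 * to_hz as u64) / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            // Position of output sample `i` on the input timeline.
            let pos = i as f64 * step;
            let left = pos.floor() as usize;
            let right = (left + 1).min(input.len() - 1);
            let frac = (pos - left as f64) as f32;
            input[left] * (1.0 - frac) + input[right] * frac
        })
        .collect()
}
```

A decoded stereo 44.1 kHz file would go through `downmix_to_mono(&samples, 2)` and then `resample_linear(&mono, 44_100, 16_000)` before being handed to the transcription engine.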

Frontend (src/)
UI: Added "Import Audio File" button to History settings.
Feedback: Implemented a progress bar component with percentage display.
Integration: Added event listeners for status updates and system notifications.

Dependencies
Backend: symphonia, tauri-plugin-notification, vad-rs
Frontend: @tauri-apps/plugin-notification

Testing
Tested on Windows laptop with:

✅ Short files (<1 min)
✅ Medium files (5-10 min)
✅ Long files (20+ min) - previously crashed, now works
✅ All supported formats (MP3, M4A, WAV)
✅ Model auto-loading and recovery if unloaded during transcription of long files
✅ Progress bar
✅ Notifications (success/failure)

Signal46 · 2025-11-25T17:21:17Z

Relates to #299

olejsc · 2025-11-25T18:07:03Z

I notice this implements the actual storage of the file. I think supporting both scenarios (link to the file path, or actually copy the file to the recordings folder) would be nice, perhaps as an option the user can choose "per file" when picking a file to transcribe, or as a general setting?

It's also nice to have support for large files - I didn't really try any long transcriptions in the other PR.

How does the 30 second batch work if it "clips" sentences/words in the middle?

On another note, I like how similar we've been thinking about where the "upload file" should be 😄

Signal46 · 2025-11-26T08:27:34Z

Great question @olejsc! I hadn't thought about that.
I implemented smart chunking using VAD, which now looks for silence around the 30-second mark and makes the cut there. The transcription quality looks a lot higher after the change; I had assumed the quality was low simply because I was transcribing in Swedish.
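
The idea of cutting at silence near the 30-second mark can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: it assumes the VAD has already produced one speech/silence decision per fixed-size frame, and `pick_cut_points`, `target_frames`, and `search_radius` are hypothetical names.

```rust
/// Picks chunk boundaries (as frame indices) close to every `target_frames`
/// frames, preferring frames the VAD marked as silence.
///
/// `is_speech[i]` is the assumed per-frame VAD decision for frame `i`.
pub fn pick_cut_points(
    is_speech: &[bool],
    target_frames: usize,
    search_radius: usize,
) -> Vec<usize> {
    let mut cuts = Vec::new();
    if target_frames == 0 {
        return cuts;
    }
    let mut start = 0;
    // Keep cutting while more than one target-sized chunk of audio remains.
    while is_speech.len() - start > target_frames {
        let ideal = start + target_frames;
        let lo = ideal.saturating_sub(search_radius).max(start + 1);
        let hi = (ideal + search_radius).min(is_speech.len() - 1);
        // Nearest silent frame to the ideal cut; fall back to a hard cut
        // at the ideal position when the whole window is speech.
        let cut = (lo..=hi)
            .filter(|&i| !is_speech[i])
            .min_by_key(|&i| i.abs_diff(ideal))
            .unwrap_or(ideal);
        cuts.push(cut);
        start = cut;
    }
    cuts
}
```

With 30 ms frames (an assumed frame size), a 30-second target corresponds to `target_frames = 1000`. The hard-cut fallback means continuous speech can still be split mid-word, which is the trade-off for bounding chunk length.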

Regarding the file location, I was considering a future feature that automatically deletes these files for privacy / compliance reasons, but I'm not sure what the best way of doing it is.

Regarding the UI alignment: yes, that's quite surprising! I was thinking it would be the place with the least amount of changes to the original code.

cjpais · 2025-11-26T10:37:01Z

@Signal46 I'm going to review this today or tomorrow, but if you don't mind, I would prefer having the chunking done in transcribe-rs if possible. That would streamline the code and be useful for more people, I think.

Happy to review a PR there for this too

- Removed local VAD implementation and manual chunking loop from `TranscriptionManager`.
- Updated `transcribe()` to call the new `transcribe_with_smart_chunking` API from the `transcribe-rs` library.
- Updated `Cargo.toml` to point to the local `transcribe-rs` dependency for development.

This change streamlines the application code by moving the complex silence-detection logic into the underlying transcription library, as requested.

- Restore progress callback in transcribe_with_smart_chunking calls (enables progress bar updates)
- Fix import workflow to save decoded audio as new WAV file (prevents "file not found" errors)
- Make source file deletion non-fatal
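
"Non-fatal" deletion can be sketched like this: by the time the source file is removed, the transcript is already saved, so a failure is logged rather than propagated. The function name is hypothetical and the commit's actual code is not shown in this thread.

```rust
use std::fs;
use std::path::Path;

/// Tries to delete the original file after a successful import.
/// Returns `true` if the file was removed and `false` otherwise; it never
/// returns an error, so a failed cleanup cannot fail the whole import.
pub fn remove_source_best_effort(path: &Path) -> bool {
    match fs::remove_file(path) {
        Ok(()) => true,
        Err(err) => {
            // Only warn: the transcription result has already been stored.
            eprintln!("warning: could not remove {}: {err}", path.display());
            false
        }
    }
}
```

The caller can ignore the return value or use it to tell the user that the original file is still in place.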

Signal46 · 2025-11-27T13:50:36Z

@cjpais thanks! I finished moving the chunking to transcribe-rs and made a pull request for that as well, thanks for your feedback. This is tested again and ready for review now.

cjpais · 2025-11-28T00:50:01Z

Thank you so much for doing that. This is going to take me longer than I expected to review because it's a major enough feature. Overall this PR looks good to me, but I need to spend some time with it before I fully pull it in. I'm currently traveling and not in a consistent place, and when I do review this I want to give it a few solid hours, which is what I expect it to need. So give me a couple of weeks, I think. But let's get this in. I just wanted to give an update that I am thinking about it.

@Signal46 if you don't mind, maybe a second PR for transcribe-rs... hahah... to move the decode and resample bits there as well, so it natively can support many more file types? I think it would also simplify the code here which would be nice. This one is a bit smaller of a change to the library because I believe it already supports transcribing files directly, so all of the decoding and resampling could happen transparently I think

I know @olejsc you have been working on #371 as well which is quite similar, but I think maybe I slightly prefer the UI here

Signal46 · 2025-11-28T10:06:12Z

@cjpais Yeah that sounds good, no need to rush! 👍

I will work on moving the decode and resample bits to transcribe-rs as well, but I think it will take a few days, maybe until around Tuesday.

olejsc · 2025-11-28T12:06:37Z

One thing that was in PR #371 was a distinct icon for each history entry, showing whether it was a recording (by the user) or an uploaded file. Is this relevant here? I personally liked having that distinction, but must admit it doesn't really provide any huge value. @cjpais
It used a microphone icon (🎙️) for recording entries and a document icon (📃) for uploaded file entries.

- Update audio import to use `transcribe-rs`'s new decoding logic
- Add `source` field to history entries to track "recording" vs "upload"
- Add database migration for `source` column
- Update history UI to display 🎙️ for recordings and 📃 for uploads
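
A `source` field like the one described in this commit could be modeled as below. Only the two values ("recording", "upload") and the two icons come from the commit message; the type name, method names, and the choice to treat missing values as recordings are assumptions for illustration.

```rust
/// Where a history entry came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntrySource {
    Recording,
    Upload,
}

impl EntrySource {
    /// String form stored in the (assumed) `source` column.
    pub fn as_str(self) -> &'static str {
        match self {
            EntrySource::Recording => "recording",
            EntrySource::Upload => "upload",
        }
    }

    /// Parses a stored value. Rows written before the migration have no
    /// value, so anything unknown is treated as a live recording.
    pub fn from_db(value: Option<&str>) -> Self {
        match value {
            Some("upload") => EntrySource::Upload,
            _ => EntrySource::Recording,
        }
    }

    /// Icon shown next to the entry in the history list.
    pub fn icon(self) -> &'static str {
        match self {
            EntrySource::Recording => "🎙️",
            EntrySource::Upload => "📃",
        }
    }
}
```

Defaulting unknown values to `Recording` keeps entries created before the migration displaying as they did before, since every older entry came from the microphone.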

Signal46 · 2025-12-01T09:00:49Z

I moved the decode and resampling to transcribe-rs and committed to the existing pull request cjpais/transcribe-rs#14

Also added a microphone icon and document icon for the uploaded file entries in Handy.

genesis-gh-ggarrett · 2025-12-12T14:50:43Z

this would be awesome; also to be able to record within the app itself and start / pause / stop a recording

JasonAppo · 2026-01-11T16:55:01Z

Hi Signal46,
I think this would be awesome. I have been waiting for a way to process long audio files for a similar use case that requires privacy (medical interviews). I tried to build your version on my PC but it failed. Could you please release an exe or tell me how to proceed so that I can try your version? Thank you for your time!

Signal46 · 2026-01-11T20:12:54Z

@JasonAppo What's the error that you get when building it? Have you checked the instructions in BUILD.md?

@cjpais Just wanted to follow up on this since it's been about 6 weeks. It seems like there's still active interest in this feature.

JasonAppo · 2026-01-12T18:15:58Z

Hi, and thank you for your reply. Here is the error I got at the end of the build:

error[E0432]: unresolved import transcribe_rs::audio::decode_and_resample
--> src\commands\import.rs:10:5
|
10 | use transcribe_rs::audio::decode_and_resample;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no decode_and_resample in audio

warning: unused import: symphonia::core::sample::Sample
--> src\audio_toolkit\audio\utils.rs:15:5
|
15 | use symphonia::core::sample::Sample;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(unused_imports)] (part of #[warn(unused)]) on by default

warning: unused import: error
--> src\commands\import.rs:5:18
|
5 | use log::{debug, error, info};
| ^^^^^

warning: unused import: crate::actions::ACTION_MAP
--> src\signal_handle.rs:1:5
|
1 | use crate::actions::ACTION_MAP;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused import: crate::ManagedToggleState
--> src\signal_handle.rs:2:5
|
2 | use crate::ManagedToggleState;
| ^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused imports: debug, info, and warn
--> src\signal_handle.rs:3:11
|
3 | use log::{debug, info, warn};
| ^^^^^ ^^^^ ^^^^

warning: unused import: std::thread
--> src\signal_handle.rs:4:5
|
4 | use std::thread;
| ^^^^^^^^^^^

warning: unused imports: AppHandle and Manager
--> src\signal_handle.rs:5:13
|
5 | use tauri::{AppHandle, Manager};
| ^^^^^^^^^ ^^^^^^^

error[E0599]: no method named transcribe_with_smart_chunking found for mutable reference &mut WhisperEngine in the current scope
--> src\managers\transcription.rs:392:26
|
391 | / whisper_engine
392 | | .transcribe_with_smart_chunking(
| | -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ method not found in &mut WhisperEngine
| |_________________________|
|

error[E0599]: no method named transcribe_with_smart_chunking found for mutable reference &mut ParakeetEngine in the current scope
--> src\managers\transcription.rs:407:26
|
406 | / parakeet_engine
407 | | .transcribe_with_smart_chunking(
| | -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ method not found in &mut ParakeetEngine
| |_________________________|
|
Some errors have detailed explanations: E0432, E0599.
For more information about an error, try rustc --explain E0432.
warning: handy (lib) generated 7 warnings
error: could not compile handy (lib) due to 3 previous errors; 7 warnings emitted

cjpais · 2026-01-13T08:54:30Z

@Signal46 this is on my list for 1.0.0 but I am very very busy right now. I really would like to get this in soon, but I know it's going to take some time.

My biggest priority for the moment is a rewrite of the keyboard + adding parakeet streaming ideally. After that this will be high on my list as well as the text replacements.

Thanks for the patience

Signal46 · 2026-01-14T12:24:57Z

@JasonAppo I created a release for you: https://github.com/Signal46/Handy_MeetingTranscription/releases/tag/Meeting-Transcription
Hope that works for you!

@cjpais No worries, I'm very glad to hear it's still something that could be valuable!

jorgedanisc · 2026-01-14T22:54:34Z

I was testing the fork from @Signal46 on macOS (looks good overall) and also noticed these UX problems, speaking as someone who keeps a long recording history without deleting entries.

The History list should be virtualized, because for some people (me included) the list can eventually get so large that the app becomes really slow when opening the History tab.

From the experience of using the meeting-transcription fork:

Items with really long text in the history list should have a line limit after which they collapse, possibly with a "show more" button to expand them.
The progress bar didn't work for me on macOS.
It would be cool if the app's tray icon also showed the transcription progress.
What happens if the user starts a transcription while an uploaded file is actively transcribing? I can see this being a common problem: uploaded files are usually long recordings, so in the meantime the user may need the usual shortcut transcribe action, and at that moment the two would collide. Proper parallelization could solve it.
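
Short of full parallelization, one way to make the collision explicit is to gate access to the engine so a second job can be rejected or queued instead of interfering with a running import. The thread does not say how Handy currently behaves here; `EngineGate` and `try_begin` are hypothetical names in this sketch.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};

/// Guards a single transcription engine so only one job runs at a time.
pub struct EngineGate {
    lock: Mutex<()>,
}

impl EngineGate {
    pub fn new() -> Self {
        EngineGate { lock: Mutex::new(()) }
    }

    /// Returns a guard if the engine is idle, or `None` if another job
    /// (for example a long file import) currently holds it. The engine is
    /// released when the guard is dropped.
    pub fn try_begin(&self) -> Option<MutexGuard<'_, ()>> {
        match self.lock.try_lock() {
            Ok(guard) => Some(guard),
            Err(TryLockError::WouldBlock) => None,
            // A panic in an earlier job poisons the mutex; the unit payload
            // cannot be left inconsistent, so it is safe to recover.
            Err(TryLockError::Poisoned(poisoned)) => Some(poisoned.into_inner()),
        }
    }
}
```

A shortcut-triggered transcription that gets `None` back could show a "busy" notice or wait for the next chunk boundary, which is a smaller change than running two engines in parallel.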

JasonAppo · 2026-01-15T06:53:45Z

Thanks a lot to everyone involved in this project, and thanks Signal46 for taking the time. I tried your version and could transcribe a 2-hour-long session in 5 minutes. This is so useful!

Signal46 added 4 commits November 25, 2025 14:58

feat: Add initial Tauri project structure with audio decoding, resampling, and WAV file saving utilities.

53cef5c

feat: add audio file import functionality with decoding, resampling, transcription, and history management.

4e72d6a

feat: Add history management and audio import functionality with new Tauri backend commands and UI components.

111f02c

feat: add transcription manager with model loading/unloading handling and progress bar for transcription of local files.

e88977a

feature: implemented smart chunking using VAD

6252741

Signal46 mentioned this pull request Nov 27, 2025

feat: implement smart chunking with custom Silero VAD cjpais/transcribe-rs#14

Open

fix: Add progress callback and improve import reliability

d6e5981

cjpais mentioned this pull request Nov 28, 2025

feat: Add transcription of audio file #371

Closed
