@Signal46

Signal46 commented Nov 25, 2025

Hello! This is my first contribution to an open-source project on GitHub. I’m a big fan of Handy and saw a need for a feature to handle long-form recordings safely.

I realized after working on this that there is an existing PR (#371) for file uploads, but I thought I would make this contribution anyway since this has a progress bar.

Motivation & Context
There is a significant need for this feature in the public sector, specifically for municipalities and government agencies (e.g., here in Sweden).

Compliance & GDPR: Many organizations cannot use cloud-based transcription (which often costs ~$150/user/month) due to strict data compliance laws regarding sensitive meetings.

Local Processing: By keeping the file and transcription 100% local, Handy solves major legal hurdles regarding data transfer.

Efficiency: Estimates in a small Swedish municipality suggest transcription (including sensitive meetings) could save around 5,000 administrative hours annually. Yes, 5,000 hours, not 500, for a municipality of around 60,000 citizens.

Technical Details
Implementation: This feature was built with the assistance of AI coding agents but has been manually reviewed and tested.

Testing: Validated on a standard work laptop using the Parakeet V3 model. Successfully transcribed 20- and 40-minute files without memory spikes or crashes.

I am happy to make changes or discuss how this might be merged or combined with existing efforts!


Summary

Adds the ability to import existing audio files (MP3, M4A, WAV) into Handy's history with automatic transcription. This feature includes real-time progress tracking, system notifications, and robust handling of long audio files using smart chunking (VAD).

Successfully transcribed files are automatically moved to the "Recordings" folder for organization.

Motivation
Users requested the ability to transcribe existing audio files, not just live recordings. This feature enables batch processing of pre-recorded audio while maintaining the same quality and privacy guarantees as live transcription.

Changes
Core Features
File Import: Native file picker for selecting MP3, M4A, and WAV files.
Smart Chunking: Uses Voice Activity Detection (VAD) to split audio on silence, preventing words from being cut in half and improving transcription quality.
Progress Tracking: Real-time progress bar showing transcription progress (0-100%).
System Notifications: Desktop notifications on completion or failure.
Long File Support: Prevents crashes on files >20 minutes by processing in chunks.
Auto-Model Loading: Automatically loads the transcription model if it's not currently loaded.
File Management: Moves imported files to the `recordings` folder after successful transcription.
Technical Implementation
Backend (`src-tauri/`)

Audio Processing: Added `decode_and_resample()` using symphonia for multi-format support.
Smart Chunking: Integrated `SileroVad` to detect silence windows around the target chunk size (30s) in `TranscriptionManager`.
Import Workflow: New import_audio_file command handles decoding, resampling, smart chunking, transcription, and file management.
Events: Emits import-status and transcription-progress events for UI updates.
Database: Updated HistoryManager to store audio duration.
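
To illustrate the smart-chunking idea described above, here is a minimal sketch of picking cut points near a target chunk length. It is not the code from this PR: the function name `chunk_boundaries` and its parameters are hypothetical, and it assumes a VAD has already produced one speech/silence flag per audio frame.

```rust
/// Pick chunk boundaries (as frame indices) from per-frame VAD flags.
///
/// Hypothetical helper for illustration only:
/// - `is_speech[i]` is true when the VAD marked frame `i` as speech.
/// - `target` is the desired chunk length in frames.
/// - `window` is how far on either side of the ideal cut to look for silence.
///
/// Each returned index starts a new chunk. When no silent frame exists inside
/// the search window, the cut falls back to the ideal position.
fn chunk_boundaries(is_speech: &[bool], target: usize, window: usize) -> Vec<usize> {
    assert!(target > 0, "target chunk length must be positive");
    let mut cuts = Vec::new();
    let mut start = 0;

    // Keep cutting while more than one target-length of audio remains.
    while is_speech.len() - start > target {
        let ideal = start + target;
        // Clamp the search window so every cut moves forward and stays in range.
        let lo = ideal.saturating_sub(window).max(start + 1);
        let hi = (ideal + window).min(is_speech.len() - 1);

        // Prefer the silent frame closest to the ideal cut position.
        let cut = (lo..=hi)
            .filter(|&i| !is_speech[i])
            .min_by_key(|&i| i.abs_diff(ideal))
            .unwrap_or(ideal);

        cuts.push(cut);
        start = cut;
    }
    cuts
}
```

With a hypothetical frame length of 30 ms, the 30s target mentioned above would correspond to `target = 1000`. The fallback to a hard cut is a design assumption of this sketch, so that continuous speech cannot produce an unbounded chunk.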

Frontend (src/)
UI: Added "Import Audio File" button to History settings.
Feedback: Implemented a progress bar component with percentage display.
Integration: Added event listeners for status updates and system notifications.

Dependencies
Backend: symphonia, tauri-plugin-notification, vad-rs
Frontend: @tauri-apps/plugin-notification

Testing
Tested on a Windows laptop with:

✅ Short files (<1 min)
✅ Medium files (5-10 min)
✅ Long files (20+ min) - previously crashed, now works
✅ All supported formats (MP3, M4A, WAV)
✅ Model auto-loading and recovery if unloaded during transcription of long files
✅ Progress bar
✅ Notifications (success/failure)

Screenshots

[Images: three screenshots, including "Screenshot 2025-11-25 181150"]

@Signal46
Author

Relates to #299

@olejsc
Contributor

olejsc commented Nov 25, 2025

I notice this implements the actual storage of the file. I think supporting both scenarios (linking to the file path, or actually copying the file to the recordings folder) would be nice, perhaps as an option the user can choose per file (or as a general setting)?

It's also nice to see support for large files. I didn't really try any long transcriptions in the other PR.

How does the 30 second batch work if it "clips" sentences/words in the middle?

On another note, I like how similar we've been thinking about where the "upload file" should be 😄

@Signal46
Author

Great question @olejsc! I hadn't thought about that.
I implemented smart chunking using VAD, which now looks for silence around the 30-second mark and makes the cut there. Transcription quality looks a lot higher after the change; I had assumed the quality was low simply because I was transcribing Swedish.

Regarding the file location, I was considering a future feature that automatically deletes these files for privacy / compliance reasons, but I'm not sure what the best way of doing it is.

Regarding the UI alignment: yes, that's quite surprising! I figured it would be the place requiring the fewest changes to the original code.

@cjpais
Owner

cjpais commented Nov 26, 2025

@Signal46 I'm going to review this today or tomorrow, but if you don't mind, I would prefer the chunking be done in transcribe-rs if possible. That would streamline the code here and be useful to more people, I think

Happy to review a PR there for this too

- Removed local VAD implementation and manual chunking loop from `TranscriptionManager`.
- Updated `transcribe()` to call the new `transcribe_with_smart_chunking` API from the `transcribe-rs` library.
- Updated `Cargo.toml` to point to the local `transcribe-rs` dependency for development.

This change streamlines the application code by moving the complex silence-detection logic into the underlying transcription library, as requested.
Restore progress callback in transcribe_with_smart_chunking calls (enables progress bar updates)
Fix import workflow to save decoded audio as new WAV file (prevents "file not found" errors)
Make source file deletion non-fatal
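
As a rough illustration of how a per-chunk progress callback can drive a 0-100% progress bar, consider the sketch below. It is an assumption-laden example, not the `transcribe-rs` API: `transcribe_with_progress`, `transcribe_chunk`, and `on_progress` are hypothetical names, and the real engine call is replaced by a closure.

```rust
/// Transcribe chunks one at a time and report progress as a percentage.
///
/// Hypothetical sketch: `transcribe_chunk` stands in for the real engine call,
/// and `on_progress` is where an application could emit a UI event such as
/// `transcription-progress`.
fn transcribe_with_progress<T, P>(
    chunks: &[Vec<f32>],
    mut transcribe_chunk: T,
    mut on_progress: P,
) -> String
where
    T: FnMut(&[f32]) -> String,
    P: FnMut(u8),
{
    let total = chunks.len();
    let mut parts = Vec::with_capacity(total);

    // Report 0% up front so the UI can show the bar immediately.
    on_progress(0);

    for (done, chunk) in chunks.iter().enumerate() {
        parts.push(transcribe_chunk(chunk.as_slice()));
        // Integer percentage of completed chunks; reaches exactly 100 at the end.
        let pct = ((done + 1) * 100 / total) as u8;
        on_progress(pct);
    }

    // Join the per-chunk text with single spaces.
    parts.join(" ")
}
```

Progress here is counted in chunks rather than seconds of audio, which is simple but only approximate when chunk lengths vary.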
@Signal46
Author

@cjpais thanks for the feedback! I finished moving the chunking to transcribe-rs and opened a pull request there as well. This has been tested again and is ready for review now.

@cjpais
Owner

cjpais commented Nov 28, 2025

Thank you so much for doing that. This is definitely going to take me longer than I expected to review, just because it's a major enough feature. Overall this PR looks good to me, but I need to spend some time with it before I fully pull it in. I'm currently traveling and not in a consistent place, and I expect the review to take at least a few solid hours, so give me a couple of weeks, I think. But let's get this in. I just wanted to give an update that I am thinking about it.

@Signal46 if you don't mind, maybe a second PR for transcribe-rs... hahah... to move the decode and resample bits there as well, so it natively can support many more file types? I think it would also simplify the code here which would be nice. This one is a bit smaller of a change to the library because I believe it already supports transcribing files directly, so all of the decoding and resampling could happen transparently I think

I know @olejsc you have been working on #371 as well which is quite similar, but I think maybe I slightly prefer the UI here

@Signal46
Author

@cjpais Yeah that sounds good, no need to rush! 👍

I will work on moving the decode and resample bits to transcribe-rs as well, but I think it will take a few days, maybe until around Tuesday.

@olejsc
Contributor

olejsc commented Nov 28, 2025

One thing that was in PR #371 was a distinct icon on each history entry showing whether it was a recording (made by the user) or an uploaded file. Is this relevant here? I personally liked having that distinction, but must admit it doesn't really provide any huge value. @cjpais
It used a microphone icon (🎙️) for recording entries and a document icon (📃) for uploaded file entries.

- Update audio import to use `transcribe-rs`'s new decoding logic
- Add `source` field to history entries to track "recording" vs "upload"
- Add database migration for `source` column
- Update history UI to display 🎙️ for recordings and 📃 for uploads
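
One way to picture the `source` field is as a small enum that round-trips through the database column. This is a sketch under stated assumptions rather than the PR's code: the type name `EntrySource`, its methods, and the choice to treat a missing value as a recording (for rows created before the migration) are all hypothetical.

```rust
/// Where a history entry came from. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EntrySource {
    Recording,
    Upload,
}

impl EntrySource {
    /// String form stored in the `source` column.
    fn as_str(self) -> &'static str {
        match self {
            EntrySource::Recording => "recording",
            EntrySource::Upload => "upload",
        }
    }

    /// Parse a column value. Assumption: rows that predate the migration
    /// (NULL or unknown values) are treated as recordings.
    fn from_db(value: Option<&str>) -> Self {
        match value {
            Some("upload") => EntrySource::Upload,
            _ => EntrySource::Recording,
        }
    }

    /// Icon shown next to the entry in the history list.
    fn icon(self) -> &'static str {
        match self {
            EntrySource::Recording => "🎙️",
            EntrySource::Upload => "📃",
        }
    }
}
```

Defaulting unknown values to `Recording` keeps existing history entries displayable after the column is added.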
@Signal46
Author

Signal46 commented Dec 1, 2025

I moved the decode and resampling to transcribe-rs and committed to the existing pull request cjpais/transcribe-rs#14

Also added a microphone icon for recordings and a document icon for uploaded file entries in Handy.

@genesis-gh-ggarrett

this would be awesome; also to be able to record within the app itself and start / pause / stop a recording

@JasonAppo

Hi Signal46,
I think this would be awesome, and I have been waiting for a way to process long audio files for a similar use case that requires privacy (medical interviews). I tried to build your version on my PC but it failed. Could you please release an exe or tell me how to proceed so that I can try your version? Thank you for your time!

@Signal46
Author

@JasonAppo What's the error that you get when building it? Have you checked the instructions in BUILD.md?

@cjpais Just wanted to follow up on this since it's been about 6 weeks. It seems like there's still active interest in this feature.

@JasonAppo

Hi, and thank you for your reply. Here is the error I got at the end of the build:

error[E0432]: unresolved import transcribe_rs::audio::decode_and_resample
--> src\commands\import.rs:10:5
|
10 | use transcribe_rs::audio::decode_and_resample;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no decode_and_resample in audio

warning: unused import: symphonia::core::sample::Sample
--> src\audio_toolkit\audio\utils.rs:15:5
|
15 | use symphonia::core::sample::Sample;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(unused_imports)] (part of #[warn(unused)]) on by default

warning: unused import: error
--> src\commands\import.rs:5:18
|
5 | use log::{debug, error, info};
| ^^^^^

warning: unused import: crate::actions::ACTION_MAP
--> src\signal_handle.rs:1:5
|
1 | use crate::actions::ACTION_MAP;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused import: crate::ManagedToggleState
--> src\signal_handle.rs:2:5
|
2 | use crate::ManagedToggleState;
| ^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused imports: debug, info, and warn
--> src\signal_handle.rs:3:11
|
3 | use log::{debug, info, warn};
| ^^^^^ ^^^^ ^^^^

warning: unused import: std::thread
--> src\signal_handle.rs:4:5
|
4 | use std::thread;
| ^^^^^^^^^^^

warning: unused imports: AppHandle and Manager
--> src\signal_handle.rs:5:13
|
5 | use tauri::{AppHandle, Manager};
| ^^^^^^^^^ ^^^^^^^

error[E0599]: no method named transcribe_with_smart_chunking found for mutable reference &mut WhisperEngine in the current scope
--> src\managers\transcription.rs:392:26
|
391 | / whisper_engine
392 | | .transcribe_with_smart_chunking(
| | -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ method not found in &mut WhisperEngine
| |_________________________|
|

error[E0599]: no method named transcribe_with_smart_chunking found for mutable reference &mut ParakeetEngine in the current scope
--> src\managers\transcription.rs:407:26
|
406 | / parakeet_engine
407 | | .transcribe_with_smart_chunking(
| | -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ method not found in &mut ParakeetEngine
| |_________________________|
|
Some errors have detailed explanations: E0432, E0599.
For more information about an error, try rustc --explain E0432.
warning: handy (lib) generated 7 warnings
error: could not compile handy (lib) due to 3 previous errors; 7 warnings emitted

@cjpais
Owner

cjpais commented Jan 13, 2026

@Signal46 this is on my list for 1.0.0 but I am very very busy right now. I really would like to get this in soon, but I know it's going to take some time.

My biggest priority for the moment is a rewrite of the keyboard + adding parakeet streaming ideally. After that this will be high on my list as well as the text replacements.

Thanks for the patience

@Signal46
Author

@JasonAppo I created a release for you: https://github.com/Signal46/Handy_MeetingTranscription/releases/tag/Meeting-Transcription
Hope that works for you!

@cjpais No worries, I'm very glad to hear it's still something that could be valuable!

@jorgedanisc
Contributor

I was testing the fork from @Signal46 on macOS (looks good overall) and noticed these UX problems, based on my experience with the app as someone who keeps a long recording history without deleting entries:

  • The History list should be virtualized, because for some people (in my experience) the list can eventually get so large that the app becomes really slow when opening the History tab

From the experience of using the meeting-transcription fork:

  • For items with really long text in the history list, there should be a line limit after which the entry collapses, possibly with a "show more" button to expand it
  • The progress bar didn't work for me on macOS
  • It would be cool to also show the transcription progress in the app's tray icon
  • What happens if the user starts a transcription while an uploaded file is actively transcribing? I can see this being a common and problematic scenario: uploaded files are usually long recordings, so in the meantime the user may need the usual shortcut transcribe action, and at that moment the two would collide. Proper parallelization could perhaps solve it

@JasonAppo

Thanks a lot to everyone involved in this project, and thanks Signal46 for taking the time. I tried your version and could transcribe a 2-hour session in 5 minutes. This is so useful!
