Add transcription of local files (WAV, MP3, and M4A) with a progress bar #381
base: main
…ling, and WAV file saving utilities.
…transcription, and history management.
…Tauri backend commands and UI components.
… and progress bar for transcription of local files.
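The commits above mention WAV file saving utilities. To show what writing a decoded buffer back to disk involves, here is a std-only sketch that writes mono 16-bit PCM WAV. The function name `write_wav_mono_i16` and the convention of f32 input samples in [-1.0, 1.0] are assumptions for illustration; the code in this PR may well use a WAV crate instead.

```rust
use std::io::{self, Write};

/// Hypothetical helper: write mono 16-bit PCM samples as a RIFF/WAV stream.
/// Input samples are f32 in [-1.0, 1.0]; out-of-range values are clamped.
fn write_wav_mono_i16<W: Write>(out: &mut W, samples: &[f32], sample_rate: u32) -> io::Result<()> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8;
    let byte_rate: u32 = sample_rate * block_align as u32;
    let data_len: u32 = (samples.len() * block_align as usize) as u32;

    // RIFF header; size = 4 ("WAVE") + 24 (fmt chunk) + 8 + data_len (data chunk).
    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?;
    out.write_all(b"WAVE")?;

    // "fmt " chunk: 16 bytes, format tag 1 = uncompressed PCM.
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?;
    out.write_all(&1u16.to_le_bytes())?;
    out.write_all(&channels.to_le_bytes())?;
    out.write_all(&sample_rate.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&block_align.to_le_bytes())?;
    out.write_all(&bits_per_sample.to_le_bytes())?;

    // "data" chunk: little-endian i16 samples.
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}
```

Passing a `std::fs::File` (or a `BufWriter` around one) as `out` writes the file to disk; a `Vec<u8>` works for in-memory checks.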
Relates to #299
I notice this implements the actual storage of the file. I think supporting both scenarios (linking to the file path and actually copying the file to the recordings folder) would be nice, maybe as an option the user can choose per file selected for transcription, or as a general setting? It's also nice that this supports large files; I didn't really try any long transcriptions in the other PR. How does the 30-second batching work if it "clips" sentences/words in the middle? On another note, I like how similarly we've been thinking about where the "upload file" option should be 😄
Great question, @olejsc! I hadn't thought about that. Regarding the file location, I was considering a future feature that automatically deletes these files for privacy/compliance reasons, but I'm not sure what the best way of doing it is. Regarding the UI alignment: yes, that's quite surprising! I figured it would be the place requiring the fewest changes to the original code.
@Signal46 I'm going to review this today or tomorrow, but if you don't mind, I would prefer to have the chunking done in transcribe-rs if possible. I think that would streamline the code and be useful for more people. Happy to review a PR there for this too.
- Removed the local VAD implementation and manual chunking loop from `TranscriptionManager`.
- Updated `transcribe()` to call the new `transcribe_with_smart_chunking` API from the `transcribe-rs` library.
- Updated `Cargo.toml` to point to the local `transcribe-rs` dependency for development.
This change streamlines the application code by moving the complex silence-detection logic into the underlying transcription library, as requested.
- Restore the progress callback in `transcribe_with_smart_chunking` calls (enables progress bar updates)
- Fix the import workflow to save decoded audio as a new WAV file (prevents "file not found" errors)
- Make source file deletion non-fatal
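For context on the progress-callback fix, here is a minimal sketch of how per-chunk progress might be reported to a progress bar. The function `transcribe_chunks_with_progress` and its closure parameters are hypothetical; they are not the actual signature of `transcribe_with_smart_chunking` in transcribe-rs.

```rust
/// Hypothetical sketch: transcribe pre-split chunks one at a time and report
/// progress as an integer percentage (0-100) through a caller-supplied callback.
fn transcribe_chunks_with_progress<F, P>(
    chunks: &[Vec<f32>],
    mut transcribe: F,
    mut on_progress: P,
) -> String
where
    F: FnMut(&[f32]) -> String,
    P: FnMut(u8),
{
    // Weight progress by samples, not chunk count: silence-based chunks vary in length.
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    let mut done = 0usize;
    let mut parts: Vec<String> = Vec::with_capacity(chunks.len());

    on_progress(0);
    for chunk in chunks {
        let text = transcribe(chunk.as_slice());
        let trimmed = text.trim();
        if !trimmed.is_empty() {
            parts.push(trimmed.to_string());
        }
        done += chunk.len();
        let pct = if total == 0 { 100 } else { (done * 100 / total) as u8 };
        on_progress(pct);
    }
    if chunks.is_empty() {
        // Nothing to transcribe: still report completion so the UI does not stall at 0%.
        on_progress(100);
    }
    parts.join(" ")
}
```

In the app, the `on_progress` closure would be the place to emit a UI event; weighting by samples keeps the bar moving smoothly even when one chunk is much longer than another.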
@cjpais Thanks! I finished moving the chunking to transcribe-rs and opened a pull request there as well. Thanks for your feedback. This has been tested again and is ready for review now.
Thank you so much for doing that. This is definitely going to take me longer than I expected to review, just because it's a major feature. Overall this PR looks good to me, but I need to spend some time with it before I fully pull it in. I'm currently traveling and not in a consistent place, so when I do get to review this, I want to spend a few solid hours with it; I expect the review will take at least that long. So just give me a couple of weeks, I think. But let's get this in. I just want to give an update that I am thinking about it.
@Signal46 if you don't mind, maybe a second PR for transcribe-rs... hahah... to move the decode and resample bits there as well, so it can natively support many more file types? I think it would also simplify the code here, which would be nice. This one is a smaller change to the library because I believe it already supports transcribing files directly, so all of the decoding and resampling could happen transparently.
I know @olejsc, you have been working on #371 as well, which is quite similar, but I think I slightly prefer the UI here.
@cjpais Yeah, that sounds good, no need to rush! 👍 I will work on moving the decode and resample bits to transcribe-rs as well, but I think it will take a few days, maybe until around Tuesday.
One thing in PR #371 was a distinct icon on each history entry showing whether the entry was a recording (by the user) or an uploaded file. Is this relevant here? I personally liked having that distinction, but must admit it doesn't really provide any huge value. @cjpais
- Update audio import to use `transcribe-rs`'s new decoding logic
- Add a `source` field to history entries to track "recording" vs "upload"
- Add a database migration for the `source` column
- Update the history UI to display 🎙️ for recordings and 📃 for uploads
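A minimal sketch of how the new `source` field might be modeled, assuming it is stored as text. The enum `EntrySource`, its string values, and the table name in the migration are hypothetical. The key point is that rows created before the migration need a sensible default, which the `DEFAULT 'recording'` clause (SQLite requires a non-NULL default when adding a `NOT NULL` column) and the fallback in `from_db` both provide.

```rust
/// Hypothetical model of the `source` column on history entries.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EntrySource {
    Recording,
    Upload,
}

impl EntrySource {
    /// Value written to the database column.
    fn as_db_str(self) -> &'static str {
        match self {
            EntrySource::Recording => "recording",
            EntrySource::Upload => "upload",
        }
    }

    /// Parse a column value; missing or unrecognized values fall back to
    /// `Recording`, since every entry before this feature was a recording.
    fn from_db(value: Option<&str>) -> Self {
        match value {
            Some("upload") => EntrySource::Upload,
            _ => EntrySource::Recording,
        }
    }

    /// Icon shown next to the entry in the history list.
    fn icon(self) -> &'static str {
        match self {
            EntrySource::Recording => "🎙️",
            EntrySource::Upload => "📃",
        }
    }
}

/// Hypothetical SQLite migration (table name assumed); the DEFAULT backfills old rows.
const ADD_SOURCE_COLUMN: &str =
    "ALTER TABLE transcription_history ADD COLUMN source TEXT NOT NULL DEFAULT 'recording'";
```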
I moved the decoding and resampling to transcribe-rs and committed to the existing pull request cjpais/transcribe-rs#14. I also added a microphone icon for recordings and a document icon for uploaded file entries in Handy.
This would be awesome; it would also be great to be able to record within the app itself and start/pause/stop a recording.
Hi Signal46,
@JasonAppo What's the error that you get when building it? Have you checked the instructions in BUILD.md? @cjpais Just wanted to follow up on this since it's been about 6 weeks; it seems like there's still active interest in this feature.
Hi, and thank you for your reply. Here is the error I got at the end of building:
|
|
@Signal46 this is on my list for 1.0.0, but I am very, very busy right now. I really would like to get this in soon, but I know it's going to take some time. My biggest priority at the moment is a rewrite of the keyboard handling and, ideally, adding Parakeet streaming. After that, this will be high on my list, along with the text replacements. Thanks for the patience.
@JasonAppo I created a release for you: https://github.com/Signal46/Handy_MeetingTranscription/releases/tag/Meeting-Transcription @cjpais No worries, I'm very glad to hear it's still something that could be valuable!
I was testing the fork from @Signal46 on macOS (it looks good overall) and also recognized these UX problems from my own experience with the app (as someone who keeps a long recording history without deleting anything):
From the experience of using the meeting-transcription fork:
|
|
Thanks a lot to everyone involved in this project, and thanks, Signal46, for taking the time. I tried your version and could transcribe a 2-hour session in 5 minutes. This is so useful!
Hello! This is my first contribution to an open-source project on GitHub. I’m a big fan of Handy and saw a need for a feature to handle long-form recordings safely.
I realized after working on this that there is an existing PR (#371) for file uploads, but I decided to make this contribution anyway since it includes a progress bar.
Motivation & Context
There is a significant need for this feature in the public sector, specifically for municipalities and government agencies (e.g., here in Sweden).
Compliance & GDPR: Many organizations cannot use cloud-based transcription (which often costs ~$150/user/month) due to strict data compliance laws regarding sensitive meetings.
Local Processing: By keeping the file and transcription 100% local, Handy solves major legal hurdles regarding data transfer.
Efficiency: Estimates in a small Swedish municipality suggest that transcription (including of sensitive meetings) could save around 5,000 administrative hours annually. Yes, 5,000 hours, not 500, for a municipality of around 60,000 citizens.
Technical Details
Implementation: This feature was built with the assistance of AI coding agents but has been manually reviewed and tested.
Testing: Validated on a standard work laptop using the Parakeet V3 model. Successfully transcribed 20- and 40-minute files without memory spikes or crashes.
I am happy to make changes or discuss how this might be merged or combined with existing efforts!
Summary
Adds the ability to import existing audio files (MP3, M4A, WAV) into Handy's history with automatic transcription. This feature includes real-time progress tracking, system notifications, and robust handling of long audio files using smart chunking (VAD).
Successfully transcribed files are automatically moved to the "Recordings" folder for organization.
Motivation
Users requested the ability to transcribe existing audio files, not just live recordings. This feature enables batch processing of pre-recorded audio while maintaining the same quality and privacy guarantees as live transcription.
Changes
Core Features
File Import: Native file picker for selecting MP3, M4A, and WAV files.
Smart Chunking: Uses Voice Activity Detection (VAD) to split audio on silence, preventing words from being cut in half and improving transcription quality.
Progress Tracking: Real-time progress bar showing transcription progress (0-100%).
System Notifications: Desktop notifications on completion or failure.
Long File Support: Prevents crashes on files >20 minutes by processing in chunks.
Auto-Model Loading: Automatically loads the transcription model if it's not currently loaded.
File Management: Moves imported files to the recordings folder after successful transcription.
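The smart-chunking idea above (cut near a 30 s target, but only where the VAD reports silence) can be sketched as follows. This illustrates the boundary-selection logic only and is not the transcribe-rs implementation: the VAD output is modeled as one hypothetical `bool` per fixed-size frame, and the function name and `window` parameter are assumptions.

```rust
/// Hypothetical sketch of silence-aware chunk boundaries.
/// `is_speech` holds one VAD decision per fixed-size frame (e.g. 30 ms).
/// Each cut is placed at the silent frame closest to `start + target`, searching
/// +/- `window` frames; with no silence in that window we hard-cut at the target.
/// Returns half-open (start, end) frame ranges covering the whole input.
fn silence_aware_chunks(is_speech: &[bool], target: usize, window: usize) -> Vec<(usize, usize)> {
    assert!(target > window, "window must be smaller than the target chunk size");
    let n = is_speech.len();
    let mut chunks = Vec::new();
    let mut start = 0;

    // Keep cutting while the remainder is longer than the largest allowed chunk.
    while n - start > target + window {
        let ideal = start + target;
        let (lo, hi) = (ideal - window, ideal + window); // hi < n by the loop condition
        let cut = (lo..=hi)
            .filter(|&i| !is_speech[i])
            .min_by_key(|&i| i.abs_diff(ideal)) // ties go to the earlier frame
            .unwrap_or(ideal);
        chunks.push((start, cut));
        start = cut;
    }
    // The tail (at most target + window frames) becomes the final chunk.
    if start < n {
        chunks.push((start, n));
    }
    chunks
}
```

With 30 ms frames, a 30 s target corresponds to `target = 1000`. Cutting on silence is what keeps words from being split across chunks; the hard-cut fallback bounds chunk length (and memory) even for continuous speech.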
Technical Implementation
Backend (src-tauri/)
Audio Processing: Added decode_and_resample() using symphonia for multi-format support.
Smart Chunking: Integrated SileroVad to detect silence windows around the target chunk size (30s) in TranscriptionManager.
Import Workflow: New import_audio_file command handles decoding, resampling, smart chunking, transcription, and file management.
Events: Emits import-status and transcription-progress events for UI updates.
Database: Updated HistoryManager to store audio duration.
Frontend (src/)
UI: Added "Import Audio File" button to History settings.
Feedback: Implemented a progress bar component with percentage display.
Integration: Added event listeners for status updates and system notifications.
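As an illustration of the resampling half of decode_and_resample(), here is a simplified std-only sketch that downmixes interleaved samples to mono and resamples with linear interpolation. It assumes the model expects 16 kHz mono input, and the function names are hypothetical; decoding itself is done by symphonia in the PR. Plain linear interpolation applies no anti-aliasing low-pass filter, so a production resampler should filter first when downsampling from 44.1 or 48 kHz.

```rust
/// Hypothetical helper: average interleaved channels into one mono channel.
/// A trailing partial frame (fewer than `channels` samples) is dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be positive");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Hypothetical helper: resample by linear interpolation between neighbouring
/// input samples (no low-pass filtering, so downsampling can alias).
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = ((input.len() as u64 * to_hz as u64) / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64; // input samples per output sample
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = input[(idx + 1).min(input.len() - 1)]; // clamp at the last sample
            a + (b - a) * frac
        })
        .collect()
}
```

A typical call chain for a 48 kHz stereo file would be `resample_linear(&downmix_to_mono(&samples, 2), 48_000, 16_000)`.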
Dependencies
Backend: symphonia, tauri-plugin-notification, vad-rs
Frontend: @tauri-apps/plugin-notification
Testing
Tested on Windows laptop with:
✅ Short files (<1 min)
✅ Medium files (5-10 min)
✅ Long files (20+ min) - previously crashed, now works
✅ All supported formats (MP3, M4A, WAV)
✅ Model auto-loading and recovery if unloaded during transcription of long files
✅ Progress bar
✅ Notifications (success/failure)
Screenshots