KrE80r commented Dec 24, 2025

Before Submitting This PR

Please confirm you have done the following:

If this is a feature or change that was previously closed/rejected:

  • I have explained in the description below why this should be reconsidered
  • I have gathered community feedback (link to discussion below)

Human Written Description

I use another STT tool which auto-stops recording after silence. I wanted the same capability in Handy for hands-free operation, without needing to press the shortcut again to stop recording.

Related Issues/Discussions

This feature addresses the need for hands-free dictation workflows where users want recording to automatically stop after they finish speaking, without requiring manual shortcut activation to end the recording.

Community Feedback

New feature - no prior discussion. This implements a commonly requested pattern for voice dictation apps.

Testing

  • Tested on Fedora Linux with the built RPM package
  • Cross-platform compatible: uses existing VAD pipeline with no platform-specific code
  • Settings persist correctly across app restarts
  • Timeout options: Disabled, 2s, 3s, 5s, 10s of silence after speech detected

Screenshots/Videos (if applicable)

Settings UI added under Advanced section:
image

Implementation Details

Changes:

  1. Backend (Rust):

    • Added AutoStopSilenceTimeout enum to settings
    • Extended AudioRecorder with silence frame tracking
    • Added callback mechanism to trigger transcription stop
    • Uses existing VAD (Voice Activity Detection) to detect silence
  2. Frontend (React/TypeScript):

    • New AutoStopSilenceTimeoutSetting component with dropdown
    • Added i18n translations for setting labels
    • Integrated with settings store
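
As a rough illustration of the settings enum listed above, here is a minimal Rust sketch. Only the enum name and the option values (disabled, 2s, 3s, 5s, 10s) come from this description; the variant names, the `as_duration` helper, and `Disabled` being the default are assumptions and may not match the actual diff.

```rust
use std::time::Duration;

/// Sketch of the setting. Variant names are assumptions, not taken from the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum AutoStopSilenceTimeout {
    /// Assumed default: auto-stop is off unless the user opts in.
    #[default]
    Disabled,
    Sec2,
    Sec3,
    Sec5,
    Sec10,
}

impl AutoStopSilenceTimeout {
    /// Hypothetical helper: the silence duration that ends a recording,
    /// or `None` when the feature is disabled.
    pub fn as_duration(self) -> Option<Duration> {
        match self {
            Self::Disabled => None,
            Self::Sec2 => Some(Duration::from_secs(2)),
            Self::Sec3 => Some(Duration::from_secs(3)),
            Self::Sec5 => Some(Duration::from_secs(5)),
            Self::Sec10 => Some(Duration::from_secs(10)),
        }
    }
}
```

Returning an `Option` keeps the "disabled" case out of the timing logic, so callers can skip silence tracking entirely when no timeout is configured.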

How it works:

  • After speech is first detected, the system starts counting consecutive silence frames (30ms each)
  • When silence duration exceeds the configured timeout, it triggers the same stop action as pressing the shortcut key
  • Only activates AFTER speech is detected, preventing premature stops
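
The counting logic described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `SilenceTracker`, its fields, and `on_frame` are hypothetical names, and the sketch treats the timeout as reached once the accumulated silence is at least the configured value.

```rust
/// Frame length stated in the description above.
const FRAME_MS: u32 = 30;

/// Hypothetical tracker; the real AudioRecorder fields are not shown here.
pub struct SilenceTracker {
    /// Configured timeout in milliseconds, or `None` when disabled.
    timeout_ms: Option<u32>,
    /// Set once the VAD has reported speech at least once.
    speech_seen: bool,
    /// Consecutive silent frames since the last speech frame.
    silent_frames: u32,
}

impl SilenceTracker {
    pub fn new(timeout_ms: Option<u32>) -> Self {
        Self { timeout_ms, speech_seen: false, silent_frames: 0 }
    }

    /// Feed one VAD decision per frame. Returns `true` when the
    /// recording should be stopped.
    pub fn on_frame(&mut self, is_speech: bool) -> bool {
        let timeout_ms = match self.timeout_ms {
            Some(t) => t,
            None => return false, // feature disabled
        };
        if is_speech {
            // Speech resets the counter and arms the tracker.
            self.speech_seen = true;
            self.silent_frames = 0;
            return false;
        }
        if !self.speech_seen {
            // Initial silence never triggers a stop.
            return false;
        }
        self.silent_frames = self.silent_frames.saturating_add(1);
        self.silent_frames.saturating_mul(FRAME_MS) >= timeout_ms
    }
}
```

In this sketch the caller would invoke the same stop action as the shortcut key when `on_frame` returns `true`; how that callback is wired up is left out.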

Automatically stops recording after a configurable period of silence
following speech detection. Similar to nerd-dictation's --timeout feature.

Changes:
- Add AutoStopSilenceTimeout enum with options: disabled, 2s, 3s, 5s, 10s
- Track consecutive silence frames in VAD processing pipeline
- Trigger transcription stop when silence threshold exceeded
- Add settings UI component in Advanced settings
- Wire up frontend settings store and translations

The feature only triggers after speech has been detected at least once,
preventing premature stops during initial silence.

Owner

cjpais commented Dec 24, 2025

Mostly writing here to say I have seen this. I've thought about it and I don't have a strong opinion at the moment. My main concern is that adding more features generally makes the app more confusing, though I do understand the reasoning behind this.

For reference, the screenshot didn't upload properly. I won't be able to test this for a little while: I'm working on some major features and won't be reviewing pull requests while I work on them.

One thing that might make sense is to reuse the existing "Push to Talk" setting. Instead of a boolean, it could be a string or enum value shown as a dropdown of the different kinds of shortcut triggers. To me, this feature seems to only really work in the mode where you press the key binding once to start and again to stop, so not in push-to-talk mode. It's a bit unclear what the overall user experience is, and I want to make sure it's consistent and works properly. Have you tested this with push to talk, and does it even make sense to have it there?
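
One possible reading of the enum idea in the comment above, sketched in Rust. Everything here is hypothetical: `TriggerMode`, its variants, and both helper methods are illustrative names that do not exist in Handy, and the sketch assumes auto-stop would only apply to the press-to-start mode.

```rust
/// Hypothetical replacement for a boolean push-to-talk setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TriggerMode {
    /// Press once to start, press again to stop.
    Toggle,
    /// Record only while the shortcut is held.
    PushToTalk,
    /// Press once to start; stop after this many seconds of silence.
    ToggleAutoStop { silence_secs: u8 },
}

impl TriggerMode {
    /// Maps the old boolean onto the enum so saved settings keep their meaning.
    pub fn from_legacy_push_to_talk(enabled: bool) -> Self {
        if enabled { Self::PushToTalk } else { Self::Toggle }
    }

    /// Silence timeout in seconds, if this mode uses auto-stop at all.
    pub fn auto_stop_secs(self) -> Option<u8> {
        match self {
            Self::ToggleAutoStop { silence_secs } => Some(silence_secs),
            Self::Toggle | Self::PushToTalk => None,
        }
    }
}
```

Modelling the modes as one enum would make the push-to-talk plus auto-stop combination unrepresentable, which is one way to address the consistency question raised above.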

Author

KrE80r commented Dec 24, 2025

Added the screenshot.

I did test with and without push-to-talk, and I still opted for this. It might just be my muscle memory, since I was using other software that did this auto-disengage, so I thought of adding it here.

I'm currently running this on a Linux machine, and it seems to be working fine so far.

joshribakoff commented Jan 5, 2026

It would be good to land tests first, before we add more complexity. If we have auto-stop, it's only natural to also include auto-start. We could also consider adding trigger words such as "start dictation" and "stop dictation". I am generally in favor of building this direction out, as a user and new contributor. I just hope we can keep the UI simple and stable, with the above suggestions plus tests.

joshribakoff commented Jan 6, 2026

Sorry to double post, but I think it’s worth noting: the dictation built into macOS also auto-stops by default, and as far as I’m aware there’s no setting to disable it.

Most driver assistance systems also forcefully disengage, as a conservative safety trade-off, if user presence is no longer detected.

I’m actually convinced we could globally enable this. The main blocker(s) to doing so here could potentially be:

  • automated testing gaps
  • On Mac, a little beep plays when you start and stop dictation. On Tesla self-driving, you also cannot disable the beep when you enter or exit self-driving mode. This feels like a dependency.

There are potentially real privacy and usability concerns with accidentally leaving dictation on and then accidentally triggering input into a different window.

But I do think there are arguments for having user settings here. There’s no right or wrong answer. For what it’s worth, it looks like Apple made a conservative decision to auto-stop and output the text after many seconds of silence.

I don’t think the current user interface of Handy is 100% optimal, but I do agree with the sentiment that we should move it forward in a way that doesn’t increase the level of entropy in the settings.

Owner

cjpais commented Jan 7, 2026

@joshribakoff please move comments/opinions to discussions. I don't think these comments are helpful in this PR and it clogs up something which is active development work. I'm not really in favor of expanding the scope of this PR as there are already some concerns I mentioned earlier. This PR will likely make it in at some point but it's going to take some time. I am quite busy, and there's a lot to maintain and address.

You're largely giving an opinion here which is a discussion which needs community support. I don't intend to change core and default application behavior without broad support from the community, or from myself directly. I value feedback and opinion. I like good defaults, but I personally don't know if what you are describing is a good default for my own usage of the application. Go collect support based on your belief/opinion.

> It would be good to land tests first, before we add more complexity.

Regarding this... I will land whatever I have time for, in the order I feel like. If I see a PR adding tests, I will certainly consider the ordering, but if there's no PR for it, I am not going to block someone else's work for something that doesn't exist.
