@Yorick-Ryu Yorick-Ryu commented Jan 2, 2026

Before Submitting This PR

Please confirm you have done the following:

If this is a feature or change that was previously closed/rejected:

  • I have explained in the description below why this should be reconsidered
  • I have gathered community feedback (link to discussion below)

Human Written Description

I implemented a local STT API that follows the OpenAI Whisper format. Currently, the Whisper model is only accessible within Handy; however, many users want to leverage this functionality for external tasks like subtitle transcription without loading multiple model instances. This change exposes the speech-to-text capability as a standardized service, allowing users to do more with limited system memory.
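
For readers unfamiliar with the OpenAI transcription request shape, here is a minimal client-side sketch in Python using only the standard library. It is an illustration, not code from this PR. The base URL and port are placeholders (the description does not state them), the `file` and `model` form fields are the names used by OpenAI's transcription API and are assumed to be accepted by the local server, and the response is assumed to be JSON.

```python
import io
import json
import os
import urllib.request
import uuid

# Hypothetical base URL: the PR description does not state the host or port.
BASE_URL = "http://127.0.0.1:8000"
ENDPOINT = "/v1/audio/transcriptions"  # path named in the Testing section


def build_transcription_request(audio_bytes, filename, model="whisper-1",
                                base_url=BASE_URL):
    """Build a multipart/form-data POST in the OpenAI transcription style."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Plain text field: the model name (assumed; the server may ignore it).
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(model.encode() + b"\r\n")

    # File field: the raw audio, sent as MP3.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{os.path.basename(filename)}"\r\n'.encode()
    )
    body.write(b"Content-Type: audio/mpeg\r\n\r\n")
    body.write(audio_bytes + b"\r\n")
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        base_url + ENDPOINT,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path):
    """Send an MP3 file and return the decoded JSON response."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), path)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `transcribe("clip.mp3")` against a running instance would send the file in one request. If the server follows OpenAI's default response format, the transcript would be under a `text` key, but that is an assumption about this implementation.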

Related Issues/Discussions

Fixes # None
Discussion: #241

Community Feedback

#241

Testing

Environment:

  • Tested on: macOS 26.2 (Apple Silicon M1 Pro)
  • Status: Functional on macOS. Need help testing on Windows and Linux platforms to ensure consistent behavior.

Test Cases:

  • Features: Tested by calling the API with curl and by running the demo "convert MP3 to SRT".
  • On-demand Loading: Verified via curl that calling the /v1/audio/transcriptions endpoint correctly triggers the model loading process in the background.
  • Waiting Mechanism: Confirmed the API response waits until the model is fully loaded before processing the transcription, preventing "Model not loaded" errors.
  • Verified Limitations: Tested various audio formats and confirmed only MP3 currently works reliably; documented this behavior and added a "welcome PRs" note in LOCAL_API.md to guide future contributors.
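
The on-demand loading and waiting behavior described in the test cases can be sketched as follows. This is a generic Python illustration of the pattern, not the code in this PR: `LazyModel`, `handle_transcription`, and the 30-second timeout are hypothetical names and values chosen for the example.

```python
import threading


class LazyModel:
    """Loads a model once, on first use, and makes callers wait for it."""

    def __init__(self, loader):
        self._loader = loader            # callable that returns the loaded model
        self._model = None
        self._ready = threading.Event()  # set once loading has finished
        self._lock = threading.Lock()
        self._started = False

    def _load(self):
        self._model = self._loader()
        self._ready.set()                # wake every request that is waiting

    def ensure_loading(self):
        # Start the background load exactly once, even under concurrent calls.
        with self._lock:
            if not self._started:
                self._started = True
                threading.Thread(target=self._load, daemon=True).start()

    def get(self, timeout=None):
        # Trigger loading if needed, then block until the model is ready.
        self.ensure_loading()
        if not self._ready.wait(timeout):
            raise TimeoutError("model did not finish loading in time")
        return self._model


def handle_transcription(model_holder, audio_bytes):
    """Stand-in for a request handler: waits for the model, then runs it."""
    model = model_holder.get(timeout=30)  # hypothetical timeout
    return model(audio_bytes)
```

The point of the pattern is that the first request pays the load cost and later requests reuse the same instance, so a caller never sees a "Model not loaded" error as long as loading succeeds within the timeout. A real implementation would also need to handle a loader that fails.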

Screenshots/Videos (if applicable)

[Two screenshots attached: image2, image1]

@Yorick-Ryu
Author

@cjpais Hello, is there any issue with this PR?

@cjpais
Owner

cjpais commented Jan 5, 2026

@Yorick-Ryu please be patient, I have not had time to review it yet

I'm currently traveling and haven't been able to look at my laptop much. Handy was featured in Wired, which has brought in a lot of new issues and discussions that I respond to every day.

@Yorick-Ryu
Author

I saw this: https://www.wired.com/story/handy-free-speech-to-text-app/
Congratulations!

@samiulazam

@cjpais any updates?

@cjpais
Owner

cjpais commented Jan 15, 2026

Patience is the key :)

@Aaryan-Kapoor

+1

@FSerg

FSerg commented Jan 27, 2026

+1

@Yorick-Ryu
Author

@cjpais Conflicts are fixed. Is it a good time to merge?

@cjpais
Owner

cjpais commented Jan 28, 2026

@Yorick-Ryu Please be patient; I will merge when I have time.
