Enhancement: The ability to configure speech-to-text within the front end. #5253

danielrosehill · 2025-01-10T17:17:12Z

What features would you like to see added?

As a heavy user of speech-to-text services, which I find invaluable for capturing prompts, I would very much like to be able to configure speech-to-text within the app itself rather than through the YAML configuration files.

I tried the YAML approach, but I can't seem to get it to retain my details. Either way, I think it's much more sustainable to have this functionality accessible through the UI.

More details

I've been using speech-to-text tools almost full-time for the past few months, so in case my personal assessment of their capabilities is of any use, I'll offer it here:

In this particular context, locally hosted Whisper models are, in my opinion, less helpful than simply using Whisper via the OpenAI API. Most users aren't deploying their instance on hardware that would really do STT justice (a high-spec GPU, etc.), so they're better served by a cloud API, and in my experience the costs of prompting via Whisper are not that significant. I think offering users all the options is absolutely the right approach, and I'd be happy to draft some documentation on working with all of them if I can get it to work on my own instance.
Beyond Whisper, there are other speech-to-text providers accessible via API that would be nice to offer as additional options, since relatively few tools do this to date. I'd point to Amazon, Deepgram, Speechmatics, and other providers offering high-quality ASR tools that generally far exceed the performance of Google or whatever else the user has available in the browser.

Which components are impacted by your request?

General

Pictures

No response

Code of Conduct

I agree to follow this project's Code of Conduct

danny-avila (Maintainer) · 2025-01-10T17:19:42Z

@berry-13 I agree that we should add user_provided functionality to STT

1 reply

berry-13 Jan 10, 2025
Collaborator

yes, it's planned
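
As a rough illustration of the `user_provided` idea mentioned above, here is a minimal Python sketch of how a server might choose which STT API key to use. It is not the project's actual implementation: the function name `resolve_stt_api_key`, its parameters, and the exact sentinel handling are all hypothetical, assumed only for the sake of the example. The idea is that when the configured value is the literal string `user_provided`, the key entered by the user in the UI is used instead of one stored in configuration.

```python
from typing import Optional

# Sentinel meaning "the user supplies their own key in the UI".
# The exact string and how it is read from config are assumptions.
USER_PROVIDED = "user_provided"


def resolve_stt_api_key(configured_value: Optional[str],
                        user_key: Optional[str]) -> str:
    """Pick the API key for an STT request (hypothetical helper).

    configured_value: whatever the server config holds for the STT key.
    user_key: the key the user entered in the front end, if any.
    """
    if configured_value == USER_PROVIDED:
        # The admin has delegated key entry to the user.
        if not user_key:
            raise ValueError(
                "STT is set to user_provided, but no key was entered in the UI"
            )
        return user_key
    if not configured_value:
        raise ValueError("No STT API key is configured")
    # A concrete key in the server config takes effect for all users.
    return configured_value
```

With this shape, `resolve_stt_api_key("user_provided", "sk-user")` returns the user's key, while a concrete configured value is returned unchanged regardless of what the user entered.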
