Integration with VoiceToText #3577
-
Will there be any native support for using models like Whisper in SK soon?
Replies: 3 comments 1 reply
-
@rnortje-sygnia - this is something we are looking at. More to come on our roadmap for 2024 in the coming weeks.
-
@alliscode - FYI
-
Question: here are the problems I see with this feature (Voice Access in Windows) with respect to the API. I haven't tried the new one; it might already be there. I find a mishmash of accessibility features all over the place.

On the hardware side, I use a cardioid dynamic mic for vocals with a preamp, and naturally filtering out high-frequency noise makes a huge difference. Making the mic wearable, or using an earpiece or a jaw-motion detector, might help; trying to speak across the room at a mic array is not going to work. I hope that if this is implemented, the infrastructure will accommodate such setups in the best way. It probably already is, it seems, in SK. If offline Voice Access can't or won't be accessible, third-party DSP voice-to-text and agent chains could be; Pico wasn't usable in my last test, but it could be.

The system needs to retain voice-to-text memory and make refining passes. The voice-to-text could take context first to speed it up and increase accuracy, which would be, I guess, at the prompting level.

I want to present change proposals to the Windows Insider and accessibility groups and the Developer Studio teams. Most of the user feedback is converging on what I hope to see, and it's probably less work once you account for support, feedback, and documentation, which should not be required: the goal is to not require instructions, and to use RAG, or Phi, or the vectors, plus SK agent chains, to make this a fully usable first-class feature. We've been discussing it in the user feedback. I'll throw out some implementation vision I've had over the years that could save billions of user and developer hours per day now spent duplicating functionality.

There is a lot of overlap between the new features and Windows Copilot, which I find tells me what to do, sapping productivity; as it stands it is one of the least appreciated features in Windows Insider, using Bing like a help system or instruction manual, when apps should be natural and intuitive with AI in the mix. So I can't tell the direction; I guess people can choose what suits them. Note: in the ChatGPT store I tried using Whisper and it's far from usable.
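As an aside on the "take context first" idea: something like this is already possible at the API level with Whisper's `prompt` parameter, which biases the transcription toward expected vocabulary. A minimal sketch in Python, assuming the `openai` client and a local audio file (the file name and prompt text are illustrative; this is the OpenAI API, not an SK-native feature):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The prompt primes Whisper with domain vocabulary, so terms like
# "Semantic Kernel" are less likely to be mis-transcribed.
with open("dictation.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="Semantic Kernel, Whisper, Voice Access, agent chaining",
    )

print(transcript.text)
```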
I don't think Windows Copilot has been of use, and it already has a big overlap with Voice Access. The code feature search in Developer Studio made me realize that feature's importance and generalizability. OCR and multimodal discovery of the visual GUI and tooltips can be generalized and added to legacy applications, without security concerns, regardless of the product designer or user: make voice and screen readers a first-class feature, combined with UI discovery and context awareness. Agent chaining is needed. People can take it or leave it, use some or all of it, or, depending on level of disability, train it with an assistant and then go hands-free, blind, and/or deaf (but not all three), or use it via special devices.

I've tried some Whisper-based voice-to-text tools and they are not usable as they are over the wire. But it should allow, as Bard says it can (but doesn't), a way to use context to periodically fix the voice-to-text during dictation, and to act when the volume is raised, or when a command label is said after a pause and is clearly out of context, matching "Send it", "Escape", "Go to Start", or "Switch to application X". I almost think the LLMs could code the appropriate agent chains themselves for a generalized, multimodal UI with optional UI find/invoke, object-model query, code search, and command labels with A, B, C or Alpha, Bravo, Charlie, etc., making their own guess as to whether the user is dictating, spelling, or commanding (see the sketch below). I'm picturing a UI something like this..
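To make the "dictating, spelling, or commanding" guess concrete, here is a minimal sketch of the routing step such an agent chain could run on each transcribed utterance. Everything here is hypothetical: the function name, the model choice, and the label set are mine, not an SK or Voice Access API.

```python
from openai import OpenAI

client = OpenAI()

def classify_utterance(utterance: str, recent_context: str) -> str:
    """Guess whether a transcribed utterance is dictation, spelling,
    or a voice command, given the recent dictation context.
    Returns one of: "dictate", "spell", "command"."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model would do
        messages=[
            {
                "role": "system",
                "content": (
                    "You route speech-to-text output for a voice UI. "
                    "Given recent dictation context and a new utterance, "
                    "answer with exactly one word: dictate, spell, or command."
                ),
            },
            {
                "role": "user",
                "content": f"Context: {recent_context}\nUtterance: {utterance}",
            },
        ],
    )
    return response.choices[0].message.content.strip().lower()

# A short phrase after a pause that is clearly out of context, like
# "send it" or "switch to application X", should route as a command.
print(classify_utterance("send it", "Dear team, the quarterly report is attached"))
```

A downstream step in the same chain could then make the "refining pass" described above, re-running the accumulated transcript through the model together with the surrounding context.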