Integration with VoiceToText #3577
-
Will there be any native support for using models like Whisper in SK soon?
Replies: 3 comments 1 reply
-
@rnortje-sygnia - this is something we are looking at. More to come on our roadmap for 2024 in the coming weeks.
-
@alliscode - FYI
-
Question: here are the problems I see with this feature (Voice Access in Windows) with respect to the API. I haven't tried the new one; it might already be there. I find a mishmash of accessibility features all over the place.

On the hardware side, I use a cardioid dynamic mic for vocals with a preamp, and naturally filtering out high-frequency noise makes a huge difference. Making the mic wearable, or using an earpiece or a jaw-motion detector, might help; trying to speak across the room at a mic array is not going to work. I hope that if this is implemented, the infrastructure will accommodate such setups in the best way. It probably already is, it seems, in SK. If offline Voice Access can't or won't be accessible, third-party DSP voice-to-text and agent chains could be; Pico wasn't usable in my last test, but it could be.

The system needs to retain voice-to-text memory and make refining passes. The voice-to-text could take context first to speed it up and increase accuracy, which would be, I guess, at the prompting level.

I want to present change proposals to the Windows Insider and accessibility groups and the Developer Studio teams. Most of the user feedback is converging on what I hope to see, and it's probably less work once you account for support, feedback, and documentation, which should not be required: the goal is to not require instructions, and to use RAG, or Phi, or the vectors, plus SK agent chains, to make this a fully usable first-class feature. We've been discussing it in the user feedback. I'll throw out some implementation vision I've had over the years that could save billions of user and developer hours per day now spent duplicating functionality.

There is a lot of overlap between the new features and Windows Copilot, which I find tells me what to do, sapping productivity; as it stands it is one of the least appreciated features in Windows Insider, using Bing like a help system or instruction manual, when apps should be natural and intuitive with AI in the mix. So I can't tell the direction; I guess people can choose what suits them. Note: in the ChatGPT store I tried using Whisper and it's far from usable.
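As an aside on the "take context first" idea: something like this is already possible at the API level with Whisper's `prompt` parameter, which biases the transcription toward expected vocabulary. A minimal sketch in Python, assuming the `openai` client and a local audio file (the file name and prompt text are illustrative; this is the OpenAI API, not an SK-native feature):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The prompt primes Whisper with domain vocabulary, so terms like
# "Semantic Kernel" are less likely to be mis-transcribed.
with open("dictation.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="Semantic Kernel, Whisper, Voice Access, agent chaining",
    )

print(transcript.text)
```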
I don't think Windows Copilot has been of use, and it already has a big overlap with Voice Access. The code feature search in Developer Studio made me realize that feature's importance and generalizability. OCR and multimodal discovery of the visual GUI and tooltips can be generalized and added to legacy applications, without security concerns, regardless of the product designer or user: make voice and screen readers a first-class feature, combined with UI discovery and context awareness. Agent chaining is needed. People can take it or leave it, use some or all of it, or, depending on level of disability, train it with an assistant and then go hands-free, blind, and/or deaf (but not all three), or use it via special devices.

I've tried some Whisper-based voice-to-text tools and they are not usable as they are over the wire. But it should allow, as Bard says it can (but doesn't), a way to use context to periodically fix the voice-to-text during dictation, and to act when the volume is raised, or when a command label is said after a pause and is clearly out of context, matching "Send it", "Escape", "Go to Start", or "Switch to application X". I almost think the LLMs could code the appropriate agent chains themselves for a generalized, multimodal UI with optional UI find/invoke, object-model query, code search, and command labels with A, B, C or Alpha, Bravo, Charlie, etc., making their own guess as to whether the user is dictating, spelling, or commanding (see the sketch below). I'm picturing a UI something like this..
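To make the "dictating, spelling, or commanding" guess concrete, here is a minimal sketch of the routing step such an agent chain could run on each transcribed utterance. Everything here is hypothetical: the function name, the model choice, and the label set are mine, not an SK or Voice Access API.

```python
from openai import OpenAI

client = OpenAI()

def classify_utterance(utterance: str, recent_context: str) -> str:
    """Guess whether a transcribed utterance is dictation, spelling,
    or a voice command, given the recent dictation context.
    Returns one of: "dictate", "spell", "command"."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model would do
        messages=[
            {
                "role": "system",
                "content": (
                    "You route speech-to-text output for a voice UI. "
                    "Given recent dictation context and a new utterance, "
                    "answer with exactly one word: dictate, spell, or command."
                ),
            },
            {
                "role": "user",
                "content": f"Context: {recent_context}\nUtterance: {utterance}",
            },
        ],
    )
    return response.choices[0].message.content.strip().lower()

# A short phrase after a pause that is clearly out of context, like
# "send it" or "switch to application X", should route as a command.
print(classify_utterance("send it", "Dear team, the quarterly report is attached"))
```

A downstream step in the same chain could then make the "refining pass" described above, re-running the accumulated transcript through the model together with the surrounding context.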