Screen reader support #12508
Replies: 4 comments
-
Also discussed in #10222
-
Thanks for the link! According to this comment, for this to work it really needs to be part of the contract in Helix core. I understand it's just a bunch of text primitives, but the thing is, we as devs are generally lazy and tend to build the things that work for us. If we include TTS as part of the plugin APIs, or primitives that are inherently accessible, it would really open the door for people such as myself to have a real alternative: a tool we can use to get work done without having to fight so hard to do basic tasks. If we leave accessibility up to specific frontends, many plugin authors won't implement it, because there is no explicit common API surface for doing so.

That being said, using the accessible APIs definitely should not be mandatory. But it's worth considering designing them to be opt-out, so that not using them is an intentional choice; requiring plugin authors to opt in will mean many never learn the APIs exist.

We should also think about who will be writing plugins for an editor like Helix. I don't believe (although I do not know) that most Helix devs come from a front-end background, where thinking about accessibility has been socially encouraged in the last few years. I would guess many come from non-front-end fields, where accessibility for visually impaired devs is rarely on the radar.
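To make "accessible by default, opt-out" concrete, here is a rough sketch of what such a plugin primitive could look like. None of these types exist in Helix; the names are purely illustrative.

```rust
// Purely illustrative sketch: a plugin-facing UI primitive that is
// announced by a screen reader by default, with an explicit opt-out.
// None of these types exist in Helix today.

/// How a widget should be announced when it gains focus or changes.
pub enum Announce {
    /// Speak this text (the default for every primitive).
    Text(String),
    /// The author explicitly opted out, e.g. for purely decorative UI.
    Silent,
}

/// A minimal plugin-facing primitive. Constructing it always produces
/// an announcement derived from its content, so a plugin author gets
/// accessibility "for free" unless they deliberately turn it off.
pub struct Label {
    text: String,
    announce: Announce,
}

impl Label {
    pub fn new(text: impl Into<String>) -> Self {
        let text = text.into();
        // Accessible by default: the announcement mirrors the visible text.
        let announce = Announce::Text(text.clone());
        Self { text, announce }
    }

    /// Opting out has to be an explicit, intentional call.
    pub fn silent(mut self) -> Self {
        self.announce = Announce::Silent;
        self
    }

    pub fn text(&self) -> &str {
        &self.text
    }
}

// Usage:
// let status = Label::new("2 errors, 1 warning");   // announced
// let divider = Label::new("────────").silent();    // intentionally silent
```

The point is just that the silent path requires an intentional call, rather than accessibility being something authors have to remember to opt into.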
-
I also have vision disabilities, but I can see. Struggling tonight with reading doc comments, I decided to pursue this by means of keybindings that pipe the current selection to a speech script. This is what I put in my config:

```toml
[keys.normal]
h = ["move_char_left", ':pipe-to say.sh ${selection}']
J = ["move_visual_line_down", ':pipe-to say.sh ${selection}']
l = ["move_char_right", ':pipe-to say.sh ${selection}']
K = ["move_visual_line_up", ':pipe-to say.sh ${selection}']
j = ["move_line_down", "goto_line_end", "extend_to_line_start", ':pipe-to say-nopunc.sh ${selection}']
k = ["move_line_up", "goto_line_end", "extend_to_line_start", ':pipe-to say-nopunc.sh ${selection}']
w = ["move_next_word_start", ':pipe-to say.sh ${selection}']
W = ["move_next_long_word_start", ':pipe-to say.sh ${selection}']
b = ["move_prev_word_start", ':pipe-to say.sh ${selection}']
B = ["move_prev_long_word_start", ':pipe-to say.sh ${selection}']
e = ["move_next_word_end", ':pipe-to say.sh ${selection}']
E = ["move_next_long_word_end", ':pipe-to say.sh ${selection}']
t = ["find_till_char", ':pipe-to say.sh ${selection}']
T = ["till_prev_char", ':pipe-to say.sh ${selection}']
f = ["find_next_char", ':pipe-to say.sh ${selection}']
F = ["find_prev_char", ':pipe-to say.sh ${selection}']
x = ["extend_line_below", ':pipe-to say-nopunc.sh ${selection}']
"`" = ':pipe-to say.sh ${selection}'
"~" = ':pipe-to say-nopunc.sh ${selection}'
[keys.select]
"~" = ':pipe-to say-nopunc.sh ${selection}' (not including other keybinds and settings I use) You can see I am calling two shell scripts. They are below.
```bash
#!/bin/bash
# say.sh — speaks punctuation (-m all); -C first clears anything still queued
spd-say -C
spd-say -e -i -40 -m all -p -50 -r 40 -t female3 $1
```

```bash
#!/bin/bash
# say-nopunc.sh — same voice settings, but without -m all, so punctuation is skipped
spd-say -C
spd-say -e -i -40 -p -50 -r 40 -t female3 $1
```

The spd-say -C is important to "reset" the audio queue. Similarly, you could bind another key to call that command on its own. It works well for me. I navigate mostly normally, but it's now selecting lines with j/k. I use the scripts to set up a voice I like; the difference between them is that one speaks the punctuation, and you can see in my keybindings that this is also up to personal preference. It's not fully featured, and I'm not sure how to get the file browser and other things speaking. However, for editing inside a file, I find it ergonomic. It does not speak aloud the characters that I type, but it helps me read and feel the file as I navigate for now. I benefit from it a lot.
-
After using and experimenting more, I have more ideas. Manipulating the selections to pipe into spd-say is too naive for me; I want the keys in Helix to remain normal while still sending the lines of text to my speech dispatcher. It's responsive with a fast synthesizer like espeak, but when I tried this with the latest Piper TTS, it sounded great yet everything had a small lag of roughly 250 ms.

So I have this idea for a process that runs like a language server, but instead performs pre-rendering and caching of files and their short and long words, as well as keeping pre-rendered single-character glyphs. It would be a wrapper of sorts around spd-say that processes the file as you navigate and change it, asynchronously rendering nearby audio components ahead of time and clearing the cache for components that were not used. Acting as a multi-threaded server, it would be able to keep up better with real-time interaction, and the keybindings in Helix could stay the same. (A rough sketch of what I mean is at the end of this comment.)

The issue I have to research now, I guess, is whether it's worth learning to set this up as a language server, or whether it's fine to pipe buffer names and cursor positions in text files to an external application on every, say, normal movement command. The ability to do this is there, so in a sense Helix can already interact with external applications somewhat well. I'm just not sure what makes sense.

@jquesada2016 Do you have any input or updates on your explorations of this work? As someone who uses TTS a lot but has never been able to fully use or learn an "established" system-wide screen reader, what is your take on a small daemon that runs alongside Helix, pre-rendering audio for nearby words and performing the mixing that might be necessary to represent multiple cursors, menus, OCR, or data piped from fzf for file management, etc.? I'm not sure how I'd connect it to official screen readers other than by using AccessKit, which would be good, but I'm also not sure if there's a newer solution I'm not finding in my research.

As well... there is a feeling in me that if this is something I need, I will simply keep building out that infrastructure for myself. As a designer of software, and a disabled person, I can find a way that makes the application more or entirely usable without vision. However... at some point it feels like a job for a tmux-like shell multiplexer, or perhaps for a shell itself. And while that may be true, I'm also not opposed to creating bespoke solutions for specific tools that I use. I don't know if any of that would work in a TTY either; I mostly use Alacritty on Wayland. Or did you end up finding another solution already?

edit: After thinking through this, all that I want on the Helix side is to "hook" into live knowledge about what buffer is open, where the cursor is, and what is on screen. In a sense, there's a layout language that could be devised via a multiplexing TUI spy process that could potentially read the screen, but I wonder if there's just a way for Helix to say "here I am in the files, here's what's on screen pop-up wise, here are the hotkeys that are active at the moment, here are the keys that have been pressed. Have at it, also it's TOML."
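To make the pre-rendering idea a bit more concrete, here is a rough sketch of the cache side of such a daemon. The synthesizer is just a placeholder closure (a real one would shell out to spd-say or Piper), and none of the names are tied to any existing Helix or speech-dispatcher API.

```rust
// Rough sketch of the "pre-render nearby words" idea: a background
// worker synthesizes audio for words the cursor is approaching, and an
// LRU-ish cache drops clips that haven't been used for a while.
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

type Clip = Vec<u8>; // rendered audio, e.g. WAV bytes

struct AudioCache {
    clips: HashMap<String, (Clip, u64)>, // word -> (audio, last-used tick)
    tick: u64,
}

impl AudioCache {
    fn new() -> Self {
        Self { clips: HashMap::new(), tick: 0 }
    }

    /// Fetch a clip, rendering (and caching) it on a miss.
    fn get(&mut self, word: &str, synth: impl Fn(&str) -> Clip) -> Clip {
        self.tick += 1;
        let tick = self.tick;
        let entry = self
            .clips
            .entry(word.to_string())
            .or_insert_with(|| (synth(word), tick));
        entry.1 = tick;
        entry.0.clone()
    }

    /// Drop clips that were not used during the last `max_age` lookups.
    fn evict_stale(&mut self, max_age: u64) {
        let tick = self.tick;
        self.clips.retain(|_, (_, last)| tick - *last <= max_age);
    }
}

fn main() {
    // Editor side: send the words near the cursor as navigation happens.
    let (tx, rx) = mpsc::channel::<String>();

    // Daemon side: pre-render ahead of time so playback feels instant.
    let worker = thread::spawn(move || {
        let mut cache = AudioCache::new();
        // Placeholder synthesizer: a real one would invoke spd-say/Piper.
        let synth = |word: &str| word.as_bytes().to_vec();
        for word in rx {
            let _clip = cache.get(&word, synth);
            cache.evict_stale(512);
        }
    });

    for word in ["fn", "main", "println"] {
        tx.send(word.to_string()).unwrap();
    }
    drop(tx);
    worker.join().unwrap();
}
```

The cache itself stays simple; the interesting questions are on the protocol side, i.e. what Helix sends to the daemon, and when.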
-
Hello everyone! I am a legally blind dev. I wanted to propose adding screen reader support to the Helix editor.
Currently, the state of accessibility tooling for visually impaired devs, such as myself, is very lacking. VS Code is unfortunately the only editor that is accessible, for the most part, largely because it is built on web technologies. Even so, it still isn't great, allowing only for simple reading-aloud of the current line where the cursor is located.
Editors such as Neovim would be practically impossible to retrofit with screen reader support due to their inherent lack of UI structure and the design of their plugin systems.
I propose adding support for screen readers, or at least opening a discussion about it, since there is work underway to add a plugin system. Designing extensible APIs with accessibility in mind is unfortunately not something that can easily be retrofitted after the fact.
I have two different ideas for adding screen reader support to Helix: integrating AccessKit, or building a screen reader directly into the editor.
Let's talk a little more about each of these approaches.
AccessKit
AccessKit is an accessibility tree implementation that handles talking to the underlying platform accessibility abstraction on your behalf. This is good for UIs that have an inherent UI tree and a linear navigation flow, i.e., focus on a single item at a time, moving from one focusable item to the next and back again. Think of dropdown lists, menu items, and using `hjkl` for moving in the editor.

Supporting AccessKit would universally enable screen readers to interact with Helix at a basic level. The problem here is that Helix is a CLI, and I am not too sure about the integration story for AccessKit with a CLI application. Assuming integration in a CLI would work, interactions would be strictly limited to simple ones, much like VS Code, where only the focused item would be read aloud, i.e. the current line, menu item, etc.
Built-in
A built-in screen reader would be an embedded text-to-speech engine that would announce whatever action the user is currently performing. No coupling with a platform accessibility tree, managing focus nodes, etc., just a plain old embedded text-to-speech API.
The implementation could be as simple as:
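(The names below, `Speech` and `Priority`, are purely illustrative; this is a minimal sketch, not an existing Helix API.)

```rust
// Hypothetical sketch of an embedded text-to-speech handle that the
// editor core could call directly; none of these names exist in Helix.
use std::collections::VecDeque;

/// Whether a new announcement should interrupt whatever is speaking.
pub enum Priority {
    /// Queue after the current utterance (e.g. background diagnostics).
    Queued,
    /// Cut off the current utterance (e.g. cursor moved to a new line).
    Interrupt,
}

#[derive(Default)]
pub struct Speech {
    queue: VecDeque<String>,
}

impl Speech {
    /// Called at the relevant editor call sites.
    pub fn speak(&mut self, text: impl Into<String>, priority: Priority) {
        if let Priority::Interrupt = priority {
            self.queue.clear(); // drop stale announcements
        }
        self.queue.push_back(text.into());
        // A real implementation would hand the queue to a TTS backend
        // (speech-dispatcher, an embedded engine, ...) on another thread.
    }
}

// e.g. inside a movement command, after the cursor lands on a new line:
// speech.speak(current_line_text, Priority::Interrupt);
```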
where all speech is queued at the internal API call sites, such as when moving from one line to the next or when a selection is made.
Comparisons
The biggest advantage of having a built-in TTS engine is that it allows for non-linear navigation of code. A traditional screen reader has no notion of multiple cursors, nor of LSP features such as variable type hints, inlay hints, or code lenses, nor even syntax highlighting; it only reads top to bottom and left to right.
The largest downside to embedding a TTS is that the user won't have access to their existing native screen reader configuration, such as voice, speech rate, pitch, etc., unless we tried to detect and load it on startup.
Honestly, there's much to say on this topic, so please, let me know what you guys think. I'd be happy to go into as much detail as y'all want.