Description
Feature Category
Neural Prediction (AI models, accuracy, speed)
Feature Description
If autocorrect is powered by a neural network, what if we could deliberately train it in a "safe space" rather than only while we're actually trying to use it? Another FOSS keyboard project by FUTO tried this, though theirs was a web page that collected training data. We could make use of the same idea here but built into the app!
Essentially, the idea is to generate training data for the network by having the user read text and retype it: "show me what the gestures for these words look like and which words are more common."
Problem This Solves
- Autocorrect always kinda sucks at first, no matter who developed it, even Google's keyboard.
- Breaking autocorrect in faster helps retain users. (Personal anecdote: I tried the app, couldn't daily-drive it because another one was already trained, forgot about it, keep finding it and saying I want to try again. It's cool, it's just the autocorrect.)
- Better training data: you're not guessing anything, you have confidence you know what they meant to type.
- Also offers a little activity that's fun but productive: improve your keyboard experience while reading some articles.
Proposed Solution
- Load source text from a URL, a given file, or copy/paste. (Maybe suggest Wikipedia's random article button or Project Gutenberg books.)
- Show the source text as written, dimmed. Ask the user to type it verbatim. As they type, color the active word brightly and the already-typed text slightly less brightly.
- Tweak backspace to do whole word deletion for now
- Log every detail. Every typo. Every gesture. Every bad attempt and backspace and redo.
- Know that this data is valuable for ALL kinds of mistakes. Physical typos like mashing the wrong key, failing at spelling, and thinking of the wrong word. In all those cases, you can actually see what they meant.
- Even if they read the word wrong, it's still good training data. Take etymology/entomology mixup: you still see the words around it that provide context clues, the word we meant, and the word we said by mistake.
- Machine learning...?
- profit
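To make the logging steps above a bit more concrete, here is a minimal Python sketch of what a practice session log could look like. Everything in it is an assumption for illustration: the names `PracticeSession`, `WordRecord`, and `Attempt`, the touch-path format, and the rule that a mismatched word stays on the same target (standing in for whole-word backspace and retry). It is not taken from the app's code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Attempt:
    typed: str                        # what the keyboard actually committed
    path: List[Tuple[float, float]]   # raw gesture points (hypothetical format)
    matched: bool                     # True if it equals the target word


@dataclass
class WordRecord:
    target: str                       # the word the user was asked to type
    context: List[str]                # the source words just before it
    attempts: List[Attempt] = field(default_factory=list)


class PracticeSession:
    """Walks through the source text word by word and logs every attempt."""

    def __init__(self, source_text: str, context_size: int = 3):
        self.words = source_text.split()
        self.context_size = context_size
        self.records: List[WordRecord] = []
        self._open_record(0)

    def _open_record(self, index: int) -> None:
        self.index = index
        if index < len(self.words):
            start = max(0, index - self.context_size)
            self.records.append(
                WordRecord(self.words[index], self.words[start:index])
            )

    @property
    def done(self) -> bool:
        return self.index >= len(self.words)

    def commit(self, typed: str, path) -> bool:
        """Log one attempt. A mismatch keeps the same target for a retry."""
        if self.done:
            raise ValueError("source text already completed")
        ok = typed == self.words[self.index]
        self.records[-1].attempts.append(Attempt(typed, list(path), ok))
        if ok:
            self._open_record(self.index + 1)
        return ok

    def training_pairs(self):
        """Every attempt, good or bad, as (context, typed, intended)."""
        return [
            (r.context, a.typed, r.target)
            for r in self.records
            for a in r.attempts
        ]
```

The point of keeping the failed attempts is the etymology/entomology case: the log holds the wrong word, the intended word, and the surrounding context together.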
Alternative Solutions
- Standalone app, crowdsource data for the starter model
- Rather than tweak autocorrect, tweak the config file numbers until simulating the same log gets the best autocorrect results.
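The second alternative, replaying the same log against different config values, could look roughly like the sketch below. The scoring model is a toy stand-in (edit distance minus a frequency bonus), and `freq_weight`, `correct`, `replay_accuracy`, and `tune` are hypothetical names; the app's real config parameters and correction logic are not known here.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def correct(typed: str, vocab: dict, freq_weight: float) -> str:
    """Toy autocorrect: close spelling wins, common words get a bonus."""
    return min(
        vocab,
        key=lambda w: edit_distance(typed, w) - freq_weight * math.log(vocab[w]),
    )


def replay_accuracy(log, vocab, freq_weight: float) -> float:
    """Fraction of logged (typed, intended) pairs the corrector gets right."""
    hits = sum(correct(typed, vocab, freq_weight) == target
               for typed, target in log)
    return hits / len(log)


def tune(log, vocab, candidates):
    """Grid search: keep the config value that scores best on the replay."""
    return max(candidates, key=lambda fw: replay_accuracy(log, vocab, fw))


# Made-up word counts and a made-up practice log, for illustration only.
vocab = {"than": 50, "then": 100, "the": 1000}
log = [("teh", "the"), ("thn", "the"), ("than", "than")]
best = tune(log, vocab, [0.0, 0.5, 10.0])
```

Because the practice log records what the user meant to type, the replay can be scored exactly, with no guessing about intent.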
Priority Level
Nice to have (would be cool but not essential)
Implementation Considerations
- This might require neural model changes
- This might need new UI components
- This could impact performance
- This might affect privacy/security
- This could require system-level permissions
- This might need hardware acceleration
- I'm willing to help implement this
Neural/AI Considerations (if applicable)
This would provide more data for the models. The production side probably wouldn't need to change, but the training side might.
Privacy Considerations
- Assuming you don't crowdsource the data for the starter model, data shouldn't need to leave the phone, as long as you train while plugged into a charger.
- Assuming you do crowdsource, it should be made clear up front. At least you're training on a public domain book instead of someone's chat logs.
Additional Context
Perhaps check out FUTO's implementation, or request their dataset if you haven't already. https://swipe.futo.org/