
[Feature]: Autocorrect / gesture recognition training game #101

@BurntVoxel

Feature Category

Neural Prediction (AI models, accuracy, speed)

Feature Description

If autocorrect is powered by a neural network, what if we could deliberately train it in a "safe space" rather than only while we're actually trying to use it? Another FOSS keyboard project by FUTO tried this, though theirs was a web page that collected training data. We could make use of the same idea here but built into the app!

Essentially, the idea is to generate training data for the network by having the user read text and retype it: "show me what the gestures for these words look like and which words are more common."

Problem This Solves

  • Autocorrect always kinda sucks at first, no matter who developed it, even Google's keyboard.
  • Breaking autocorrect in faster helps retain users. (Personal anecdote: I tried the app, couldn't daily-drive it because another one was already trained, forgot about it, keep finding it and saying I want to try again. It's cool, it's just the autocorrect.)
  • Better training data: you're not guessing anything, you have confidence you know what they meant to type.
  • Also offers a little activity that's fun but productive: improve your keyboard experience while reading some articles.

Proposed Solution

  • Copy source text from a URL or a given file or copy/paste. (Maybe suggest Wikipedia's random article button or Project Gutenberg books)
  • Show the source text as written, dimmed. Ask the user to type it verbatim. As they type, color the active word brightly and the already-typed text slightly less brightly.
  • Tweak backspace to do whole word deletion for now
  • Log every detail. Every typo. Every gesture. Every bad attempt and backspace and redo.
  • Know that this data is valuable for ALL kinds of mistakes: physical typos like mashing the wrong key, misspellings, and thinking of the wrong word. In all those cases, you can actually see what they meant.
  • Even if they read the word wrong, it's still good training data. Take an etymology/entomology mixup: you still see the surrounding words that provide context clues, the word we meant, and the word we typed by mistake.
  • Machine learning...?
  • profit
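
The logging steps above could be sketched roughly as follows. This is only an illustration of the idea, not the app's actual code: `Attempt`, `WordRecord`, `TrainingSession`, and the `(x, y, t_ms)` gesture sample format are all hypothetical names and shapes chosen for the example. It keeps every attempt, including the ones the user backspaced, paired with the word they were supposed to type.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Hypothetical gesture sample: (x, y, timestamp in ms). Empty list for tap typing.
GesturePoint = Tuple[float, float, int]


@dataclass
class Attempt:
    """One try at entering a target word."""
    typed: str                                   # what the keyboard committed
    gesture: List[GesturePoint] = field(default_factory=list)
    deleted: bool = False                        # True if the user backspaced it


@dataclass
class WordRecord:
    """A word from the source text plus every attempt made at it."""
    target: str
    attempts: List[Attempt] = field(default_factory=list)


class TrainingSession:
    def __init__(self, source_text: str):
        self.records = [WordRecord(w) for w in source_text.split()]
        self.pos = 0                             # index of the active word

    @property
    def active(self) -> Optional[WordRecord]:
        """The word the user should type next, or None when finished."""
        return self.records[self.pos] if self.pos < len(self.records) else None

    def commit(self, typed: str, gesture: Optional[List[GesturePoint]] = None) -> None:
        """Log an attempt at the active word (right or wrong) and advance."""
        rec = self.active
        if rec is None:
            raise ValueError("source text already completed")
        rec.attempts.append(Attempt(typed, list(gesture or [])))
        self.pos += 1

    def backspace(self) -> None:
        """Whole-word deletion: step back and mark the last attempt as deleted."""
        if self.pos == 0:
            return
        self.pos -= 1
        self.records[self.pos].attempts[-1].deleted = True

    def labeled_examples(self) -> Iterator[Tuple[str, str, List[GesturePoint], bool]]:
        """Yield (target, typed, gesture, was_deleted) for every attempt."""
        for rec in self.records:
            for a in rec.attempts:
                yield rec.target, a.typed, a.gesture, a.deleted
```

Because the target word is always known, a wrong attempt such as "teh" for "the" becomes a labeled example rather than a guess about what the user meant.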

Alternative Solutions

  • Standalone app, crowdsource data for the starter model
  • Rather than tweak the autocorrect model itself, tweak the config file numbers until replaying the same log through a simulation gives the best autocorrect results.
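
The config-tuning alternative could look something like the sketch below: replay a log of (typed, intended) pairs against each candidate config and keep the one with the highest accuracy. Everything here is assumed for illustration: `toy_decode` is a stand-in for the real autocorrect, `freq_weight` is a made-up config parameter, and `VOCAB` is a toy frequency table. A plain grid search is used only because it is the simplest option.

```python
from itertools import product

# Toy vocabulary: word -> relative frequency (made-up numbers).
VOCAB = {"the": 0.9, "ten": 0.2, "then": 0.3}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def toy_decode(typed: str, config: dict) -> str:
    """Stand-in autocorrect: closest word, nudged toward frequent words."""
    def score(word: str) -> float:
        return -edit_distance(typed, word) + config["freq_weight"] * VOCAB[word]
    return max(VOCAB, key=score)


def replay_accuracy(log, config, decode=toy_decode) -> float:
    """Fraction of logged inputs that decode to the word the user meant."""
    if not log:
        return 0.0
    hits = sum(decode(typed, config) == target for typed, target in log)
    return hits / len(log)


def tune(log, grid, decode=toy_decode):
    """Try every combination in `grid`; return (best_config, best_accuracy)."""
    keys = list(grid)
    best = None
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        acc = replay_accuracy(log, config, decode)
        if best is None or acc > best[1]:   # ties keep the earlier config
            best = (config, acc)
    return best


log = [("teh", "the"), ("ten", "ten")]      # (typed, intended) pairs
best_config, best_acc = tune(log, {"freq_weight": [0, 1, 2]})
```

With this toy data, only `freq_weight = 2` corrects "teh" to "the" while still leaving "ten" alone, so that value is selected. In practice the tuned config should be checked against a held-out part of the log so it doesn't just fit one user's session.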

Priority Level

Nice to have (would be cool but not essential)

Implementation Considerations

  • This might require neural model changes
  • This might need new UI components
  • This could impact performance
  • This might affect privacy/security
  • This could require system-level permissions
  • This might need hardware acceleration
  • I'm willing to help implement this

Neural/AI Considerations (if applicable)

This would provide more data for the models. The production side probably wouldn't need to change, but the training side might.

Privacy Considerations

  • Assuming you don't crowdsource the data for the starter model, data shouldn't need to leave the phone, as long as you train on-device while plugged into a charger.
  • Assuming you do crowdsource, it should be made clear up front. At least you're training on a public domain book instead of someone's chat logs.

Additional Context

Perhaps check out FUTO's implementation, or request their dataset if you haven't already: https://swipe.futo.org/
