feat: add Language Auto-Detection and Contextual Suggestions #2149

subha0319 · 2025-11-29T10:33:07Z

Resolves #2124.

Description

This PR reduces "cross-language noise" for multilingual users by implementing heuristics to filter out irrelevant suggestions from inactive languages.

Implementation

Modified getSuggestionResults in DictionaryFacilitatorImpl.kt with two filtering strategies:

Context Strategy (Primary): If the previous word exists in Language A but not Language B, suggestions are restricted to Language A. (Supports glide typing).
Exact Match Strategy (Fallback): If the current word is an exact match in Language A but not Language B, suggestions are restricted to Language A.

Safety: Filtering is disabled if a word exists in multiple dictionaries (overlap) or is unknown, ensuring valid suggestions are not lost in ambiguous cases.
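
For readers who have not opened the diff, the two strategies and the safety rule can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in `getSuggestionResults`: the `Suggestion` type, the `uniqueLanguageOf` and `filterSuggestions` helpers, and the map-of-sets dictionaries are all hypothetical stand-ins. It also assumes that the fallback is tried whenever the previous word is inconclusive, which the description does not state explicitly.

```kotlin
// Hypothetical types for illustration only; not HeliBoard's actual classes.
data class Suggestion(val word: String, val language: String)

/**
 * Returns the single language whose dictionary contains [word],
 * or null when the word is unknown or exists in several dictionaries.
 */
fun uniqueLanguageOf(word: String?, dictionaries: Map<String, Set<String>>): String? {
    if (word.isNullOrEmpty()) return null
    val key = word.lowercase()
    // singleOrNull() yields null for zero matches and for more than one match.
    return dictionaries.filterValues { key in it }.keys.singleOrNull()
}

fun filterSuggestions(
    previousWord: String?,
    currentWord: String?,
    suggestions: List<Suggestion>,
    dictionaries: Map<String, Set<String>>,
): List<Suggestion> {
    val lockedLanguage = uniqueLanguageOf(previousWord, dictionaries) // context strategy
        ?: uniqueLanguageOf(currentWord, dictionaries)                // exact-match fallback (assumed order)
        ?: return suggestions                                         // safety: ambiguous or unknown, no filtering
    return suggestions.filter { it.language == lockedLanguage }
}
```

With an English and an Italian dictionary that both contain "i", a previous word of "ciao" restricts results to Italian, while a previous word of "i" leaves the list untouched unless the current word is itself an exact match in only one language.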

Validation

Manual Testing: Verified by the issue reporter (@emavgl). Confirmed it resolves the reported noise issue.
Regression Checks: Verified that missing suggestions in specific apps (e.g., Firefox) are existing upstream behavior and not a regression.
Unit Tests: Validated logic locally for standard flows and edge cases.

Helium314 · 2025-12-02T04:49:41Z

This addresses what the user wanted, but it also imposes much more severe restrictions.
Using only languages whose main dictionary contains the previous word completely ignores other sources such as the user's personal dictionary. Checking whether the word is valid means that it will never find words the user types frequently that are in no dictionary except user history.

The fallback strategy is less restrictive, but may often lock to a wrong language in case of typos.

I have not tested it yet, but I assume both approaches will struggle when mixing languages (e.g. using an Italian-only word in an English sentence).

@emavgl as far as I understand you only / mostly wanted to avoid auto-correct mixing languages, right?
At least I can't see the point of multilingual typing when we kick out entire languages rather easily...

For this case it seems more suitable to address it in Suggest, where it's decided whether words will be autocorrected.
It would be possible to e.g. only autocorrect to a word that is in the current language of the dictionary facilitator. Playing with the weight / confidence for DictionaryGroups should also have an effect, as autocorrect is, at least partially, score based.
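
To make the first idea concrete, here is a rough sketch of gating only the autocorrect decision on the current language while leaving the suggestion list untouched. Everything in it is an assumption for illustration: `Candidate`, `pickAutoCorrection`, and the integer `threshold` are hypothetical and do not reflect how `Suggest` is actually structured.

```kotlin
// Hypothetical candidate type; the real code carries far more information per suggestion.
data class Candidate(val word: String, val language: String, val score: Int)

/**
 * Picks the autocorrect target from [candidates], considering only words
 * that belong to [currentLanguage] and reach [threshold].
 * Candidates from other languages remain available as plain suggestions;
 * they are just never chosen for automatic replacement.
 */
fun pickAutoCorrection(
    candidates: List<Candidate>,
    currentLanguage: String,
    threshold: Int,
): Candidate? =
    candidates
        .filter { it.language == currentLanguage && it.score >= threshold }
        .maxByOrNull { it.score }
```

Unlike filtering the whole list, this keeps words from every enabled language visible, so a deliberate switch of language is still one tap away.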

emavgl · 2025-12-02T06:08:16Z

> @emavgl as far as I understand you only / mostly wanted to avoid auto-correct mixing languages, right?
> At least I can't see the point of multilingual typing when we kick out entire languages rather easily...

I write in 3 languages, and I adapt the language to the person I am talking to (for example, I type in Italian with my family but in English with my friends). I never, or only very rarely, want to mix the dictionaries and get suggestions in the other language. At the same time, it would be convenient not to have to change the keyboard language manually every time.

I have re-installed Swift Keyboard to see how they do it. From a user perspective, when you type a word in one language, the suggested words are in that same language. I saw mixed suggestions only once, when typing "I" (shared between English and Italian), and when I typed a word from the other language, that language's dictionary was used.

I believe they probably don't use a heuristic like the one implemented in this PR, which disables a language when the word is an exact match in another dictionary. Their good suggestions seem to come purely from a big n-gram model that is very good at predicting the next word given the previous one. It is so good that you can write meaningful sentences just by tapping the middle suggestion.

I believe Heliboard does not have such n-gram models, right? I wouldn't play with the scores and penalize them manually; I would rather focus on implementing this n-gram-based prediction, which would also give better suggestions when typing in a single language.

Helium314 · 2026-01-06T19:53:44Z

> I believe Heliboard does not have such n-gram models, right?

The native library does actually use n-grams, but I never looked at how heavily they are used. It might be that only the previous word is used (i.e. bigrams). Suggestions are created separately for each enabled language, though, because the library cannot combine multiple dictionaries this way. A weight is provided that depends on the language of the previously typed words. I didn't check in detail, but I assume this weight is just a multiplier for the score of a suggestion.
Tuning the weights might achieve what you are looking for (except that it will not change when moving to a different input field, e.g. when chatting with a different person).
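
As a toy model of that assumption (weight as a plain multiplier on per-language results), the sketch below derives a weight from the languages of recently typed words and applies it before merging. None of this is taken from the native library: `Scored`, `languageWeights`, `mergeWeighted`, and the `floor` parameter are hypothetical, and the linear weighting formula is just one possible choice.

```kotlin
// Hypothetical scored suggestion; scores are Doubles only to keep the arithmetic simple.
data class Scored(val word: String, val language: String, val score: Double)

/**
 * Maps each enabled language to a weight in [floor, 1.0] based on how often
 * it appears in [recentLanguages] (the languages of the last few typed words).
 * With no history, every language gets weight 1.0.
 */
fun languageWeights(
    recentLanguages: List<String>,
    enabled: Set<String>,
    floor: Double = 0.5,
): Map<String, Double> {
    if (recentLanguages.isEmpty()) return enabled.associateWith { 1.0 }
    return enabled.associateWith { lang ->
        val share = recentLanguages.count { it == lang }.toDouble() / recentLanguages.size
        floor + (1.0 - floor) * share
    }
}

/** Multiplies each per-language result by its weight, then merges and sorts by score. */
fun mergeWeighted(
    perLanguage: Map<String, List<Scored>>,
    weights: Map<String, Double>,
): List<Scored> =
    perLanguage.flatMap { (lang, results) ->
        val weight = weights[lang] ?: 1.0
        results.map { it.copy(score = it.score * weight) }
    }.sortedByDescending { it.score }
```

Lowering `floor` makes the inactive language fade faster without ever removing it outright, which is the main difference from the hard filter proposed in this PR.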

subha0319 and others added 5 commits November 21, 2025 22:47

feat: add language auto-detection logic to filter irrelevant suggestions

0ff56ad

feat: add language auto-detection logic to filter irrelevant suggestions

87c2a90

Merge branch 'main' of github.com:subha0319/HeliBoard

eda9fbf

feat: improve language auto-detection to reduce noise

4c9acc5

Merge branch 'Helium314:main' into main

ce06ec0

subha0319 changed the title ~~feat: add language auto-detection heuristics to suggestion logic~~ feat: add Language Auto-Detection and Contextual Suggestions Nov 29, 2025

subha0319 marked this pull request as draft December 2, 2025 12:31
