@subha0319

Resolves #2124.

Description

This PR reduces "cross-language noise" for multilingual users by implementing heuristics to filter out irrelevant suggestions from inactive languages.

Implementation

Modified getSuggestionResults in DictionaryFacilitatorImpl.kt with two filtering strategies:

  1. Context Strategy (Primary): If the previous word exists in Language A but not Language B, suggestions are restricted to Language A. (Supports glide typing).
  2. Exact Match Strategy (Fallback): If the current word is an exact match in Language A but not Language B, suggestions are restricted to Language A.

Safety: Filtering is disabled if a word exists in multiple dictionaries (overlap) or is unknown, ensuring valid suggestions are not lost in ambiguous cases.
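The two strategies and the safety rule can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `LanguageDictionary`, `uniqueLanguageOf`, and `languageToRestrictTo` are hypothetical names, and the sketch assumes the fallback is tried whenever the previous word does not single out one language.

```kotlin
// Hypothetical stand-in for a per-language dictionary lookup (not HeliBoard's API).
class LanguageDictionary(val language: String, entries: Set<String>) {
    private val words: Set<String> = entries.map { it.lowercase() }.toSet()
    fun contains(word: String): Boolean = word.lowercase() in words
}

// Returns the language of the only dictionary containing the word.
// Returns null if the word is unknown or found in several dictionaries (overlap).
fun uniqueLanguageOf(word: String, dictionaries: List<LanguageDictionary>): String? {
    if (word.isEmpty()) return null
    return dictionaries.filter { it.contains(word) }.singleOrNull()?.language
}

// Returns the language suggestions should be restricted to,
// or null when filtering should stay disabled.
fun languageToRestrictTo(
    previousWord: String?,
    currentWord: String,
    dictionaries: List<LanguageDictionary>,
): String? {
    // Context strategy (primary): the previous word belongs to exactly one language.
    val fromContext = previousWord?.let { uniqueLanguageOf(it, dictionaries) }
    if (fromContext != null) return fromContext
    // Exact match strategy (fallback): the current word belongs to exactly one language.
    return uniqueLanguageOf(currentWord, dictionaries)
}
```

With an English and an Italian dictionary that both contain "I", a previous word of "casa" would restrict suggestions to Italian, while a previous word of "I" followed by an unknown current word would leave filtering off.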

Validation

  • Manual Testing: Verified by the issue reporter (@emavgl). Confirmed it resolves the reported noise issue.
  • Regression Checks: Verified that missing suggestions in specific apps (e.g., Firefox) are existing upstream behavior and not a regression.
  • Unit Tests: Validated logic locally for standard flows and edge cases.

@subha0319 changed the title from "feat: add language auto-detection heuristics to suggestion logic" to "feat: add Language Auto-Detection and Contextual Suggestions" on Nov 29, 2025
@Helium314
Owner

This addresses what the user wanted, but also imposes much more severe restrictions.
Using only languages in which the previous word is in the main dictionary completely ignores other dictionaries, such as the user's personal dictionary. Checking whether the word is valid means that it will never find words the user types frequently but that are in no dictionary except the user history.

The fallback strategy is less restrictive, but may often lock to a wrong language in case of typos.

I did not yet test it, but I assume both approaches will struggle when mixing languages (e.g. mix an Italian-only word in an English sentence).

@emavgl as far as I understand you only / mostly wanted to avoid auto-correct mixing languages, right?
At least I can't see the point of multilingual typing when we kick out entire languages rather easily...

For this case it seems more suitable to address it in Suggest, where it's decided whether words will be autocorrected.
It would be possible to e.g. only autocorrect to a word that is in the current language of the dictionary facilitator. Playing with the weight / confidence for DictionaryGroups should also have an effect, as autocorrect is, at least partially, score based.
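A rough sketch of the first idea, only auto-correcting to a word from the current language, could look like the following. Everything here is an assumption for illustration: `Candidate`, `pickAutoCorrection`, and the `threshold` parameter are hypothetical and do not reflect how Suggest is actually structured.

```kotlin
// Hypothetical suggestion candidate; the fields are assumptions, not HeliBoard types.
data class Candidate(val word: String, val language: String, val score: Int)

// Returns the candidate to auto-correct to, or null if nothing should be auto-corrected.
// The top-scoring candidate is accepted only if it comes from the current language
// and reaches the (assumed) score threshold. Candidates from other languages are
// left untouched, so they can still be shown as plain suggestions.
fun pickAutoCorrection(
    candidates: List<Candidate>,
    currentLanguage: String,
    threshold: Int,
): Candidate? {
    val best = candidates.maxByOrNull { it.score } ?: return null
    return if (best.language == currentLanguage && best.score >= threshold) best else null
}
```

Unlike filtering the suggestion list, this keeps every enabled language available for suggestions and only limits what is applied automatically.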

@emavgl

emavgl commented Dec 2, 2025

> @emavgl as far as I understand you only / mostly wanted to avoid auto-correct mixing languages, right?
> At least I can't see the point of multilingual typing when we kick out entire languages rather easily...

I write in three languages and adapt the language to the person I am talking to (for example, I type in Italian with my family but in English with my friends). I never, or only very rarely, want to mix the dictionaries and get suggestions in the other language. At the same time, it would be convenient not to have to change the keyboard language manually every time.

I have re-installed Swift Keyboard to see how they do it. From a user's perspective, when you type a word in one language, the suggested words are in the same language. I saw mixed suggestions only once, when typing "I" (shared between English and Italian); when I typed a word from the other language, that language's dictionary was used.

I believe they probably don't have any heuristic like the one implemented in this PR (disabling a language when the word is an exact match in another dictionary). Their good suggestions are probably based purely on a big n-gram model, which is very good at predicting the next word given one word. It is so good that you can write meaningful sentences just by tapping the suggested word in the middle.

I believe Heliboard does not have such n-gram models, right? I wouldn't play with the scores and penalize manually; I would rather focus on implementing this n-gram model based prediction, which would also result in better suggestions when typing in a single language.
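To make the "given one word, predict the next word" idea concrete, here is a toy bigram predictor. It is only a sketch of the general technique, with a hypothetical `BigramModel` class; it says nothing about how any real keyboard implements prediction.

```kotlin
// Toy bigram model: counts which word follows which, then ranks followers by count.
class BigramModel {
    private val counts = HashMap<String, HashMap<String, Int>>()

    // Splits the text on whitespace and counts every adjacent word pair.
    fun train(text: String) {
        val tokens = text.lowercase().split(Regex("\\s+")).filter { it.isNotEmpty() }
        for ((previous, next) in tokens.zipWithNext()) {
            val followers = counts.getOrPut(previous) { HashMap() }
            followers[next] = (followers[next] ?: 0) + 1
        }
    }

    // Returns up to `limit` words seen after previousWord, most frequent first.
    fun predictNext(previousWord: String, limit: Int = 3): List<String> =
        counts[previousWord.lowercase()].orEmpty()
            .entries
            .sortedByDescending { it.value }
            .take(limit)
            .map { it.key }
}
```

Because followers are learned from real text, a model trained on mixed Italian and English text would tend to propose Italian words after an Italian word without any explicit language filter.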

@subha0319 subha0319 marked this pull request as draft December 2, 2025 12:31
@Helium314
Copy link
Owner

> I believe Heliboard does not have such n-gram models, right?

The native library does actually use n-grams, but I never looked at how much they are actually used. It might be that only the previous word is used (i.e. bigrams). But suggestions are created separately for each enabled language, as the library cannot combine multiple dictionaries this way. A weight is provided that depends on the language of the previously typed words. I didn't check in detail, but I assume this weight is just a multiplier for the score of a suggestion.
Tuning the weights might be able to achieve what you are looking for (except that the weights will not change when moving to a different input field, e.g. when chatting with a different person).
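Under the assumption stated above, that the weight is a plain multiplier on each suggestion's score, combining the per-language results could be sketched like this. `ScoredWord` and `mergeWeighted` are hypothetical names, and the native library may well behave differently.

```kotlin
// Hypothetical scored suggestion; not a HeliBoard type.
data class ScoredWord(val word: String, val score: Int)

// Multiplies each language's suggestion scores by that language's weight
// (defaulting to 1.0 when no weight is given), then returns the `limit`
// highest-scoring words across all languages. Duplicates are not merged.
fun mergeWeighted(
    perLanguage: Map<String, List<ScoredWord>>,
    weights: Map<String, Float>,
    limit: Int,
): List<ScoredWord> =
    perLanguage.flatMap { (language, suggestions) ->
        val weight = weights[language] ?: 1f
        suggestions.map { ScoredWord(it.word, (it.score * weight).toInt()) }
    }.sortedByDescending { it.score }.take(limit)
```

In this sketch, lowering the weight of the language that was not used for the previous words pushes its suggestions down the list without removing them entirely.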

Linked issue: Feature request: Language Auto-Detection and Contextual Suggestions