Begin automatic language detection support#22

Draft
C-Loftus wants to merge 6 commits into odilia-app:main from
C-Loftus:lingua

Conversation

@C-Loftus
Collaborator

@C-Loftus C-Loftus commented Mar 21, 2025

  • Create a with_language_detection method on the fifo builder to greedily initialize the language detection models
  • Create a send_lines_multilingual method (still playing with names for this) to automatically send lines while also setting the proper language.
  • Put all language detection features under a feature flag
  • Add tests
  • Fix clippy and any formatting issues before review

@C-Loftus
Collaborator Author

Fixed the example; thanks tait for your guidance on that. I can distinguish obviously different languages like Russian vs. English without issue, but as you mentioned on Matrix, English vs. Spanish does not work well. It appears that lingua needs either obvious differences, such as distinct writing systems or diacritic characters (e.g. à), or a larger context window.

I asked a question on the lingua repo (linked below) and we can see what they say. Worst case, even if we only distinguish between Hindi / English / Mandarin / Russian / Korean or other languages whose writing systems are distinct, that is still a win in my opinion, even though it's not ideal.

pemistahl/lingua-rs#463

Will fix up clippy stuff and tidy stuff up once I (hopefully) hear back on that discussion and have a better sense of any potential optimization strategies.
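
For reference, the "distinct writing systems" fallback mentioned above can be sketched without any detection model. This is only an illustration under stated assumptions: `Script`, `script_of`, and `dominant_script` are hypothetical names, not part of lingua or of this PR, and the Unicode ranges cover just a few common blocks. It identifies the script, not the language, so English vs. Spanish remains indistinguishable.

```rust
/// Writing systems this sketch can tell apart (hypothetical, not a lingua type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Cyrillic,
    Devanagari,
    Han,
    Hangul,
}

/// Map a character to a script using a few Unicode block ranges.
/// Returns `None` for digits, whitespace, punctuation, and unlisted scripts.
pub fn script_of(c: char) -> Option<Script> {
    match c as u32 {
        // ASCII letters plus Latin-1 Supplement and Latin Extended-A/B letters.
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x024F if c.is_alphabetic() => {
            Some(Script::Latin)
        }
        0x0400..=0x04FF => Some(Script::Cyrillic),
        0x0900..=0x097F => Some(Script::Devanagari),
        // CJK Unified Ideographs (shared by Chinese and Japanese kanji).
        0x4E00..=0x9FFF => Some(Script::Han),
        // Hangul Jamo and precomposed Hangul syllables.
        0x1100..=0x11FF | 0xAC00..=0xD7A3 => Some(Script::Hangul),
        _ => None,
    }
}

/// Return the script with the most characters in `text`, or `None` if no
/// character belongs to a known script. On a tie, the script seen first wins.
pub fn dominant_script(text: &str) -> Option<Script> {
    let mut counts: Vec<(Script, usize)> = Vec::new();
    for s in text.chars().filter_map(script_of) {
        match counts.iter_mut().find(|entry| entry.0 == s) {
            Some(entry) => entry.1 += 1,
            None => counts.push((s, 1)),
        }
    }
    let mut best: Option<(Script, usize)> = None;
    for (s, n) in counts {
        // Strict comparison keeps the earliest script when counts are equal.
        if best.map_or(true, |(_, m)| n > m) {
            best = Some((s, n));
        }
    }
    best.map(|(s, _)| s)
}
```

A caller could map the returned script to a default language per user configuration and fall back to a statistical detector only when the result is `Script::Latin`, where several candidate languages share the same alphabet.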

@TTWNO
Member

TTWNO commented Mar 28, 2025

Subscribed to the issue. We'll see what they say.

I've had difficulty with Lingua switching languages. I tried "Hello world and 你好,世界!" in one string and Lingua always marked the entire text as English (or Chinese if I swapped the order).
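
One possible workaround for mixed strings like this is to split the text into runs before detection, so each run can be labeled separately. The sketch below is an assumption-laden illustration, not lingua's API and not code from this PR: `split_runs` and `latin_or_han` are hypothetical helpers, and the toy classifier only knows ASCII letters and CJK ideographs (which also cannot separate Chinese from Japanese kanji).

```rust
/// Split `text` into maximal runs whose classified characters share one label.
/// Characters for which `classify` returns `None` (spaces, punctuation) stay
/// attached to the run they appear in instead of starting a new one.
pub fn split_runs<'a, T: PartialEq + Copy>(
    text: &'a str,
    classify: impl Fn(char) -> Option<T>,
) -> Vec<(Option<T>, &'a str)> {
    let mut runs = Vec::new();
    let mut start = 0; // byte offset where the current run begins
    let mut current: Option<T> = None; // label of the current run, if known yet
    for (i, c) in text.char_indices() {
        if let Some(label) = classify(c) {
            match current {
                // Label changed: close the current run and open a new one here.
                Some(cur) if cur != label => {
                    runs.push((current, &text[start..i]));
                    start = i;
                    current = Some(label);
                }
                // First labeled character of the text: adopt its label.
                None => current = Some(label),
                // Same label as before: keep extending the run.
                _ => {}
            }
        }
    }
    if start < text.len() {
        runs.push((current, &text[start..]));
    }
    runs
}

/// Toy classifier for the demo: labels ASCII letters and CJK ideographs only.
pub fn latin_or_han(c: char) -> Option<&'static str> {
    if c.is_ascii_alphabetic() {
        Some("latin")
    } else if ('\u{4E00}'..='\u{9FFF}').contains(&c) {
        Some("han")
    } else {
        None
    }
}
```

Calling `split_runs("Hello world and 你好,世界!", latin_or_han)` yields one run labeled `"latin"` and one labeled `"han"`, which could then be sent to the synthesizer with separate language settings.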

The extremely short context windows will hurt for sure, but I have heard that iOS isn't perfect either and you often have to switch explicitly unless it's a unique writing system.

(Another thing I've been thinking about is how to support Spiel. If I get my crate working, maybe I can copy your work here and apply it there :)
