Morphologizer plus translation in context / word alignment #11603

swetepete · 2022-10-10T11:14:10Z

I would like to develop a tool that helps you understand a specific foreign-language word as it occurs in a sentence.

Instead of machine-translating the whole sentence, you should be able to get the correct translation of just the word you select.

Unlike a dictionary, it doesn’t give you multiple suggestions to choose from. It translates the word in context and either suggests the single best translation in your language, or analyzes the word morphologically by giving its base form plus morphological information, and then translates the stem/lemma in context to provide one single best translation.
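To make that behaviour concrete, here is a minimal sketch of the decision logic, assuming the word alignment and the lemma/morphology annotations have already been computed elsewhere. `Gloss`, `gloss_word` and `lemma_lexicon` are hypothetical names, and the lemma lookup is a plain context-free dictionary, which is a simplification of the context-aware lemma translation described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Gloss:
    """One best gloss for a selected word (hypothetical output record)."""
    surface: str
    translation: Optional[str]            # in-context translation from the alignment, if any
    lemma: Optional[str] = None           # base form, filled only on the fallback path
    morph: Dict[str, str] = field(default_factory=dict)
    lemma_translation: Optional[str] = None


def gloss_word(
    index: int,
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignment: List[Tuple[int, int]],
    lemmas: List[str],
    morphs: List[Dict[str, str]],
    lemma_lexicon: Dict[str, str],
) -> Gloss:
    """Prefer the aligned target word(s); otherwise fall back to lemma + morphology."""
    aligned = sorted(j for i, j in alignment if i == index)
    if aligned:
        return Gloss(src_tokens[index], " ".join(tgt_tokens[j] for j in aligned))
    lemma = lemmas[index]
    return Gloss(
        surface=src_tokens[index],
        translation=None,
        lemma=lemma,
        morph=morphs[index],
        # Context-free lookup: a simplification of context-aware lemma translation.
        lemma_translation=lemma_lexicon.get(lemma),
    )


# Toy inputs (hand-written for illustration; "spielen" is deliberately left unaligned).
src = ["Die", "Kinder", "spielen"]
tgt = ["The", "children", "play"]
alignment = [(0, 0), (1, 1)]
lemmas = ["der", "Kind", "spielen"]
morphs = [{}, {"Number": "Plur"}, {"Number": "Plur", "Person": "3"}]
lexicon = {"spielen": "play"}

g1 = gloss_word(1, src, tgt, alignment, lemmas, morphs, lexicon)
g2 = gloss_word(2, src, tgt, alignment, lemmas, morphs, lexicon)
print(g1)
print(g2)
```

The aligned word wins whenever the aligner found one; the lemma + morphology path only kicks in when it did not.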

Basically, it does appear that spaCy already has a morphologizer and a lemmatizer in its pipeline, so I guess that part’s easy.
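For reference, this is roughly how those annotations are read from a spaCy `Doc`. It is a sketch that assumes spaCy v3 is installed; to avoid downloading a model, the lemmas and morphological features are written by hand into the `Doc` constructor instead of being predicted by a trained pipeline such as `de_core_news_sm`.

```python
import spacy
from spacy.tokens import Doc

# A blank German pipeline: tokenizer and vocab only, no trained components.
nlp = spacy.blank("de")

# Annotations written by hand so the sketch runs without downloading a model.
# With a trained pipeline (e.g. spacy.load("de_core_news_sm")) the lemmatizer and
# morphologizer would set token.lemma_ and token.morph for you.
doc = Doc(
    nlp.vocab,
    words=["Die", "Kinder", "spielen"],
    lemmas=["der", "Kind", "spielen"],
    morphs=[
        "Case=Nom|Definite=Def|Number=Plur|PronType=Art",
        "Case=Nom|Gender=Neut|Number=Plur",
        "Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin",
    ],
)

for token in doc:
    # token.morph is a MorphAnalysis; to_dict() gives {feature: value}.
    print(token.text, token.lemma_, token.morph.to_dict())
```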

It doesn’t seem like machine translation APIs such as DeepL offer an alignment feature, i.e. one that shows which words or phrases in the source correspond to which words or phrases in the target.

spaCy, for its part, doesn’t seem to have any NLP features related to translation.

The only strategy I can think of is to machine-translate the whole sentence and then use a word-alignment library to map the source words onto the translation.
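A minimal sketch of that alignment step, assuming you already have a similarity score for every (source token, target token) pair, e.g. cosine similarity between contextual embeddings from a multilingual encoder. It uses the simple mutual-argmax rule (a pair is kept only when each token is the other's best match), as in the "Argmax" variant of SimAlign. This is a swapped-in technique for illustration, not necessarily the method of the paper linked below, and the similarity values are made-up toy numbers.

```python
from typing import List, Tuple


def mutual_argmax_alignment(sim: List[List[float]]) -> List[Tuple[int, int]]:
    """Align source token i to target token j when each is the other's best match.

    sim[i][j] is a similarity score between source token i and target token j
    (assumption: computed elsewhere, e.g. from multilingual embeddings).
    """
    if not sim or not sim[0]:
        return []
    n_src, n_tgt = len(sim), len(sim[0])
    # Best target for each source row.
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    # Best source for each target column.
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Toy scores for "Die Kinder spielen" vs. "The children are playing".
sim = [
    [0.9, 0.2, 0.1, 0.1],
    [0.2, 0.8, 0.3, 0.2],
    [0.1, 0.2, 0.4, 0.7],
]
alignment = mutual_argmax_alignment(sim)
print(alignment)
```

Mutual argmax yields high-precision one-to-one links and leaves words like "are" unaligned, which suits a "single best translation" tool.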

Here’s a paper on a word-alignment technique: https://aclanthology.org/2021.nodalida-main.7.pdf

This is a cool morphological annotator, but it isn’t bilingual: https://github.com/google-research/turkish-morphology

Basically, my question is whether someone can suggest a “spaCy”-est way to do this. Ideally there could be one single pipeline component that integrates lemmatization plus morphological tagging, then context-aware lemma translation, then context-aware translation of the complete word (and finally machine translation of the entire sentence).

At minimum, could the word-alignment technique from the paper above become a spaCy pipeline component?
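Since translation itself is outside spaCy's scope, one way to fit alignment into a pipeline is a custom component that reads a translation produced elsewhere (for example by an MT API) from a `Doc` extension attribute and writes the aligned target word onto each source token. This is a sketch under those assumptions: `toy_word_aligner`, `doc._.translation`, `token._.aligned` and `TOY_LEXICON` are hypothetical names, and the lexicon lookup stands in for a real embedding-based similarity model. `Language.component`, `Doc.set_extension` and `Token.set_extension` are spaCy v3's standard mechanisms for custom components and attributes.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc, Token

# Hypothetical custom attributes; force=True lets the sketch be re-run safely.
Doc.set_extension("translation", default=None, force=True)  # target tokens from an external MT system
Token.set_extension("aligned", default=None, force=True)    # aligned target word for this token

# Toy bilingual lexicon standing in for a real similarity model (assumption).
TOY_LEXICON = {"die": "the", "kinder": "children", "spielen": "playing"}


def toy_similarity(src: str, tgt: str) -> float:
    return 1.0 if TOY_LEXICON.get(src.lower()) == tgt.lower() else 0.0


@Language.component("toy_word_aligner")
def toy_word_aligner(doc: Doc) -> Doc:
    tgt = doc._.translation
    if not tgt or len(doc) == 0:
        return doc
    sim = [[toy_similarity(token.text, word) for word in tgt] for token in doc]
    best_tgt = [max(range(len(tgt)), key=lambda j: row[j]) for row in sim]
    best_src = [max(range(len(doc)), key=lambda i: sim[i][j]) for j in range(len(tgt))]
    for i, token in enumerate(doc):
        j = best_tgt[i]
        # Mutual best match with a non-zero score, so unrelated words stay unaligned.
        if best_src[j] == i and sim[i][j] > 0:
            token._.aligned = tgt[j]
    return doc


nlp = spacy.blank("de")
nlp.add_pipe("toy_word_aligner")

# The translation is attached before the components run, so build the Doc first.
doc = nlp.make_doc("Die Kinder spielen")
doc._.translation = ["The", "children", "are", "playing"]
for _, component in nlp.pipeline:
    doc = component(doc)

print([(token.text, token._.aligned) for token in doc])
```

Reading the translation from an extension attribute keeps the MT call out of the component, so spaCy only supplies tokens (and, with a trained pipeline, lemmas and morphology) while the alignment logic lives in your own code.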

Thanks very much

richardpaulhudson · 2022-10-13T07:25:41Z

spaCy itself has quite a strictly defined scope which doesn't include parallel documents or machine translation. However, if somebody built a working tool to do this using spaCy pipeline outputs as a starting point, we'd be happy to include it in the spaCy Universe.
