Morphologizer plus translation in context / word alignment #11603
Unanswered
swetepete
asked this question in
Help: Other Questions
Replies: 1 comment
-
spaCy itself has quite a strictly defined scope which doesn't include parallel documents or machine translation. However, if somebody built a working tool to do this using spaCy pipeline outputs as a starting point, we'd be happy to include it in the spaCy Universe.
-
I would like to develop a tool which can help you understand a specific word of a foreign language that occurs in a sentence.
Instead of machine translating the whole sentence, you should be able to get the correct translation of just the word you select.
Unlike a dictionary, it wouldn't give you multiple suggestions to choose from. It would translate the word in context and either suggest the single best translation in your language, or analyze the word morphologically, giving the base form plus morphological information, and then translate the stem/lemma in context to provide one single best translation.
spaCy already appears to have a morphologizer and a lemmatizer in its pipeline, so I guess that part's easy.
Machine translation APIs like DeepL don't seem to offer an alignment feature that shows which words or phrases in the source correspond to which words or phrases in the target.
And spaCy doesn't seem to have any NLP features related to translation.
The only strategy I can think of is to machine translate the whole sentence and then use a separate alignment library to map the selected word to its translation.
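To make that strategy concrete, here is a rough sketch of the lookup step only. It assumes the translation and the alignment already exist, and that the aligner emits links in the common "i-j" (Pharaoh-style) text format. The function names, the example sentence, and the alignment string are all made up for illustration and don't come from any particular library.

```python
from collections import defaultdict


def parse_pharaoh(alignment):
    """Parse 'i-j' links into {source_index: [target_index, ...]}."""
    links = defaultdict(list)
    for pair in alignment.split():
        src, tgt = pair.split("-")
        links[int(src)].append(int(tgt))
    return dict(links)


def translate_word(src_tokens, tgt_tokens, alignment, index):
    """Return the target tokens aligned to src_tokens[index], in target order."""
    links = parse_pharaoh(alignment)
    return [tgt_tokens[j] for j in sorted(links.get(index, []))]


# Hypothetical data: in practice the target sentence would come from an MT
# system and the alignment string from a word aligner.
src = ["Ich", "habe", "den", "Hund", "gesehen"]
tgt = ["I", "saw", "the", "dog"]
alignment = "0-0 1-1 4-1 2-2 3-3"

print(translate_word(src, tgt, alignment, 3))
```

A source word with no link simply returns an empty list, which is where the tool would have to fall back to something else, such as translating the lemma.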
Here’s a paper on a word-alignment technique: https://aclanthology.org/2021.nodalida-main.7.pdf
This is a cool morphological annotator but not bilingual: https://github.com/google-research/turkish-morphology
Basically, my question is whether someone can suggest the most "spaCy" way to do this. Ideally there would be one single pipeline component that integrates lemmatization plus morphological tagging, then context-aware lemma translation, then context-aware complete word translation (and then finally entire sentence machine translation).
At minimum, could the word alignment technique in the paper above become a spaCy pipeline component?
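For what it's worth, the shape I have in mind is something like the sketch below: a custom component that stores aligned target words on each token through a custom extension attribute. This assumes spaCy v3. The component name `toy_word_aligner`, the attribute name `aligned`, and the hard-coded lookup table are all hypothetical placeholders standing in for a real MT system and a real aligner. It is not the method from the paper.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Token

# Hypothetical attribute: the target-language words aligned to this token.
if not Token.has_extension("aligned"):
    Token.set_extension("aligned", default=None)

# Hypothetical stand-in for "machine translate, then word-align".
# Maps a source sentence to (target tokens, [(source_index, target_index), ...]).
TOY_ALIGNMENTS = {
    "Der Hund schläft": (["The", "dog", "sleeps"], [(0, 0), (1, 1), (2, 2)]),
}


@Language.component("toy_word_aligner")
def toy_word_aligner(doc):
    """Attach aligned target words to each token, if the sentence is known."""
    entry = TOY_ALIGNMENTS.get(doc.text)
    if entry is None:
        return doc  # no translation available; leave the doc unchanged
    target_tokens, pairs = entry
    for src_i, tgt_j in pairs:
        if src_i < len(doc):
            token = doc[src_i]
            token._.aligned = (token._.aligned or []) + [target_tokens[tgt_j]]
    return doc


# A blank German pipeline only tokenizes, so no trained model is needed here.
nlp = spacy.blank("de")
nlp.add_pipe("toy_word_aligner")

doc = nlp("Der Hund schläft")
aligned = [(token.text, token._.aligned) for token in doc]
print(aligned)
```

In a trained pipeline the same component could presumably be added after the morphologizer and lemmatizer, so that `token.lemma_` and `token.morph` sit next to the aligned translation on each token.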
Thanks very much