Inquiry on how to go about improving Japanese parsing quality #6749
Unanswered
nlovell1
asked this question in
Help: Coding & Implementations
Given what you want to do, I would recommend running JMDict entries through spaCy and using them to generate Matcher rules. Then you could have a "JMDict entry" entity and extract that.
You seem to be suggesting you want to change the output of the tokenizer and POS tagger, but that doesn't really make sense. What the Entity Matcher will do is give you Spans, which are lists of tokens in the sentence, which you can turn into single strings if you need to.
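As a rough illustration of that suggestion, here is a minimal sketch using spaCy v3's `Matcher` with lemma-based patterns. The `JMDICT_ENTRIES` mapping is a hypothetical stand-in for real JMDict data, and the `Doc` is built by hand with assumed lemmas so the example runs without the Japanese pipeline installed; in practice both the entry lemmas and the sentence would come from running text through the model.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hypothetical stand-in for JMDict: headword -> lemmas of its tokens.
# In practice these would come from running each headword through the pipeline.
JMDICT_ENTRIES = {"気になる": ["気", "に", "なる"]}

vocab = Vocab()
matcher = Matcher(vocab)

# One token pattern per dictionary entry, matching on lemmas so that
# inflected forms such as なっ still match the dictionary form なる.
patterns = [[{"LEMMA": lemma} for lemma in lemmas]
            for lemmas in JMDICT_ENTRIES.values()]
matcher.add("JMDICT", patterns)

# Hand-built Doc mirroring the tokenization quoted in the question
# (the lemmas here are assumed for illustration).
doc = Doc(
    vocab,
    words=["気", "に", "なっ", "た", "。"],
    spaces=[False] * 5,
    lemmas=["気", "に", "なる", "た", "。"],
    pos=["NOUN", "ADP", "VERB", "AUX", "PUNCT"],
)

# as_spans=True returns Span objects labelled with the match key.
spans = matcher(doc, as_spans=True)
merged = [(span.text, span.label_) for span in spans]
print(merged)
```

The tokenizer and tagger output stay untouched here; the dictionary entry is layered on top as a Span whose text is the joined surface string.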
Hi! I'm a newbie to NLP and the whole world of language as it relates to technology, so there are probably a lot of gaps in my knowledge. But, here is my question.
Essentially, I am an avid user of Morphman, which has recently been experimenting with using spaCy's models to parse sentences so that sentence flashcards can be reordered such that each card introduces only one new vocabulary item or grammar structure.
Take 気になった、 for example: it seems to be parsed as [('気', 'NOUN'), ('に', 'ADP'), ('なっ', 'VERB'), ('た', 'AUX'), ('。', 'PUNCT')]. While the model does extremely well at catching exactly how the morphemes relate to one another, this type of breakdown is not helpful for the learner. Going by JMDict (the primary English-Japanese dictionary), I imagine it would be parsed as something like 気になっ (VERB) た (AUX). I know I can make exceptions in the parsing, but I am wondering if there is a way to brute-force the JMDict entries in combination with the great parsing going on here to output more semantically descriptive results.
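To make the desired output concrete, here is a plain-Python sketch of the merge step, with no spaCy dependency. It assumes each token is available as a (surface, lemma, POS) triple, that the dictionary is a set of lemma tuples (the single entry below is a hypothetical stand-in for JMDict), and that the merged unit should take the POS of its last token. That last choice is an assumption for illustration, not something the model provides.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str, str]  # (surface, lemma, pos)


def merge_dictionary_entries(
    tokens: List[Token],
    entries: Set[Tuple[str, ...]],
    max_len: int = 6,
) -> List[Tuple[str, str]]:
    """Greedily merge token runs whose lemmas form a dictionary entry."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so longer entries win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tokens[i:i + n]
            if tuple(lemma for _, lemma, _ in window) in entries:
                surface = "".join(s for s, _, _ in window)
                # Assumption: the merged unit inherits the POS of its last token.
                merged.append((surface, window[-1][2]))
                i += n
                break
        else:
            # No multi-token entry starts here; keep the token as-is.
            surface, _, pos = tokens[i]
            merged.append((surface, pos))
            i += 1
    return merged


# Tokens as in the example above, with assumed lemmas added.
tokens = [
    ("気", "気", "NOUN"),
    ("に", "に", "ADP"),
    ("なっ", "なる", "VERB"),
    ("た", "た", "AUX"),
    ("。", "。", "PUNCT"),
]
entries = {("気", "に", "なる")}  # hypothetical JMDict entry for 気になる

result = merge_dictionary_entries(tokens, entries)
print(result)
```

Matching on lemmas rather than surface forms is what lets the inflected なっ line up with the dictionary form 気になる.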
Additionally, has there been any interest in training a model that works well for spoken language yet?
Thanks a ton