@kurtisc kurtisc commented Apr 21, 2020

Hi!

Unlike most other languages written in the Latin alphabet, Vietnamese doesn't separate words with spaces[1]: spaces fall between syllables, and a single word can span several of them. The current space-based morphemizer is therefore unsuitable.

[1] Fun read https://www.tandfonline.com/doi/pdf/10.1080/00437956.1963.11659787

I wasn't able to find a small library that does word segmentation for Vietnamese the way Jieba does for Chinese. Bundling pyvi in-code, as has been done with Jieba, would mean also bundling several much larger dependencies (e.g. NumPy).

So, if merged as-is, the burden of getting Vietnamese support working unfortunately falls on the end user. On the other hand, users who don't want it won't see the morphemizer, and their usage won't be affected.
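
For illustration, the optional-dependency approach described above might look roughly like the sketch below. This is not the code in this PR: `vietnamese_available` and `segment_vietnamese` are hypothetical names chosen for the example. The only pyvi call used is `ViTokenizer.tokenize`, which returns the input text with the syllables of each multi-syllable word joined by underscores.

```python
from typing import Callable, List, Optional

try:
    # pyvi is optional and pulls in heavy dependencies, so any
    # import-time failure is treated as "Vietnamese support unavailable".
    from pyvi import ViTokenizer

    _pyvi_tokenize: Optional[Callable[[str], str]] = ViTokenizer.tokenize
except Exception:
    _pyvi_tokenize = None


def vietnamese_available() -> bool:
    """Return True only if pyvi could be imported (hypothetical helper)."""
    return _pyvi_tokenize is not None


def segment_vietnamese(
    text: str, tokenize: Optional[Callable[[str], str]] = None
) -> List[str]:
    """Split Vietnamese text into words (hypothetical helper).

    `tokenize` defaults to pyvi's ViTokenizer.tokenize and can be
    replaced with any function that marks word-internal syllable
    boundaries with underscores.
    """
    tokenize = tokenize or _pyvi_tokenize
    if tokenize is None:
        raise RuntimeError("pyvi is not installed")
    # Tokens are space-separated; underscores join the syllables of one word.
    return [token.replace("_", " ") for token in tokenize(text).split()]
```

A caller could check `vietnamese_available()` before offering the morphemizer, so users without pyvi never see it.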

If this gets included, I'll look into packaging pyvi and its dependencies as a separate addon, as has been done for MeCab, licences permitting. That would make installation more straightforward and avoid forcing users onto the source version of Anki.

@kurtisc
Author

kurtisc commented Aug 15, 2020

Rebased on master and confirmed working when #125 is merged.

With regard to #145: I do have a test for this morphemizer, so hopefully that addresses @shanrauf's comment.

@ianki
Collaborator

ianki commented Nov 9, 2020

Would you mind rebasing again, so I can see if the tests pass? I'll submit after.

@ghost

ghost commented Nov 13, 2020

I am really interested in this

@sedosido

I haven’t been able to build Anki from source to import pyvi (I think because my hardware is a little old). Is there any other way I can get Vietnamese parsing to work with MorphMan?
