Do spaCy's Readability Calcs Work for Non-English European Languages? #9805
-
Hi, has anyone used spaCy's readability calcs (https://spacy.io/universe/project/spacy_readability) for French, German, Italian, Portuguese, Russian, and Spanish? If so, does it work as well as it does for English text? If not, are there any suggestions for improving it, or other solutions that give a similar text complexity prediction for these languages? Thanks!
Replies: 2 comments
-
For third-party projects it's generally better to ask directly at the project repo first (in this case here). The package seems to be abandoned, though. The readability metrics included in that package are all rather old methods (the newest one is from 1969) that were designed specifically for English, so you should probably do something else for other languages.
-
Hello, I quickly checked this package for you. It looks like it scores English words against some readability criteria using a word list: https://github.com/mholtzscher/spacy_readability/blob/master/spacy_readability/words.py The calculation also includes a syllable count (https://github.com/mholtzscher/spacy_readability/blob/18ff66ae78299306733d987e509aaa0f775779b5/spacy_readability/__init__.py#L190), done with a Python package. So the calculations are quite specific to English, and as far as I can see this project works only on English text. However, if you have word counts and a syllable counter for other languages, I think a generic solution is not so difficult to implement. Cheers!
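To make the "generic solution" idea a bit more concrete, here is a minimal sketch of a Flesch-style score with the language-specific parts pulled out as parameters. Everything in it is an illustrative assumption rather than code from spacy_readability: the names `FleschCoefficients`, `naive_syllables`, and `flesch_score` are hypothetical, the regex-based sentence and word splitting is a stand-in for a real tokenizer, and the vowel-group syllable counter is only a rough heuristic. The one fixed fact is the set of constants in `ENGLISH`, which are the standard Flesch Reading Ease constants for English. For other languages you would need to supply coefficients from a published adaptation and a proper syllable counter yourself.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FleschCoefficients:
    """Weights for: base - sentence_weight * ASL - syllable_weight * ASW."""
    base: float
    sentence_weight: float
    syllable_weight: float


# Standard Flesch Reading Ease constants for English.
# Other languages need their own published coefficients (not provided here).
ENGLISH = FleschCoefficients(base=206.835, sentence_weight=1.015, syllable_weight=84.6)


def naive_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouyäöüáéíóúàèìòùâêîôû]+", word.lower())
    return max(1, len(groups))


def flesch_score(
    text: str,
    coeffs: FleschCoefficients = ENGLISH,
    count_syllables: Callable[[str], int] = naive_syllables,
) -> float:
    """Compute a Flesch-style score with pluggable coefficients and syllable counter."""
    # Naive sentence split on ., ! and ?; a real pipeline would use a sentence segmenter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Runs of Unicode letters count as words.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)   # ASL: words per sentence
    avg_syllables_per_word = syllables / len(words)     # ASW: syllables per word
    return (
        coeffs.base
        - coeffs.sentence_weight * avg_sentence_length
        - coeffs.syllable_weight * avg_syllables_per_word
    )


if __name__ == "__main__":
    print(flesch_score("The cat sat. The dog ran."))
```

To adapt this to another language, pass your own `FleschCoefficients` and `count_syllables` callable. Word-list metrics such as the one based on words.py are not covered by this sketch, since they would need an equivalent list for each language.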