What languages does spacy support? #12205
-
spaCy supports all the languages listed here. We differentiate between full support, which includes trained pipelines with taggers, parsers, and NER, and alpha support, which includes only a tokenizer. See #3056 for an outline of the process of adding a new language, though in practice things are a little simpler now, and the new language tag is more likely to be a useful reference. Also note that we're preparing spaCy v4, so things might change a bit, though the general process will stay the same.
It sounds like you're new to NLP. It's hard to give succinct answers to your questions, but I would recommend Speech and Language Processing by Jurafsky and Martin as a good introduction; it can help answer questions like how different words are handled.
The main thing that may not be clear to you is that "understanding language" isn't a single task, and in practice there are different solutions for different problems. While there is a research trend towards larger and more general models, there are still many unanswered questions and many different models in use.
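To make the difference between full and alpha support more concrete, here is a minimal sketch. It assumes spaCy v3 is installed and that Czech (language code `cs`) is one of the tokenizer-only languages; the helper name `tokenize` is only for illustration.

```python
import spacy

# Assumption: "cs" (Czech) has tokenizer-only (alpha) support, so a blank
# pipeline can be created without downloading any trained package.
nlp = spacy.blank("cs")


def tokenize(text):
    """Return the token texts produced by the blank Czech pipeline."""
    return [token.text for token in nlp(text)]


# A blank pipeline has no tagger, parser, or NER components.
print(nlp.pipe_names)

# The text is split into tokens, but nothing is predicted about them.
print(tokenize("Se staveními"))
```

For a fully supported language you would instead download a trained pipeline package and load it with `spacy.load`, which adds components such as the tagger, parser, and NER on top of the tokenizer.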
-
Hello.
I found this page, https://spacy.io/models, and it has a menu of languages. Are all of these languages supported? Does it mean that when I install spaCy, these languages are included, or that I can then obtain pipelines for these languages?
I would like to know how it works. How do you teach the machine to understand language? Do you specify, e.g., declensions, verb forms, and tenses? Or do you just have a database of exact meanings? To explain what I mean, I will use my native language, Czech.
For the noun "stavení" (building), the declension goes like this: in the singular we do not change the suffix, but in the plural we add suffixes depending on the case. So, for example, "With buildings..." translates as "Se staveními". But for the noun "hrad" (castle), there is a different suffix for every case.
So I would like to know how hard it is to create a new language pipeline, and whether there is any chance that you could create one, for example for Czech or Slovak.
I would also like to know whether it is possible to extend the vocab so that spaCy can work with older forms of a language. For example, there are many Bible translations: in English there is the King James Version, and in Czech we have the Bible kralická, which uses similarly old language. I see that spaCy can handle the old English of the King James Version (I tried it here: https://demos.explosion.ai), but I would like to know whether there is any chance of it handling old languages like Slovenian, because there are some very old, high-quality Bibles, like the Slovenian one here: https://www.bible.com/bible/2415/GEN.1.DAL1584, but even Google Translate cannot translate it!
Which is sad, because I wanted to create an app that would analyze the structure of verses (just like here: https://hb.openscriptures.org/), and comparing them with the Hebrew original would also be interesting for me. So Hebrew support would also be nice, but maybe that is for the far future?